Frida Stalker — Technical Reference
Source: frida-gum/gum/backend-arm64/gumstalker-arm64.c
Supported architectures: AArch64 (ARM64), Intel 64 (x86-64), IA-32
Primary use platforms: Android, iOS (AArch64); Linux, macOS, Windows (x86-64/IA-32)
Architecture Overview
Stalker operates on one basic block at a time:
- A block starting at `real_address` is read by `GumArm64Relocator`
- An instrumented copy is written to a slab by `GumArm64Writer`
- Branch/return instructions at the end of the block are virtualized — replaced with code that re-enters Stalker via an entry gate
- The entry gate calls `gum_exec_ctx_replace_current_block_with()` to instrument the next block
- Previously instrumented blocks are cached in a hashtable keyed on `real_address`; on cache hit, control jumps directly to the instrumented copy (subject to `trustThreshold`)
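The cache-or-compile flow above can be modeled in a few lines of plain JavaScript (a conceptual sketch, not Frida code; `instrumentBlock` is an invented stand-in for the relocator/writer pipeline):

```javascript
// Conceptual model of Stalker's per-block cache: blocks are keyed on their
// real (original) address; a cache hit skips instrumentation entirely.
const blockCache = new Map();
let compileCount = 0;

function instrumentBlock(realAddress) {
  compileCount++; // stands in for GumArm64Relocator + GumArm64Writer work
  return { realAddress, codeAddress: 'instrumented@' + realAddress };
}

// Analogous to gum_exec_ctx_replace_current_block_with():
function replaceCurrentBlockWith(realAddress) {
  let block = blockCache.get(realAddress);
  if (block === undefined) {
    block = instrumentBlock(realAddress);
    blockCache.set(realAddress, block);
  }
  return block.codeAddress; // execution resumes at the instrumented copy
}
```

Calling this twice for the same `real_address` compiles only once; the trust-threshold check (below) is what decides whether a hit may actually be reused.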
Prerequisite knowledge: Capstone disassembler (cs_insn), GumArm64Writer/GumArm64Relocator APIs, AArch64 calling conventions (AAPCS64), AArch64 Link Register (X30/LR) semantics.
Entry Points
| Function | Description |
|---|---|
| `gum_stalker_follow_me(self, transformer, sink)` | Follow the current thread; uses LR to find the start address |
| `gum_stalker_follow(self, thread_id, transformer, sink)` | Follow another thread; uses ptrace/`gum_process_modify_thread()` to inject |
| `gum_exec_ctx_replace_current_block_with(ctx, start_address)` | Re-enter Stalker to instrument the next block (called by entry gates) |
gum_stalker_follow_me — Assembly Bootstrap (AArch64)
```asm
gum_stalker_follow_me:
  stp x29, x30, [sp, -16]!     ; save FP and LR
  mov x29, sp
  mov x3, x30                  ; pass original LR as 4th arg (return address)
  bl _gum_stalker_do_follow_me
  ldp x29, x30, [sp], 16
  br x0                        ; branch to instrumented entry point returned in X0
```
- `_gum_stalker_do_follow_me(self, transformer, sink, ret_addr)` — initializes `GumExecCtx`, instruments the first block, returns its `code_address`
- AArch64 args in X0–X7; return value in X0
Following Another Thread (gum_stalker_follow)
```c
void gum_stalker_follow(GumStalker *self, GumThreadId thread_id,
                        GumStalkerTransformer *transformer, GumEventSink *sink);
```
- If `thread_id == current_thread_id`, delegates to `gum_stalker_follow_me()`
- Otherwise calls `gum_process_modify_thread(thread_id, gum_stalker_infect, &ctx)`
- Linux/Android: uses `ptrace` via a cloned helper process in its own process group (workaround: cannot ptrace own process group); communicates via UNIX socket; respects `PR_SET_DUMPABLE`/`PR_SET_PTRACER`
- `gum_stalker_infect()` writes code into the target thread's context using `GumArm64Writer` instead of calling functions directly
CPU Context Structure (AArch64)
```c
typedef GumArm64CpuContext GumCpuContext;

struct _GumArm64CpuContext {
  guint64 pc;
  guint64 sp;     /* X31 */
  guint64 x[29];
  guint64 fp;     /* X29 — frame pointer */
  guint64 lr;     /* X30 */
  guint8 q[128];  /* FPU/NEON/CRYPTO (SIMD) registers */
};
```
JavaScript API
```js
Stalker.follow([threadId, options])             // start stalking threadId (or current thread)
Stalker.unfollow([threadId])
Stalker.exclude(range)                          // { base, size } — exclude a memory range
Stalker.parse(events)                           // parse raw binary event buffer → JS array of tuples
Stalker.addCallProbe(address, callback[, data]) // add probe for a function address
Stalker.removeCallProbe(id)
Stalker.trustThreshold                          // integer property (default: 1)
```
Options Object
```js
{
  events: {
    call: false,     // emit GUM_CALL event on each call instruction
    ret: false,      // emit GUM_RET event on each return
    exec: false,     // emit GUM_EXEC event on each instruction
    block: false,    // emit GUM_BLOCK event when a block is executed
    compile: false,  // emit GUM_COMPILE event when a block is instrumented
  },
  onReceive(events) {},      // raw binary blob, parse with Stalker.parse()
  onCallSummary(summary) {}, // aggregated { address: callCount } map (more efficient)
  transform(iterator) {},    // custom transformer (see Transformer section)
  data: ptr("0x...")         // user data passed to C transformer/callout
}
```
Configuration Options
trustThreshold
Controls how many times a block must execute unchanged before it is re-used without re-comparison.
| Value | Behavior |
|---|---|
| `-1` | Never trust; always re-instrument (slowest) |
| `0` | Trust immediately from first execution |
| `N` (default: 1) | Trust after N consecutive executions with identical bytes |
- Even at `-1`, the original code snapshot is still stored in the slab (retained for simplicity)
- When set to `-1`: slab `offset` is reset to 0 on new block allocation, overwriting all previous instrumented blocks in the slab
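The policy in the table can be captured as a small decision function. This is a plain-JavaScript sketch of the documented behavior, not the actual frida-gum logic; the field names mirror `recycle_count` and `real_snapshot`:

```javascript
// Returns true when a cached block may be reused without re-instrumentation.
// Until trusted, every reuse re-compares the current bytes with the snapshot.
function mayReuse(block, currentBytes, trustThreshold) {
  if (trustThreshold < 0)
    return false;                                  // -1: always re-instrument
  if (block.recycleCount >= trustThreshold)
    return true;                                   // trusted: skip the comparison
  if (currentBytes === block.snapshot) {
    block.recycleCount++;                          // one more identical execution
    return true;                                   // reuse, but keep verifying
  }
  return false;                                    // bytes changed: re-instrument
}
```

With `trustThreshold = 0` the `recycleCount >= 0` check passes on the first hit, matching "trust immediately"; once `recycleCount` reaches `N`, the byte comparison is skipped on every subsequent hit.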
Stalker.exclude(range)
```js
const libc = Process.getModuleByName('libc.so');
Stalker.exclude({ base: libc.base, size: libc.size });
```
- Prevents instrumentation of code in the given range
- When a call enters an excluded range: stalking stops until the call returns; callbacks into non-excluded ranges from within the excluded range are not captured
- Tracked via `pending_calls` counter; unfollow is deferred while `pending_calls > 0`
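The deferral rule can be modeled as a tiny state machine (a conceptual sketch with invented names; the real counter lives in `GumExecCtx`):

```javascript
// Unfollow is parked while the thread is inside an excluded call.
const ctx = { pendingCalls: 0, unfollowRequested: false, followed: true };

function enterExcludedCall() { ctx.pendingCalls++; }
function leaveExcludedCall() { ctx.pendingCalls--; maybeUnfollow(); }
function requestUnfollow()   { ctx.unfollowRequested = true; maybeUnfollow(); }

function maybeUnfollow() {
  // Detaching is only safe once no excluded call is in flight.
  if (ctx.unfollowRequested && ctx.pendingCalls === 0)
    ctx.followed = false;
}
```

Requesting an unfollow while `pendingCalls > 0` leaves the thread followed; the request takes effect when the last excluded call returns.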
Memory: Slabs
Instrumented code is stored in 4 MB slabs (GUM_CODE_SLAB_MAX_SIZE = 4 * 1024 * 1024).
```c
struct _GumSlab {
  guint8 *data;          // tail start (after header)
  guint offset;          // current write position in tail
  guint size;            // usable tail size
  GumSlab *next;         // singly-linked list of slabs
  guint num_blocks;
  GumExecBlock blocks[]; // zero-length array; actual entries in header region
};
```
- Header region: `slab_size / 12` bytes (page-aligned) — stores `GumSlab` + `GumExecBlock[]`
- Tail region: remaining ~3.67 MB — stores instrumented instructions inline
- Allocated with `GUM_PAGE_RWX` if supported; otherwise `GUM_PAGE_RW` and toggled via freeze/thaw
- Slabs are chained via `next` pointer for disposal
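The header/tail split is simple arithmetic; this sketch computes it under the assumption of a 4 KB page size (the real value comes from the OS):

```javascript
// Slab geometry: 4 MB total, first slab_size / 12 bytes (page-aligned up)
// reserved for the header (GumSlab + GumExecBlock[]).
const SLAB_SIZE = 4 * 1024 * 1024; // GUM_CODE_SLAB_MAX_SIZE
const PAGE_SIZE = 4096;            // assumption; platform-dependent in reality

const pageAlign = (n) => Math.ceil(n / PAGE_SIZE) * PAGE_SIZE;

const headerSize = pageAlign(SLAB_SIZE / 12); // header region
const tailSize   = SLAB_SIZE - headerSize;    // instrumented-code region
```

With these assumptions `tailSize` comes out to 3,842,048 bytes (≈ 3.66 MB), consistent with the "~3.67 MB" figure above.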
Block Allocation
```c
#define GUM_EXEC_BLOCK_MIN_SIZE 1024 // minimum bytes required before allocating new slab
```
- New block only allocated if `>= 1024` bytes remain in tail AND `num_blocks < slab_max_blocks`
- If `trust_threshold < 0`: reset `slab->offset = 0` (overwrite mode) instead of allocating a new slab
- On new slab: calls `gum_exec_ctx_ensure_inline_helpers_reachable()` to write or re-use helpers
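The three rules above amount to a small decision procedure. This sketch models it with invented field names (`numBlocks`, `maxBlocks`) mirroring the C struct:

```javascript
// What the allocator does when a new block is requested from a slab.
const GUM_EXEC_BLOCK_MIN_SIZE = 1024;

function allocationAction(slab, trustThreshold) {
  const remaining = slab.size - slab.offset;
  if (remaining >= GUM_EXEC_BLOCK_MIN_SIZE && slab.numBlocks < slab.maxBlocks)
    return 'use-current-slab';
  if (trustThreshold < 0) {
    slab.offset = 0;            // overwrite mode: recycle the whole tail
    return 'reset-current-slab';
  }
  return 'allocate-new-slab';   // new slab also gets reachable inline helpers
}
```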
GumExecBlock Fields
```c
struct _GumExecBlock {
  GumExecCtx *ctx;
  GumSlab *slab;
  guint8 *real_begin;    // start of original code
  guint8 *real_end;      // end of original code
  guint8 *real_snapshot; // copy of original bytes (= code_end; for trust comparison)
  guint8 *code_begin;    // start of instrumented copy
  guint8 *code_end;      // end of instrumented copy
  GumExecBlockFlags flags;
  gint recycle_count;    // trust threshold counter
};
```
Layout in slab tail per block: `[instrumented code][original snapshot][BRK #14 debug marker]`
Helpers (Inline Code Fragments)
Six helper functions are written into each slab (or reused from a nearby slab within ±128 MB):
| Helper | Function |
|---|---|
| `last_prolog_minimal` | Save caller-saved registers (minimal context) |
| `last_epilog_minimal` | Restore caller-saved registers |
| `last_prolog_full` | Save all registers matching `GumArm64CpuContext` layout |
| `last_epilog_full` | Restore all registers |
| `last_stack_push` | Push `GumExecFrame` onto side-stack |
| `last_stack_pop_and_go` | Pop frame and branch to instrumented return target |
Helpers are called with direct BL (±128 MB range). If the new slab is beyond 128 MB from existing helpers, fresh copies are written into the new slab.
Context Save/Restore
Context Types
| Type | Registers Saved | When Used |
|---|---|---|
| `GUM_PROLOG_MINIMAL` | X0–X18, X29, X30, Q0–Q7, NZCV flags | Default; all code paths that don't need callout/probe visibility |
| `GUM_PROLOG_FULL` | All registers matching `GumArm64CpuContext` | Required by `Stalker.addCallProbe()` and `iterator.putCallout()` |
Prologue Inline Stub (written at each instrumented block)
```asm
// Written by gum_exec_ctx_write_prolog():
stp x19, lr, [sp, -(16 + GUM_RED_ZONE_SIZE)]! // save X19 (scratch) and LR; skip red zone
bl <last_prolog_minimal_or_full>
```
Red zone: 128-byte region below SP that a leaf function may use; prologue advances SP past it before touching the stack.
Epilogue Inline Stub
```asm
// Written by gum_exec_ctx_write_epilog():
bl <last_epilog_minimal_or_full>
ldp x19, x20, [sp, (16 + GUM_RED_ZONE_SIZE)]  // restore X19 and X20 (post-adjust)
```
- X20 is repurposed during prolog/epilog as a pointer to the saved context base
- In `GUM_PROLOG_FULL`, modifications to `GumArm64CpuContext` by callouts are reflected back — the epilog writes updated X19/X20 from the context back to their stack slots before popping
Reading Registers from Saved Context
```c
// Emits code (does not read directly) to load source_register → target_register:
gum_exec_ctx_load_real_register_into(ctx, target_reg, source_reg, gc);
```
- For `GUM_PROLOG_FULL`: X20 points to `GumArm64CpuContext`; offsets match struct layout
- For `GUM_PROLOG_MINIMAL`: X20 points to stack frame; X0–X18 at `(reg - X0) * 8`; X19/X20 at `(11 * 16 + 4 * 32) + (reg - X19) * 8`; X29/X30 at `(10 * 16) + (reg - X29) * 8`
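The minimal-prolog offset formulas above are easy to get wrong by hand; this sketch evaluates them (registers given as plain numbers, `5` for X5, and this is a model of the documented layout, not frida-gum code):

```javascript
// Offset of a saved register within a GUM_PROLOG_MINIMAL stack frame
// (X20 points at the frame base).
function minimalFrameOffset(reg) { // reg: 0..30 for X0..X30
  if (reg >= 0 && reg <= 18)
    return (reg - 0) * 8;                           // X0–X18
  if (reg === 19 || reg === 20)
    return (11 * 16 + 4 * 32) + (reg - 19) * 8;     // X19/X20
  if (reg === 29 || reg === 30)
    return (10 * 16) + (reg - 29) * 8;              // X29/X30
  throw new Error('register not saved in minimal prolog');
}
```

For example, X5 lands at offset 40, X29 at 160, and X19 at 304.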
Frames (Side-Stack)
```c
struct _GumExecFrame {
  gpointer real_address; // original return address
  gpointer code_address; // instrumented landing pad address
};
```
- One page allocated per `GumExecCtx`; filled descending from end of page (512 entries max)
- `ctx->first_frame` = last entry (page end - sizeof frame)
- `ctx->current_frame` = most recently pushed frame
last_stack_push (pseudo-code)
```c
void last_stack_push_helper(gpointer real_address, gpointer code_address) {
  GumExecFrame **x16 = &ctx->current_frame;
  GumExecFrame *x17 = *x16;
  if (((guintptr) x17 & (page_size - 1)) != 0) { // not page-aligned = not exhausted
    x17--;
    x17->real_address = real_address;
    x17->code_address = code_address;
    *x16 = x17;
  }
  // if exhausted: silently discard (fall back to slow path on return)
}
```
last_stack_pop_and_go (pseudo-code)
```c
// Called by virtualized RET; x16 = return register value
void last_stack_pop_and_go_helper(gpointer x16) {
  GumExecFrame **x0 = &ctx->current_frame;
  GumExecFrame *x1 = *x0;
  gpointer x17 = x1->real_address;
  if (x17 == x16) {           // fast path: expected return
    x17 = x1->code_address;   // go to instrumented landing pad
    x1++;
    *x0 = x1;                 // pop frame
    goto x17;
  } else {                    // slow path: unexpected return
    *x0 = ctx->first_frame;   // clear entire side-stack
    ctx->return_at = x16;
    minimal_prolog();
    gum_exec_ctx_replace_current_block_from_ret(ctx, ctx->return_at);
    minimal_epilog();
    goto ctx->resume_at;      // branch to newly instrumented block
  }
}
```
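The two pseudo-code helpers above can be exercised as a runnable model. This plain-JavaScript sketch replaces the page-alignment exhaustion check with a simple capacity counter and returns a marker string instead of re-entering Stalker:

```javascript
// Model of the GumExecFrame side-stack: push discards on exhaustion,
// pop takes the fast path only if the return address matches the top frame.
const PAGE_ENTRIES = 512; // capacity from the section above
const sideStack = { frames: [], capacity: PAGE_ENTRIES };

function stackPush(realAddress, codeAddress) {
  if (sideStack.frames.length < sideStack.capacity)
    sideStack.frames.push({ realAddress, codeAddress });
  // else: silently discard; the matching return takes the slow path
}

function stackPopAndGo(returnAddress) {
  const top = sideStack.frames[sideStack.frames.length - 1];
  if (top !== undefined && top.realAddress === returnAddress) {
    sideStack.frames.pop();
    return top.codeAddress;   // fast path: instrumented landing pad
  }
  sideStack.frames.length = 0; // slow path: clear the entire side-stack
  return 'reenter-stalker';    // then re-enter via the ret entry gate
}
```

A tampered or mismatched return address (e.g. after `longjmp`) clears the model's side-stack and falls back to re-entry, mirroring the slow path.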
Transformer
```c
// Default transformer — passes all instructions through unchanged:
static void gum_default_stalker_transformer_transform_block(
    GumStalkerTransformer *transformer,
    GumStalkerIterator *iterator,
    GumStalkerOutput *output)
{
  while (gum_stalker_iterator_next(iterator, NULL))
    gum_stalker_iterator_keep(iterator);
}
```
Custom Transformer (JavaScript)
```js
Stalker.follow(threadId, {
  transform(iterator) {
    let instruction = iterator.next();
    do {
      if (instruction.mnemonic === 'bl') {
        iterator.putCallout(onCall); // insert a callout before this instruction
      }
      iterator.keep(); // emit the instruction as-is
    } while ((instruction = iterator.next()) !== null);
  }
});

function onCall(context) {
  // context is CpuContext — read/write registers
  console.log('call to', context.pc);
}
```
Callout Structure
```c
typedef void (* GumStalkerCallout)(GumCpuContext *cpu_context, gpointer user_data);

struct _GumCalloutEntry {
  GumStalkerCallout callout;
  gpointer data;
  GDestroyNotify data_destroy;
  gpointer pc;
  GumExecCtx *exec_context;
};
```
- Callouts require `GUM_PROLOG_FULL` — the full CPU context is saved so callouts can read and modify all registers
Virtualizing Branch/Return Instructions
Termination Conditions (EOB / EOI)
| State | Meaning | Triggered by |
|---|---|---|
| EOB (End of Block) | Block ends here | Any branch, call, or return instruction |
| EOI (End of Input) | No valid instructions follow | Unconditional branch or return (not calls — callee returns) |
gum_exec_block_virtualize_branch_insn
Handles: unconditional branches (B, BR), conditional branches (B.cond, CBZ, CBNZ, TBZ, TBNZ), and call instructions (BL, BLR, BLRAA, BLRAAZ, BLRAB, BLRABZ).
Conditional branch output pattern:
```asm
INVERSE_CONDITION is_false  ; e.g. CBZ → CBNZ
  jmp_transfer_code(target, cond_entry_gate)
is_false:
  jmp_transfer_code(fallthrough_addr, cond_entry_gate)
```
Call instruction handling:
- Emit call event (if configured)
- Check for registered call probes → emit probe call code if any
- If target in excluded range (immediate only): emit original call + `jmp_transfer_code` to re-enter Stalker at the return address using the `excluded_call_imm` gate
- If target in register: emit runtime check against excluded ranges via `gum_exec_block_check_address_for_exclusion()`
- Else: emit `gum_exec_block_write_call_invoke_code()`:
  - Emit entry gate call to instrument the callee
  - Call the `last_stack_push` helper with the real and instrumented return addresses
  - Emit landing pad (initially re-enters Stalker; may be backpatched to a direct branch)
  - Branch to the instrumented callee via `exec_generated_code`
gum_exec_block_virtualize_ret_insn
- Emit return event (if configured)
- Emit `ret_transfer_code`, which loads the return register into X16 and jumps to `last_stack_pop_and_go`
AArch64 Call Instructions (all update LR with return address)
BL, BLR, BLRAA, BLRAAZ, BLRAB, BLRABZ
Entry Gates
```c
#define GUM_ENTRYGATE(name) gum_exec_ctx_replace_current_block_from_##name

#define GUM_DEFINE_ENTRYGATE(name) \
  static gpointer GUM_THUNK GUM_ENTRYGATE(name)( \
      GumExecCtx *ctx, gpointer start_address) { \
    if (counters_enabled) total_##name##s++; \
    return gum_exec_ctx_replace_current_block_with(ctx, start_address); \
  }
```
Defined entry gates:
| Gate Name | Trigger |
|---|---|
| `call_imm` | Call to immediate address |
| `call_reg` | Call via register |
| `post_call_invoke` | Landing pad re-enter after call |
| `excluded_call_imm` | Call to excluded range (immediate) |
| `excluded_call_reg` | Call to excluded range (register) |
| `ret` | Return instruction (unexpected; slow path) |
| `jmp_imm` | Unconditional branch to immediate |
| `jmp_reg` | Unconditional branch via register |
| `jmp_cond_cc` | Conditional branch (B.cond) |
| `jmp_cond_cbz` | CBZ |
| `jmp_cond_cbnz` | CBNZ |
| `jmp_cond_tbz` | TBZ |
| `jmp_cond_tbnz` | TBNZ |
| `jmp_continuation` | Exhausted block continuation |
Events
Event Types
| Type | Constant | Description |
|---|---|---|
| Call | GUM_CALL | A call instruction was executed |
| Return | GUM_RET | A return instruction was executed |
| Execute | GUM_EXEC | A single instruction was executed |
| Block | GUM_BLOCK | A basic block was executed |
| Compile | GUM_COMPILE | A basic block was instrumented |
Event Emitter Functions
- `gum_exec_ctx_emit_call_event()`
- `gum_exec_ctx_emit_ret_event()`
- `gum_exec_ctx_emit_exec_event()`
- `gum_exec_ctx_emit_block_event()`
Each calls gum_exec_block_write_unfollow_check_code() to embed an unfollow check.
Event Delivery
- Events are queued; flushed periodically or manually (not per-event, to avoid JS runtime re-entry overhead)
- `onReceive(events)`: raw binary blob; parse with `Stalker.parse(events)` → array of tuples
- `onCallSummary(summary)`: aggregated `{ "0x1234": 42 }` call counts (more efficient, less granular)
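The relationship between the two delivery modes can be illustrated by aggregating parsed events into a summary. This sketch assumes call tuples shaped like `['call', from, to, depth]` (the shape `Stalker.parse` produces for call events); it is a model of the aggregation, not Frida's internal implementation:

```javascript
// Reduce a parsed event stream to an onCallSummary-style
// { targetAddress: callCount } map.
function summarizeCalls(parsedEvents) {
  const summary = {};
  for (const ev of parsedEvents) {
    if (ev[0] !== 'call')
      continue;                 // ignore ret/exec/block/compile events
    const target = ev[2];       // call destination address
    summary[target] = (summary[target] || 0) + 1;
  }
  return summary;
}
```

Doing this reduction before delivery is why `onCallSummary` is cheaper than shipping every event through `onReceive`.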
Call Probes
```js
const id = Stalker.addCallProbe(targetAddress, (args) => {
  // args[0], args[1], ... — function arguments
}, optionalData);

Stalker.removeCallProbe(id);
```
- Use when `Interceptor.attach()` fails inside Stalker (original function is never called; only the instrumented copy runs)
- `Interceptor` patches work only if applied before block instrumentation OR if `trustThreshold` causes re-instrumentation
- Adding or removing a probe invalidates all cached instrumented blocks (forces full re-instrumentation)
- `optionalData` is only meaningful for C callbacks (e.g. via `CModule`); JS callbacks use closures instead
Backpatching (Optimization)
Deterministic branches can bypass Stalker entirely after first execution:
| Branch Type | Optimization |
|---|---|
| Unconditional branch to immediate | Replace with direct branch to instrumented block |
| Conditional branch (both paths known) | Replace condition + direct branches to both instrumented targets |
| Indirect branch (`BR X0`) with stable target | Emit compare + direct branch if target matches; fall back to Stalker if not |
Controlled by trustThreshold. Landing pads start as Stalker re-entry code; backpatched to direct branches once trust is established.
Unfollow
```c
void gum_stalker_unfollow_me(GumStalker *self);
void gum_stalker_unfollow(GumStalker *self, GumThreadId thread_id);
```
Unfollow Current Thread
- Set `ctx->state = GUM_EXEC_CTX_UNFOLLOW_PENDING`
- Each event emission calls `gum_exec_block_write_unfollow_check_code()` → `gum_exec_ctx_maybe_unfollow()` at runtime
- `gum_exec_ctx_maybe_unfollow()` checks state; if pending and `pending_calls == 0`: calls `gum_exec_ctx_unfollow()` → sets `resume_at`, clears TLS context key, sets state to `GUM_EXEC_CTX_DESTROY_PENDING`
- Special case: if the next block is `gum_stalker_unfollow_me` itself, `gum_exec_ctx_replace_current_block_with()` returns the original uninstrumented address — the thread exits Stalker without further instrumentation
Unfollow Another Thread
- If not yet executed (`infect_thunk` still in PC): use `gum_process_modify_thread()` + `gum_stalker_disinfect()` to restore the original PC
- Otherwise: set state to `GUM_EXEC_CTX_UNFOLLOW_PENDING` and wait for the thread to self-detect
Freeze/Thaw
On systems without RWX page support (W^X enforcement):
- Thaw: `mprotect` pages to `RW` before writing instrumented code
- Freeze: `mprotect` pages to `RX` before executing
- On RWX-capable systems: these are no-ops
Miscellaneous
Exclusive Load/Store Handling
AArch64 exclusive load/store pairs (LDXR/STXR family) are used for atomic primitives (mutexes, semaphores). Inserting event instrumentation between them would break the exclusive monitor.
Solution: Track exclusive_load_offset in the iterator; suppress non-essential instrumentation for up to 4 instructions following an exclusive load.
```c
// Exclusive load instructions reset the counter:
case ARM64_INS_LDAXR: case ARM64_INS_LDXR: /* ... */
  gc->exclusive_load_offset = 0;

// Exclusive store instructions clear the guard:
case ARM64_INS_STXR: case ARM64_INS_STLXR: /* ... */
  gc->exclusive_load_offset = GUM_INSTRUCTION_OFFSET_NONE;

// Instrumentation is emitted only when no exclusive window is active:
if (gc->exclusive_load_offset == GUM_INSTRUCTION_OFFSET_NONE)
  gum_exec_block_write_exec_event_code(...);
```
Exhausted Blocks
If fewer than GUM_EXEC_BLOCK_MIN_SIZE (1024) bytes remain in the slab tail, the iterator returns FALSE early. gum_exec_ctx_obtain_block_for() treats this as an implicit B <next_instruction> using the jmp_continuation entry gate — the block is split and the remainder becomes a new block in a new slab.
Syscall Virtualization (Linux/AArch64 only)
Handles SVC instruction for clone(2) syscall to prevent new threads inheriting Stalker instrumentation:
```
// Pseudo-code for generated instrumentation:
if x8 == __NR_clone:
    x0 = do_original_syscall()
    if x0 == 0:                            // child thread
        goto original_instruction_address  // exit Stalker; run uninstrumented
    return x0                              // parent thread: continue normally
else:
    return do_original_syscall()
```
AArch64 syscall convention: args in X0–X7, syscall number in X8, return value in X0.
Pointer Authentication (iOS ARMv8.3+)
PAC uses unused high bits of pointers to store cryptographic authentication codes:
```asm
pacia lr, sp                     ; sign LR using SP and key → LR'
stp fp, lr, [sp, #-FRAME_SIZE]!
; ...
ldp fp, lr, [sp], #FRAME_SIZE
autia lr, sp                     ; verify LR'; fault if corrupted
ret lr
```
When reading pointer registers (e.g., for indirect branch target or return address), Stalker must strip PAC before use:
```c
gum_arm64_writer_put_xpaci_reg(cw, reg); // strip PAC from reg
```
Applies to: determining branch/return destinations, all indirect pointer reads from application registers.
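The effect of stripping can be sketched as a bitmask. This is a deliberate simplification: it assumes a 48-bit user-space virtual address space and ignores TBI and the sign-extension of kernel pointers that a real XPACI implementation observes:

```javascript
// Simplified PAC strip: clear the high bits where the authentication
// code lives, keeping only the virtual-address bits.
const VA_BITS = 48n;
const VA_MASK = (1n << VA_BITS) - 1n; // 0x0000FFFFFFFFFFFF

function stripPac(pointer) { // pointer as a BigInt
  return pointer & VA_MASK;
}
```

A signed pointer such as `0x1122000000401000` reduces to the plain code address `0x401000`, which Stalker can then look up or instrument.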
Performance Notes
- Slab ratio: 1/12 header, 11/12 tail — empirically tuned to balance `GumExecBlock` entries vs. instruction storage
- Helper reachability: helpers written at most once per slab (if a nearby slab's helpers are within ±128 MB, reuse them); AArch64 direct branch range = ±128 MB
- Landing pad optimization: call return fast-path avoids Stalker re-entry entirely (side-stack lookup + direct branch)
- Backpatching: eliminates Stalker re-entry for deterministic branches after trust is established
- Event batching: events queued and bulk-delivered to avoid per-event JS runtime entry overhead
- `onCallSummary` vs `onReceive`: `onCallSummary` aggregates before delivery; much lower overhead when only call frequency matters
- Counters: `gum_stalker_dump_counters()` prints per-gate transition counts; test-suite only, not part of the public API