Frida Stalker — Technical Reference
Source: frida-gum/gum/backend-arm64/gumstalker-arm64.c
Supported architectures: AArch64 (ARM64), Intel 64 (x86-64), IA-32
Primary use platforms: Android, iOS (AArch64); Linux, macOS, Windows (x86-64/IA-32)
Architecture Overview
Stalker operates on one basic block at a time:
- A block starting at `real_address` is read by `GumArm64Relocator`
- An instrumented copy is written to a slab by `GumArm64Writer`
- Branch/return instructions at the end of the block are virtualized — replaced with code that re-enters Stalker via an entry gate
- The entry gate calls `gum_exec_ctx_replace_current_block_with()` to instrument the next block
- Previously instrumented blocks are cached in a hashtable keyed on `real_address`; on cache hit, control jumps directly to the instrumented copy (subject to `trustThreshold`)
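The cache-or-compile flow above can be modeled in a few lines of plain JavaScript (a conceptual sketch, not Frida code; `instrumentBlock` is an invented stand-in for the relocator/writer pipeline):

```javascript
// Conceptual model of Stalker's per-block cache: blocks are keyed on their
// real (original) address; a cache hit skips instrumentation entirely.
const blockCache = new Map();
let compileCount = 0;

function instrumentBlock(realAddress) {
  compileCount++; // stands in for GumArm64Relocator + GumArm64Writer work
  return { realAddress, codeAddress: 'instrumented@' + realAddress };
}

// Analogous to gum_exec_ctx_replace_current_block_with():
function replaceCurrentBlockWith(realAddress) {
  let block = blockCache.get(realAddress);
  if (block === undefined) {
    block = instrumentBlock(realAddress);
    blockCache.set(realAddress, block);
  }
  return block.codeAddress; // execution resumes at the instrumented copy
}
```

Calling this twice for the same `real_address` compiles only once; the trust-threshold check (below) is what decides whether a hit may actually be reused.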
Prerequisite knowledge: Capstone disassembler (cs_insn), GumArm64Writer/GumArm64Relocator APIs, AArch64 calling conventions (AAPCS64), AArch64 Link Register (X30/LR) semantics.
Entry Points
| Function | Description |
|---|---|
| `gum_stalker_follow_me(self, transformer, sink)` | Follow the current thread; uses LR to find the start address |
| `gum_stalker_follow(self, thread_id, transformer, sink)` | Follow another thread; uses ptrace/`gum_process_modify_thread()` to inject |
| `gum_exec_ctx_replace_current_block_with(ctx, start_address)` | Re-enter Stalker to instrument the next block (called by entry gates) |
gum_stalker_follow_me — Assembly Bootstrap (AArch64)
```asm
gum_stalker_follow_me:
  stp x29, x30, [sp, -16]!     ; save FP and LR
  mov x29, sp
  mov x3, x30                  ; pass original LR as 4th arg (return address)
  bl _gum_stalker_do_follow_me
  ldp x29, x30, [sp], 16
  br x0                        ; branch to instrumented entry point returned in X0
```
- `_gum_stalker_do_follow_me(self, transformer, sink, ret_addr)` — initializes `GumExecCtx`, instruments the first block, returns its `code_address`
- AArch64 args in X0–X7; return value in X0
Following Another Thread (gum_stalker_follow)
```c
void gum_stalker_follow(GumStalker *self, GumThreadId thread_id,
                        GumStalkerTransformer *transformer, GumEventSink *sink);
```
- If `thread_id == current_thread_id`, delegates to `gum_stalker_follow_me()`
- Otherwise calls `gum_process_modify_thread(thread_id, gum_stalker_infect, &ctx)`
- Linux/Android: uses `ptrace` via a cloned helper process in its own process group (workaround: cannot ptrace own process group); communicates via UNIX socket; respects `PR_SET_DUMPABLE`/`PR_SET_PTRACER`
- `gum_stalker_infect()` writes code into the target thread's context using `GumArm64Writer` instead of calling functions directly
CPU Context Structure (AArch64)
```c
typedef GumArm64CpuContext GumCpuContext;

struct _GumArm64CpuContext {
  guint64 pc;
  guint64 sp;     /* X31 */
  guint64 x[29];
  guint64 fp;     /* X29 — frame pointer */
  guint64 lr;     /* X30 */
  guint8 q[128];  /* FPU/NEON/CRYPTO (SIMD) registers */
};
```
JavaScript API
```js
Stalker.follow([threadId, options])             // start stalking threadId (or current thread)
Stalker.unfollow([threadId])
Stalker.exclude(range)                          // { base, size } — exclude a memory range
Stalker.parse(events)                           // parse raw binary event buffer → JS array of tuples
Stalker.addCallProbe(address, callback[, data]) // add probe for a function address
Stalker.removeCallProbe(id)
Stalker.trustThreshold                          // integer property (default: 1)
```
Options Object
```js
{
  events: {
    call: false,     // emit GUM_CALL event on each call instruction
    ret: false,      // emit GUM_RET event on each return
    exec: false,     // emit GUM_EXEC event on each instruction
    block: false,    // emit GUM_BLOCK event when a block is executed
    compile: false,  // emit GUM_COMPILE event when a block is instrumented
  },
  onReceive(events) {},      // raw binary blob, parse with Stalker.parse()
  onCallSummary(summary) {}, // aggregated { address: callCount } map (more efficient)
  transform(iterator) {},    // custom transformer (see Transformer section)
  data: ptr("0x...")         // user data passed to C transformer/callout
}
```
Configuration Options
trustThreshold
Controls how many times a block must execute unchanged before it is re-used without re-comparison.
| Value | Behavior |
|---|---|
| `-1` | Never trust; always re-instrument (slowest) |
| `0` | Trust immediately from first execution |
| `N` (default: 1) | Trust after N consecutive executions with identical bytes |
- Even at `-1`, the original code snapshot is still stored in the slab (retained for simplicity)
- When set to `-1`: slab `offset` is reset to 0 on new block allocation, overwriting all previous instrumented blocks in the slab
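The policy in the table can be captured as a small decision function. This is a plain-JavaScript sketch of the documented behavior, not the actual frida-gum logic; the field names mirror `recycle_count` and `real_snapshot`:

```javascript
// Returns true when a cached block may be reused without re-instrumentation.
// Until trusted, every reuse re-compares the current bytes with the snapshot.
function mayReuse(block, currentBytes, trustThreshold) {
  if (trustThreshold < 0)
    return false;                                  // -1: always re-instrument
  if (block.recycleCount >= trustThreshold)
    return true;                                   // trusted: skip the comparison
  if (currentBytes === block.snapshot) {
    block.recycleCount++;                          // one more identical execution
    return true;                                   // reuse, but keep verifying
  }
  return false;                                    // bytes changed: re-instrument
}
```

With `trustThreshold = 0` the `recycleCount >= 0` check passes on the first hit, matching "trust immediately"; once `recycleCount` reaches `N`, the byte comparison is skipped on every subsequent hit.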
Stalker.exclude(range)
```js
const libc = Process.getModuleByName('libc.so');
Stalker.exclude({ base: libc.base, size: libc.size });
```
- Prevents instrumentation of code in the given range
- When a call enters an excluded range: stalking stops until the call returns; callbacks into non-excluded ranges from within the excluded range are not captured
- Tracked via `pending_calls` counter; unfollow is deferred while `pending_calls > 0`
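The deferral rule can be modeled as a tiny state machine (a conceptual sketch with invented names; the real counter lives in `GumExecCtx`):

```javascript
// Unfollow is parked while the thread is inside an excluded call.
const ctx = { pendingCalls: 0, unfollowRequested: false, followed: true };

function enterExcludedCall() { ctx.pendingCalls++; }
function leaveExcludedCall() { ctx.pendingCalls--; maybeUnfollow(); }
function requestUnfollow()   { ctx.unfollowRequested = true; maybeUnfollow(); }

function maybeUnfollow() {
  // Detaching is only safe once no excluded call is in flight.
  if (ctx.unfollowRequested && ctx.pendingCalls === 0)
    ctx.followed = false;
}
```

Requesting an unfollow while `pendingCalls > 0` leaves the thread followed; the request takes effect when the last excluded call returns.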
Memory: Slabs
Instrumented code is stored in 4 MB slabs (GUM_CODE_SLAB_MAX_SIZE = 4 * 1024 * 1024).
```c
struct _GumSlab {
  guint8 *data;          // tail start (after header)
  guint offset;          // current write position in tail
  guint size;            // usable tail size
  GumSlab *next;         // singly-linked list of slabs
  guint num_blocks;
  GumExecBlock blocks[]; // zero-length array; actual entries in header region
};
```
- Header region: `slab_size / 12` bytes (page-aligned) — stores `GumSlab` + `GumExecBlock[]`
- Tail region: remaining ~3.67 MB — stores instrumented instructions inline
- Allocated with `GUM_PAGE_RWX` if supported; otherwise `GUM_PAGE_RW` and toggled via freeze/thaw
- Slabs are chained via `next` pointer for disposal
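The header/tail split is simple arithmetic; this sketch computes it under the assumption of a 4 KB page size (the real value comes from the OS):

```javascript
// Slab geometry: 4 MB total, first slab_size / 12 bytes (page-aligned up)
// reserved for the header (GumSlab + GumExecBlock[]).
const SLAB_SIZE = 4 * 1024 * 1024; // GUM_CODE_SLAB_MAX_SIZE
const PAGE_SIZE = 4096;            // assumption; platform-dependent in reality

const pageAlign = (n) => Math.ceil(n / PAGE_SIZE) * PAGE_SIZE;

const headerSize = pageAlign(SLAB_SIZE / 12); // header region
const tailSize   = SLAB_SIZE - headerSize;    // instrumented-code region
```

With these assumptions `tailSize` comes out to 3,842,048 bytes (≈ 3.66 MB), consistent with the "~3.67 MB" figure above.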
Block Allocation
```c
#define GUM_EXEC_BLOCK_MIN_SIZE 1024 // minimum bytes required before allocating new slab
```
- New block only allocated if `>= 1024` bytes remain in tail AND `num_blocks < slab_max_blocks`
- If `trust_threshold < 0`: reset `slab->offset = 0` (overwrite mode) instead of allocating a new slab
- On new slab: calls `gum_exec_ctx_ensure_inline_helpers_reachable()` to write or re-use helpers
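The three rules above amount to a small decision procedure. This sketch models it with invented field names (`numBlocks`, `maxBlocks`) mirroring the C struct:

```javascript
// What the allocator does when a new block is requested from a slab.
const GUM_EXEC_BLOCK_MIN_SIZE = 1024;

function allocationAction(slab, trustThreshold) {
  const remaining = slab.size - slab.offset;
  if (remaining >= GUM_EXEC_BLOCK_MIN_SIZE && slab.numBlocks < slab.maxBlocks)
    return 'use-current-slab';
  if (trustThreshold < 0) {
    slab.offset = 0;            // overwrite mode: recycle the whole tail
    return 'reset-current-slab';
  }
  return 'allocate-new-slab';   // new slab also gets reachable inline helpers
}
```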
GumExecBlock Fields
```c
struct _GumExecBlock {
  GumExecCtx *ctx;
  GumSlab *slab;
  guint8 *real_begin;    // start of original code
  guint8 *real_end;      // end of original code
  guint8 *real_snapshot; // copy of original bytes (= code_end; for trust comparison)
  guint8 *code_begin;    // start of instrumented copy
  guint8 *code_end;      // end of instrumented copy
  GumExecBlockFlags flags;
  gint recycle_count;    // trust threshold counter
};
```
Layout in slab tail per block: `[instrumented code][original snapshot][BRK #14 debug marker]`
Helpers (Inline Code Fragments)
Six helper functions are written into each slab (or reused from a nearby slab within ±128 MB):
| Helper | Function |
|---|---|
| `last_prolog_minimal` | Save caller-saved registers (minimal context) |
| `last_epilog_minimal` | Restore caller-saved registers |
| `last_prolog_full` | Save all registers matching `GumArm64CpuContext` layout |
| `last_epilog_full` | Restore all registers |
| `last_stack_push` | Push `GumExecFrame` onto side-stack |
| `last_stack_pop_and_go` | Pop frame and branch to instrumented return target |
Helpers are called with direct BL (±128 MB range). If the new slab is beyond 128 MB from existing helpers, fresh copies are written into the new slab.
Context Save/Restore
Context Types
| Type | Registers Saved | When Used |
|---|---|---|
| `GUM_PROLOG_MINIMAL` | X0–X18, X29, X30, Q0–Q7, NZCV flags | Default; all code paths that don't need callout/probe visibility |
| `GUM_PROLOG_FULL` | All registers matching `GumArm64CpuContext` | Required by `Stalker.addCallProbe()` and `iterator.putCallout()` |
Prologue Inline Stub (written at each instrumented block)
```asm
// Written by gum_exec_ctx_write_prolog():
stp x19, lr, [sp, -(16 + GUM_RED_ZONE_SIZE)]! // save X19 (scratch) and LR; skip red zone
bl <last_prolog_minimal_or_full>
```
Red zone: 128-byte region below SP that a leaf function may use; prologue advances SP past it before touching the stack.
Epilogue Inline Stub
```asm
// Written by gum_exec_ctx_write_epilog():
bl <last_epilog_minimal_or_full>
ldp x19, x20, [sp, (16 + GUM_RED_ZONE_SIZE)]  // restore X19 and X20 (post-adjust)
```
- X20 is repurposed during prolog/epilog as a pointer to the saved context base
- In `GUM_PROLOG_FULL`, modifications to `GumArm64CpuContext` by callouts are reflected back — the epilog writes updated X19/X20 from the context back to their stack slots before popping
Reading Registers from Saved Context
```c
// Emits code (does not read directly) to load source_register → target_register:
gum_exec_ctx_load_real_register_into(ctx, target_reg, source_reg, gc);
```
- For `GUM_PROLOG_FULL`: X20 points to `GumArm64CpuContext`; offsets match struct layout
- For `GUM_PROLOG_MINIMAL`: X20 points to stack frame; X0–X18 at `(reg - X0) * 8`; X19/X20 at `(11 * 16 + 4 * 32) + (reg - X19) * 8`; X29/X30 at `(10 * 16) + (reg - X29) * 8`
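The minimal-prolog offset formulas above are easy to get wrong by hand; this sketch evaluates them (registers given as plain numbers, `5` for X5, and this is a model of the documented layout, not frida-gum code):

```javascript
// Offset of a saved register within a GUM_PROLOG_MINIMAL stack frame
// (X20 points at the frame base).
function minimalFrameOffset(reg) { // reg: 0..30 for X0..X30
  if (reg >= 0 && reg <= 18)
    return (reg - 0) * 8;                           // X0–X18
  if (reg === 19 || reg === 20)
    return (11 * 16 + 4 * 32) + (reg - 19) * 8;     // X19/X20
  if (reg === 29 || reg === 30)
    return (10 * 16) + (reg - 29) * 8;              // X29/X30
  throw new Error('register not saved in minimal prolog');
}
```

For example, X5 lands at offset 40, X29 at 160, and X19 at 304.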
Frames (Side-Stack)
```c
struct _GumExecFrame {
  gpointer real_address; // original return address
  gpointer code_address; // instrumented landing pad address
};
```
- One page allocated per `GumExecCtx`; filled descending from end of page (512 entries max)
- `ctx->first_frame` = last entry (page end - sizeof frame)
- `ctx->current_frame` = most recently pushed frame
last_stack_push (pseudo-code)
```c
void last_stack_push_helper(gpointer real_address, gpointer code_address) {
  GumExecFrame **x16 = &ctx->current_frame;
  GumExecFrame *x17 = *x16;
  if (((guintptr) x17 & (page_size - 1)) != 0) { // not page-aligned = not exhausted
    x17--;
    x17->real_address = real_address;
    x17->code_address = code_address;
    *x16 = x17;
  }
  // if exhausted: silently discard (fall back to slow path on return)
}
```
last_stack_pop_and_go (pseudo-code)
```c
// Called by virtualized RET; x16 = return register value
void last_stack_pop_and_go_helper(gpointer x16) {
  GumExecFrame **x0 = &ctx->current_frame;
  GumExecFrame *x1 = *x0;
  gpointer x17 = x1->real_address;
  if (x17 == x16) {           // fast path: expected return
    x17 = x1->code_address;   // go to instrumented landing pad
    x1++;
    *x0 = x1;                 // pop frame
    goto x17;
  } else {                    // slow path: unexpected return
    *x0 = ctx->first_frame;   // clear entire side-stack
    ctx->return_at = x16;
    minimal_prolog();
    gum_exec_ctx_replace_current_block_from_ret(ctx, ctx->return_at);
    minimal_epilog();
    goto ctx->resume_at;      // branch to newly instrumented block
  }
}
```
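The two pseudo-code helpers above can be exercised as a runnable model. This plain-JavaScript sketch replaces the page-alignment exhaustion check with a simple capacity counter and returns a marker string instead of re-entering Stalker:

```javascript
// Model of the GumExecFrame side-stack: push discards on exhaustion,
// pop takes the fast path only if the return address matches the top frame.
const PAGE_ENTRIES = 512; // capacity from the section above
const sideStack = { frames: [], capacity: PAGE_ENTRIES };

function stackPush(realAddress, codeAddress) {
  if (sideStack.frames.length < sideStack.capacity)
    sideStack.frames.push({ realAddress, codeAddress });
  // else: silently discard; the matching return takes the slow path
}

function stackPopAndGo(returnAddress) {
  const top = sideStack.frames[sideStack.frames.length - 1];
  if (top !== undefined && top.realAddress === returnAddress) {
    sideStack.frames.pop();
    return top.codeAddress;   // fast path: instrumented landing pad
  }
  sideStack.frames.length = 0; // slow path: clear the entire side-stack
  return 'reenter-stalker';    // then re-enter via the ret entry gate
}
```

A tampered or mismatched return address (e.g. after `longjmp`) clears the model's side-stack and falls back to re-entry, mirroring the slow path.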
Transformer
```c
// Default transformer — passes all instructions through unchanged:
static void gum_default_stalker_transformer_transform_block(
    GumStalkerTransformer *transformer,
    GumStalkerIterator *iterator,
    GumStalkerOutput *output)
{
  while (gum_stalker_iterator_next(iterator, NULL))
    gum_stalker_iterator_keep(iterator);
}
```
Custom Transformer (JavaScript)
```js
Stalker.follow(threadId, {
  transform(iterator) {
    let instruction = iterator.next();
    do {
      if (instruction.mnemonic === 'bl') {
        iterator.putCallout(onCall); // insert a callout before this instruction
      }
      iterator.keep(); // emit the instruction as-is
    } while ((instruction = iterator.next()) !== null);
  }
});

function onCall(context) {
  // context is CpuContext — read/write registers
  console.log('call to', context.pc);
}
```
Callout Structure
```c
typedef void (* GumStalkerCallout)(GumCpuContext *cpu_context, gpointer user_data);

struct _GumCalloutEntry {
  GumStalkerCallout callout;
  gpointer data;
  GDestroyNotify data_destroy;
  gpointer pc;
  GumExecCtx *exec_context;
};
```
- Callouts require `GUM_PROLOG_FULL` — the full CPU context is saved so callouts can read and modify all registers
Virtualizing Branch/Return Instructions
Termination Conditions (EOB / EOI)
| State | Meaning | Triggered by |
|---|---|---|
| EOB (End of Block) | Block ends here | Any branch, call, or return instruction |
| EOI (End of Input) | No valid instructions follow | Unconditional branch or return (not calls — callee returns) |
gum_exec_block_virtualize_branch_insn
Handles: unconditional branches (B, BR), conditional branches (B.cond, CBZ, CBNZ, TBZ, TBNZ), and call instructions (BL, BLR, BLRAA, BLRAAZ, BLRAB, BLRABZ).
Conditional branch output pattern:
```asm
INVERSE_CONDITION is_false  ; e.g. CBZ → CBNZ
  jmp_transfer_code(target, cond_entry_gate)
is_false:
  jmp_transfer_code(fallthrough_addr, cond_entry_gate)
```
Call instruction handling:
- Emit call event (if configured)
- Check for registered call probes → emit probe call code if any
- If target in excluded range (immediate only): emit original call + `jmp_transfer_code` to re-enter Stalker at the return address using the `excluded_call_imm` gate
- If target in register: emit runtime check against excluded ranges via `gum_exec_block_check_address_for_exclusion()`
- Else: emit `gum_exec_block_write_call_invoke_code()`:
  - Emit entry gate call to instrument the callee
  - Call the `last_stack_push` helper with the real and instrumented return addresses
  - Emit landing pad (initially re-enters Stalker; may be backpatched to a direct branch)
  - Branch to the instrumented callee via `exec_generated_code`
gum_exec_block_virtualize_ret_insn
- Emit return event (if configured)
- Emit `ret_transfer_code`, which loads the return register into X16 and jumps to `last_stack_pop_and_go`
AArch64 Call Instructions (all update LR with return address)
BL, BLR, BLRAA, BLRAAZ, BLRAB, BLRABZ
Entry Gates
```c
#define GUM_ENTRYGATE(name) gum_exec_ctx_replace_current_block_from_##name

#define GUM_DEFINE_ENTRYGATE(name) \
  static gpointer GUM_THUNK GUM_ENTRYGATE(name)( \
      GumExecCtx *ctx, gpointer start_address) { \
    if (counters_enabled) total_##name##s++; \
    return gum_exec_ctx_replace_current_block_with(ctx, start_address); \
  }
```
Defined entry gates:
| Gate Name | Trigger |
|---|---|
| `call_imm` | Call to immediate address |
| `call_reg` | Call via register |
| `post_call_invoke` | Landing pad re-enter after call |
| `excluded_call_imm` | Call to excluded range (immediate) |
| `excluded_call_reg` | Call to excluded range (register) |
| `ret` | Return instruction (unexpected; slow path) |
| `jmp_imm` | Unconditional branch to immediate |
| `jmp_reg` | Unconditional branch via register |
| `jmp_cond_cc` | Conditional branch (B.cond) |
| `jmp_cond_cbz` | CBZ |
| `jmp_cond_cbnz` | CBNZ |
| `jmp_cond_tbz` | TBZ |
| `jmp_cond_tbnz` | TBNZ |
| `jmp_continuation` | Exhausted block continuation |
Events
Event Types
| Type | Constant | Description |
|---|---|---|
| Call | GUM_CALL | A call instruction was executed |
| Return | GUM_RET | A return instruction was executed |
| Execute | GUM_EXEC | A single instruction was executed |
| Block | GUM_BLOCK | A basic block was executed |
| Compile | GUM_COMPILE | A basic block was instrumented |
Event Emitter Functions
- `gum_exec_ctx_emit_call_event()`
- `gum_exec_ctx_emit_ret_event()`
- `gum_exec_ctx_emit_exec_event()`
- `gum_exec_ctx_emit_block_event()`
Each calls gum_exec_block_write_unfollow_check_code() to embed an unfollow check.
Event Delivery
- Events are queued; flushed periodically or manually (not per-event, to avoid JS runtime re-entry overhead)
- `onReceive(events)`: raw binary blob; parse with `Stalker.parse(events)` → array of tuples
- `onCallSummary(summary)`: aggregated `{ "0x1234": 42 }` call counts (more efficient, less granular)
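The relationship between the two delivery modes can be illustrated by aggregating parsed events into a summary. This sketch assumes call tuples shaped like `['call', from, to, depth]` (the shape `Stalker.parse` produces for call events); it is a model of the aggregation, not Frida's internal implementation:

```javascript
// Reduce a parsed event stream to an onCallSummary-style
// { targetAddress: callCount } map.
function summarizeCalls(parsedEvents) {
  const summary = {};
  for (const ev of parsedEvents) {
    if (ev[0] !== 'call')
      continue;                 // ignore ret/exec/block/compile events
    const target = ev[2];       // call destination address
    summary[target] = (summary[target] || 0) + 1;
  }
  return summary;
}
```

Doing this reduction before delivery is why `onCallSummary` is cheaper than shipping every event through `onReceive`.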
Call Probes
```js
const id = Stalker.addCallProbe(targetAddress, (args) => {
  // args[0], args[1], ... — function arguments
}, optionalData);

Stalker.removeCallProbe(id);
```
- Use when `Interceptor.attach()` fails inside Stalker (original function is never called; only the instrumented copy runs)
- `Interceptor` patches work only if applied before block instrumentation OR if `trustThreshold` causes re-instrumentation
- Adding or removing a probe invalidates all cached instrumented blocks (forces full re-instrumentation)
- `optionalData` is only meaningful for C callbacks (e.g. via `CModule`); JS callbacks use closures instead
Backpatching (Optimization)
Deterministic branches can bypass Stalker entirely after first execution:
| Branch Type | Optimization |
|---|---|
| Unconditional branch to immediate | Replace with direct branch to instrumented block |
| Conditional branch (both paths known) | Replace condition + direct branches to both instrumented targets |
| Indirect branch (`BR X0`) with stable target | Emit compare + direct branch if target matches; fall back to Stalker if not |
Controlled by trustThreshold. Landing pads start as Stalker re-entry code; backpatched to direct branches once trust is established.
Unfollow
```c
void gum_stalker_unfollow_me(GumStalker *self);
void gum_stalker_unfollow(GumStalker *self, GumThreadId thread_id);
```
Unfollow Current Thread
- Set `ctx->state = GUM_EXEC_CTX_UNFOLLOW_PENDING`
- Each event emission calls `gum_exec_block_write_unfollow_check_code()` → `gum_exec_ctx_maybe_unfollow()` at runtime
- `gum_exec_ctx_maybe_unfollow()` checks state; if pending and `pending_calls == 0`: calls `gum_exec_ctx_unfollow()` → sets `resume_at`, clears TLS context key, sets state to `GUM_EXEC_CTX_DESTROY_PENDING`
- Special case: if the next block is `gum_stalker_unfollow_me` itself, `gum_exec_ctx_replace_current_block_with()` returns the original uninstrumented address — the thread exits Stalker without further instrumentation
Unfollow Another Thread
- If not yet executed (`infect_thunk` still in PC): use `gum_process_modify_thread()` + `gum_stalker_disinfect()` to restore the original PC
- Otherwise: set state to `GUM_EXEC_CTX_UNFOLLOW_PENDING` and wait for the thread to self-detect
Freeze/Thaw
On systems without RWX page support (W^X enforcement):
- Thaw: `mprotect` pages to `RW` before writing instrumented code
- Freeze: `mprotect` pages to `RX` before executing
- On RWX-capable systems: these are no-ops
Miscellaneous
Exclusive Load/Store Handling
AArch64 exclusive load/store pairs (LDXR/STXR family) are used for atomic primitives (mutexes, semaphores). Inserting event instrumentation between them would break the exclusive monitor.
Solution: Track exclusive_load_offset in the iterator; suppress non-essential instrumentation for up to 4 instructions following an exclusive load.
```c
// Exclusive load instructions reset the counter:
case ARM64_INS_LDAXR: case ARM64_INS_LDXR: /* ... */
  gc->exclusive_load_offset = 0;

// Exclusive store instructions clear the guard:
case ARM64_INS_STXR: case ARM64_INS_STLXR: /* ... */
  gc->exclusive_load_offset = GUM_INSTRUCTION_OFFSET_NONE;

// Instrumentation is emitted only when no exclusive window is active:
if (gc->exclusive_load_offset == GUM_INSTRUCTION_OFFSET_NONE)
  gum_exec_block_write_exec_event_code(...);
```
Exhausted Blocks
If fewer than GUM_EXEC_BLOCK_MIN_SIZE (1024) bytes remain in the slab tail, the iterator returns FALSE early. gum_exec_ctx_obtain_block_for() treats this as an implicit B <next_instruction> using the jmp_continuation entry gate — the block is split and the remainder becomes a new block in a new slab.
Syscall Virtualization (Linux/AArch64 only)
Handles SVC instruction for clone(2) syscall to prevent new threads inheriting Stalker instrumentation:
```
// Pseudo-code for generated instrumentation:
if x8 == __NR_clone:
    x0 = do_original_syscall()
    if x0 == 0:                            // child thread
        goto original_instruction_address  // exit Stalker; run uninstrumented
    return x0                              // parent thread: continue normally
else:
    return do_original_syscall()
```
AArch64 syscall convention: args in X0–X7, syscall number in X8, return value in X0.
Pointer Authentication (iOS ARMv8.3+)
PAC uses unused high bits of pointers to store cryptographic authentication codes:
```asm
pacia lr, sp                     ; sign LR using SP and key → LR'
stp fp, lr, [sp, #-FRAME_SIZE]!
; ...
ldp fp, lr, [sp], #FRAME_SIZE
autia lr, sp                     ; verify LR'; fault if corrupted
ret lr
```
When reading pointer registers (e.g., for indirect branch target or return address), Stalker must strip PAC before use:
```c
gum_arm64_writer_put_xpaci_reg(cw, reg); // strip PAC from reg
```
Applies to: determining branch/return destinations, all indirect pointer reads from application registers.
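The effect of stripping can be sketched as a bitmask. This is a deliberate simplification: it assumes a 48-bit user-space virtual address space and ignores TBI and the sign-extension of kernel pointers that a real XPACI implementation observes:

```javascript
// Simplified PAC strip: clear the high bits where the authentication
// code lives, keeping only the virtual-address bits.
const VA_BITS = 48n;
const VA_MASK = (1n << VA_BITS) - 1n; // 0x0000FFFFFFFFFFFF

function stripPac(pointer) { // pointer as a BigInt
  return pointer & VA_MASK;
}
```

A signed pointer such as `0x1122000000401000` reduces to the plain code address `0x401000`, which Stalker can then look up or instrument.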
Performance Notes
- Slab ratio: 1/12 header, 11/12 tail — empirically tuned to balance `GumExecBlock` entries vs. instruction storage
- Helper reachability: helpers written at most once per slab (if a nearby slab's helpers are within ±128 MB, reuse them); AArch64 direct branch range = ±128 MB
- Landing pad optimization: call return fast-path avoids Stalker re-entry entirely (side-stack lookup + direct branch)
- Backpatching: eliminates Stalker re-entry for deterministic branches after trust is established
- Event batching: events queued and bulk-delivered to avoid per-event JS runtime entry overhead
- `onCallSummary` vs `onReceive`: `onCallSummary` aggregates before delivery; much lower overhead when only call frequency matters
- Counters: `gum_stalker_dump_counters()` prints per-gate transition counts; test-suite only, not part of the public API