ABI & Calling Conventions

This page documents the register assignments, calling convention, stack frame layout, memory layout, and SPI/JAM program format used by the WASM-to-PVM recompiler.

The canonical source for constants lives in crates/wasm-pvm/src/abi.rs and crates/wasm-pvm/src/memory_layout.rs.


Register Assignments

PVM provides 13 general-purpose 64-bit registers (r0–r12). The compiler assigns them as follows:

Register  Alias  Purpose                              Saved by
────────  ─────  ───────────────────────────────────  ────────
r0        ra     Return address (jump table index)    Callee
r1        sp     Stack pointer (grows downward)       Callee
r2        t0     Temp: load operand 1 / immediates    Caller
r3        t1     Temp: load operand 2                 Caller
r4        t2     Temp: ALU result                     Caller
r5        s0     Scratch                              Caller
r6        s1     Scratch                              Caller
r7        a0     Return value / SPI args_ptr          Caller
r8        a1     SPI args_len / second result         Caller
r9        l0     Local 0 / param 0                    Callee
r10       l1     Local 1 / param 1                    Callee
r11       l2     Local 2 / param 2                    Callee
r12       l3     Local 3 / param 3                    Callee

Callee-saved (r0, r1, r9–r12): the callee must preserve these across calls. Caller-saved (r2–r8): the caller must assume these are clobbered by any call.
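
The table above can be captured as a small lookup, which is handy when reading disassembly. This is a minimal sketch: the function and type names (alias, saved_by, SavedBy) are illustrative and are not taken from abi.rs.

```rust
/// Who must preserve a register across a call, per the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SavedBy {
    Caller,
    Callee,
}

/// Returns the alias this document uses for register `r<reg>`.
/// (Hypothetical helper; the real constants live in abi.rs.)
pub fn alias(reg: u8) -> Option<&'static str> {
    const ALIASES: [&str; 13] = [
        "ra", "sp", "t0", "t1", "t2", "s0", "s1", "a0", "a1", "l0", "l1", "l2", "l3",
    ];
    ALIASES.get(reg as usize).copied()
}

/// Classifies a register: r0, r1 and r9-r12 are callee-saved, r2-r8 caller-saved.
pub fn saved_by(reg: u8) -> Option<SavedBy> {
    match reg {
        0 | 1 | 9..=12 => Some(SavedBy::Callee),
        2..=8 => Some(SavedBy::Caller),
        _ => None, // PVM has only r0-r12
    }
}
```

For example, saved_by(7) reports that r7 (a0) must be treated as clobbered by any call, while saved_by(9) reports that r9 (l0) survives it.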


Stack Frame Layout

Every function allocates a stack frame. The stack grows downward (SP decreases).

                Higher addresses
          ┌─────────────────────────┐
          │   caller's frame ...    │
old SP →  ├─────────────────────────┤
          │  Saved r0  (ra)    +0   │  8 bytes
          │  Saved r9  (l0)    +8   │  8 bytes
          │  Saved r10 (l1)   +16   │  8 bytes
          │  Saved r11 (l2)   +24   │  8 bytes
          │  Saved r12 (l3)   +32   │  8 bytes
          ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤  FRAME_HEADER_SIZE = 40
          │  SSA value slot 0  +40  │  8 bytes
          │  SSA value slot 1  +48  │  8 bytes
          │  ...                    │  8 bytes per SSA value
new SP →  ├─────────────────────────┤
          │  (operand spill area)   │  SP - 0x100 .. SP
          └─────────────────────────┘
                Lower addresses

Frame size = FRAME_HEADER_SIZE (40) + num_ssa_values * 8

The operand spill area at SP + OPERAND_SPILL_BASE (i.e. SP - 0x100) is used for temporary storage during phi-node copies and indirect calls. The frame grows upward from SP (toward higher addresses), while the spill area is below SP, so the two regions never overlap regardless of frame size. However, a callee’s frame allocation must not reach into the caller’s spill area — this is protected by the stack overflow check which ensures SP - frame_size >= stack_limit.
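
The frame arithmetic can be sketched as follows. FRAME_HEADER_SIZE and OPERAND_SPILL_BASE use the values stated above; SLOT_SIZE and the function names are assumptions made for illustration and may not match abi.rs.

```rust
/// Bytes reserved for the saved ra and r9-r12 (5 registers * 8 bytes).
pub const FRAME_HEADER_SIZE: i32 = 40;
/// Each SSA value gets one 8-byte slot (name assumed for this sketch).
pub const SLOT_SIZE: i32 = 8;
/// Offset of the operand spill area relative to SP (it sits below SP).
pub const OPERAND_SPILL_BASE: i32 = -0x100;

/// Total frame size for a function with `num_ssa_values` SSA values.
pub fn frame_size(num_ssa_values: u32) -> i32 {
    FRAME_HEADER_SIZE + num_ssa_values as i32 * SLOT_SIZE
}

/// SP-relative offset of the stack slot for SSA value number `slot`.
pub fn ssa_slot_offset(slot: u32) -> i32 {
    FRAME_HEADER_SIZE + slot as i32 * SLOT_SIZE
}
```

A function with three SSA values therefore allocates 64 bytes, with its slots at SP+40, SP+48 and SP+56.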

Stack-Slot Approach with Register Allocation

Every LLVM SSA value gets a dedicated 8-byte stack slot. The baseline instruction sequence is:

  1. Load operands from stack slots into temp registers (t0, t1)
  2. Execute ALU operation, result in t2
  3. Store t2 back to the result’s stack slot

A linear-scan register allocator (regalloc.rs) improves on this baseline for functions that contain loop back-edges; loop-free functions skip allocation entirely. Candidate intervals come from use-def live-interval analysis and are filtered by a minimum-use threshold (MIN_USES_FOR_ALLOCATION, currently 3); a value does not have to span a loop to be eligible. The allocator assigns eligible values to the available callee-saved registers (r9-r12, when not used for this function’s incoming parameters). In non-leaf functions, the registers from r9 upward that are needed for outgoing call arguments are excluded from allocation. Clobber handling and reloads at call sites are performed by the emitter after each call, not by call-site invalidation logic inside regalloc itself. Combined with the register cache, this eliminates most redundant memory traffic.

Per-Block Register Cache (Store-Load Forwarding)

PvmEmitter maintains a per-basic-block register cache (slot_cache: HashMap<i32, u8>, reg_to_slot: [Option<i32>; 13]) that tracks which stack slot values are currently live in registers. This eliminates redundant LoadIndU64 instructions:

  • Cache hit, same register: Skip entirely (0 instructions emitted)
  • Cache hit, different register: Emit AddImm64 dst, cached_reg, 0 (register copy)
  • Cache miss: Emit normal LoadIndU64, then record in cache

The cache is invalidated:

  • When a register is overwritten (auto-detected via Instruction::dest_reg())
  • At block boundaries (define_label() clears the entire cache)
  • After function calls (clear_reg_cache() after Fallthrough return points)
  • After ecalli host calls (clear_reg_cache() after Ecalli)

Impact: ~50% gas reduction, ~15-40% code size reduction across benchmarks.
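
The three cache outcomes and the invalidation rules can be modelled in a few lines. This is a simplified sketch, not the PvmEmitter code: the field names follow the description above, while RegCache, LoadAction and the method names are hypothetical.

```rust
use std::collections::HashMap;

/// What the emitter would do for a "load slot into dst" request.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadAction {
    Skip,              // value already in dst: emit nothing
    Copy { from: u8 }, // value in another register: emit AddImm64 dst, from, 0
    Load,              // not cached: emit LoadIndU64
}

#[derive(Default)]
pub struct RegCache {
    slot_cache: HashMap<i32, u8>,   // stack slot offset -> register holding it
    reg_to_slot: [Option<i32>; 13], // register -> slot it currently mirrors
}

impl RegCache {
    /// Forget the slot mirrored by `reg` (used whenever `reg` is overwritten).
    pub fn invalidate_reg(&mut self, reg: u8) {
        if let Some(slot) = self.reg_to_slot[reg as usize].take() {
            self.slot_cache.remove(&slot);
        }
    }

    /// Drop everything (block boundary, after a call, after an ecalli).
    pub fn clear(&mut self) {
        self.slot_cache.clear();
        self.reg_to_slot = [None; 13];
    }

    /// Decide how to materialise `slot` in register `dst` and update the cache.
    pub fn load(&mut self, dst: u8, slot: i32) -> LoadAction {
        match self.slot_cache.get(&slot).copied() {
            Some(reg) if reg == dst => LoadAction::Skip,
            Some(reg) => {
                // dst is overwritten by the copy; the slot stays cached in `reg`.
                self.invalidate_reg(dst);
                LoadAction::Copy { from: reg }
            }
            None => {
                self.invalidate_reg(dst);
                self.slot_cache.insert(slot, dst);
                self.reg_to_slot[dst as usize] = Some(slot);
                LoadAction::Load
            }
        }
    }
}
```

In this model, loading the same slot into the same register twice yields Load and then Skip; after clear() the next request is a Load again.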


Calling Convention

Parameter Passing

Parameter  Location
─────────  ─────────────────────────────────────────────────────────
1st–4th    r9–r12
5th+       PARAM_OVERFLOW_BASE (0x32000 + (i-4)*8) in global memory

Return value: r7 (single i64).
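
As an illustration of the parameter mapping, the sketch below resolves a parameter index to its location. It assumes that i in the overflow formula is the zero-based parameter index (so the 5th parameter lands exactly at PARAM_OVERFLOW_BASE); ParamLocation and param_location are hypothetical names.

```rust
/// Start of the global overflow area for 5th+ parameters.
pub const PARAM_OVERFLOW_BASE: u32 = 0x32000;
/// Parameters passed in registers (r9-r12).
pub const MAX_REG_PARAMS: usize = 4;
/// First parameter register (r9).
pub const FIRST_PARAM_REG: u8 = 9;

#[derive(Debug, PartialEq, Eq)]
pub enum ParamLocation {
    Register(u8),  // register number
    Overflow(u32), // absolute PVM address in global memory
}

/// Maps a zero-based parameter index to a register or overflow address.
/// Assumption: `index` corresponds to `i` in `0x32000 + (i-4)*8`.
pub fn param_location(index: usize) -> ParamLocation {
    if index < MAX_REG_PARAMS {
        ParamLocation::Register(FIRST_PARAM_REG + index as u8)
    } else {
        let overflow_slot = (index - MAX_REG_PARAMS) as u32;
        ParamLocation::Overflow(PARAM_OVERFLOW_BASE + overflow_slot * 8)
    }
}
```

Under that assumption the 6th parameter (index 5) is stored at 0x32008.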

Caller Sequence

1. Load arguments into r9–r12 (first 4)
2. Store overflow arguments to PARAM_OVERFLOW_BASE
3. LoadImm64  r0, <return_jump_table_index>
4. Jump       <callee_code_offset>
   ── callee executes ──
5. (fallthrough) Store r7 to result slot if function returns a value

Callee Prologue

1. Stack overflow check (skipped for entry function):
     LoadImm64  t1, stack_limit        ; unsigned comparison!
     AddImm64   t2, sp, -frame_size
     BranchGeU  t1, t2, continue
     Trap                              ; stack overflow → panic
2. Allocate frame:
     AddImm64   sp, sp, -frame_size
3. Save callee-saved registers:
     StoreIndU64  [sp+0],  r0
     StoreIndU64  [sp+8],  r9
     StoreIndU64  [sp+16], r10
     StoreIndU64  [sp+24], r11
     StoreIndU64  [sp+32], r12
4. Copy parameters to SSA value slots:
     - First 4 from r9–r12
     - 5th+ loaded from PARAM_OVERFLOW_BASE
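
Steps 1 and 2 of the prologue amount to an unsigned comparison followed by a subtraction. The sketch below models them on plain u64 values, using the stack limit formula 0xFEFE0000 - stack_size from the memory layout section; allocate_frame is a hypothetical name, and None stands in for the Trap.

```rust
/// Initial stack pointer (top of the stack segment).
pub const STACK_SEGMENT_END: u64 = 0xFEFE_0000;

/// Lowest address the stack pointer may reach.
pub fn stack_limit(stack_size: u64) -> u64 {
    STACK_SEGMENT_END - stack_size
}

/// Models the overflow check plus frame allocation.
/// Returns the new SP, or `None` where the prologue would execute Trap.
pub fn allocate_frame(sp: u64, frame_size: u64, stack_limit: u64) -> Option<u64> {
    // AddImm64 t2, sp, -frame_size (64-bit register arithmetic wraps)
    let new_sp = sp.wrapping_sub(frame_size);
    // Unsigned comparison: continue only if the new SP stays at or above the limit.
    if new_sp >= stack_limit {
        Some(new_sp)
    } else {
        None
    }
}
```

With a 64KB stack, a 64-byte frame allocated from the initial SP succeeds and yields 0xFEFDFFC0; the same allocation from an SP only 16 bytes above the limit fails.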

Callee Epilogue (return)

1. Load return value into r7 (if returning a value)
2. Restore callee-saved registers:
     LoadIndU64  r9,  [sp+8]
     LoadIndU64  r10, [sp+16]
     LoadIndU64  r11, [sp+24]
     LoadIndU64  r12, [sp+32]
3. Restore return address:
     LoadIndU64  r0, [sp+0]
4. Deallocate frame:
     AddImm64   sp, sp, +frame_size
5. Return:
     JumpInd    r0, 0

Jump Table & Return Addresses

PVM’s JUMP_IND instruction uses a jump table — it is not a direct address jump:

JUMP_IND rA, offset
  target_address = jumpTable[(rA + offset) / 2 - 1]

Return addresses stored in r0 are therefore encoded jump-table indices, not code offsets. Inverting the lookup formula gives the value to load into r0:

r0 = (jump_table_index + 1) * 2

The jump table is laid out as:

[ return_addr_0, return_addr_1, ...,   // for call return sites
  func_0_entry,  func_1_entry,  ... ]  // for indirect calls

Each entry is a 4-byte code offset (u32). Jump table entries for call_indirect encode function entry points used by the dispatch table.
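
The encoding and lookup are inverses of each other, which the following sketch makes explicit. The function names are hypothetical. Rejecting zero and odd addresses is an assumption of this sketch, and the special handling of EXIT_ADDRESS is deliberately left out.

```rust
/// Encodes a jump-table index as the value stored in r0.
pub fn encode_return_address(jump_table_index: u32) -> u64 {
    (jump_table_index as u64 + 1) * 2
}

/// Models `JUMP_IND rA, offset`: returns the code offset to jump to,
/// or `None` where this sketch treats the address as invalid.
pub fn resolve_jump_ind(jump_table: &[u32], reg_value: u64, offset: u64) -> Option<u32> {
    let addr = reg_value.wrapping_add(offset);
    // Assumption: zero and odd addresses never name a valid table entry.
    if addr == 0 || addr % 2 != 0 {
        return None;
    }
    let index = (addr / 2 - 1) as usize;
    jump_table.get(index).copied()
}
```

For a table [100, 200, 300], index 1 is encoded as r0 = 4, and JumpInd r0, 0 resolves back to code offset 200.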


Indirect Calls (call_indirect)

A dispatch table at RO_DATA_BASE (0x10000) maps WASM table indices to function entry points:

Dispatch table entry (8 bytes each):
  [0–3]  Jump address (u32, byte offset → jump table index)
  [4–7]  Type signature index (u32)

The indirect call sequence:

 1. Compute dispatch_addr = RO_DATA_BASE + (table_index << 3)
 2. Load type_idx from [dispatch_addr + 4]
 3. Compare type_idx with expected_type_idx
 4. Trap if mismatch (signature validation)
 5. Load jump_addr from [dispatch_addr + 0]
 6. LoadImmJumpInd  jump_addr, r0, <return_jump_table_index>, 0
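
The lookup half of this sequence (steps 1 to 5) can be sketched over a byte slice holding ro_data. Little-endian field encoding is an assumption here, as are the helper names; None models the Trap on a signature mismatch and, in this sketch only, an out-of-range table index.

```rust
/// Base PVM address of the read-only data segment.
pub const RO_DATA_BASE: u32 = 0x10000;

/// PVM address of the dispatch entry for `table_index` (8 bytes per entry).
pub fn dispatch_addr(table_index: u32) -> u32 {
    RO_DATA_BASE + (table_index << 3)
}

/// Builds dispatch-table bytes from (jump_addr, type_idx) pairs.
/// Assumption: both fields are stored little-endian.
pub fn encode_dispatch_table(entries: &[(u32, u32)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(entries.len() * 8);
    for &(jump_addr, type_idx) in entries {
        out.extend_from_slice(&jump_addr.to_le_bytes());
        out.extend_from_slice(&type_idx.to_le_bytes());
    }
    out
}

/// Validates the signature and returns the jump address, or `None` for Trap.
pub fn lookup(ro_data: &[u8], table_index: u32, expected_type_idx: u32) -> Option<u32> {
    let start = (dispatch_addr(table_index) - RO_DATA_BASE) as usize;
    let entry = ro_data.get(start..start + 8)?;
    let type_idx = u32::from_le_bytes([entry[4], entry[5], entry[6], entry[7]]);
    if type_idx != expected_type_idx {
        return None; // signature mismatch
    }
    Some(u32::from_le_bytes([entry[0], entry[1], entry[2], entry[3]]))
}
```

The returned jump address is what step 6 hands to LoadImmJumpInd together with the return jump-table index.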

Import Calls

host_call_N(ecalli_index, r7, ..., r7+N-1) -> i64 (ecalli)

A family of typed host call imports where N (0–6) indicates the number of data arguments loaded into r7–r12. The first argument must be a compile-time constant (the ecalli index). All variants return r7 as an i64.

Import       Params                         Registers set
───────────  ─────────────────────────────  ─────────────
host_call_0  (i64)                          none
host_call_1  (i64 i64)                      r7
host_call_2  (i64 i64 i64)                  r7-r8
host_call_3  (i64 i64 i64 i64)              r7-r9
host_call_4  (i64 i64 i64 i64 i64)          r7-r10
host_call_5  (i64 i64 i64 i64 i64 i64)      r7-r11
host_call_6  (i64 i64 i64 i64 i64 i64 i64)  r7-r12
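
The pattern in the table is regular: host_call_N takes N+1 i64 parameters (the ecalli index plus N data arguments) and fills r7 upward. A small sketch of that rule, with hypothetical helper names:

```rust
/// Largest supported N for host_call_N.
pub const MAX_HOST_CALL_ARGS: u8 = 6;

/// Registers that host_call_N loads with data arguments, in order.
/// Returns `None` for N outside 0..=6.
pub fn host_call_registers(n: u8) -> Option<Vec<u8>> {
    if n > MAX_HOST_CALL_ARGS {
        return None;
    }
    Some((7..7 + n).collect())
}

/// Number of i64 parameters in the WASM import signature of host_call_N:
/// one for the compile-time ecalli index plus N data arguments.
pub fn host_call_param_count(n: u8) -> Option<usize> {
    if n > MAX_HOST_CALL_ARGS {
        return None;
    }
    Some(n as usize + 1)
}
```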

Example — JIP-1 log call with 5 register args:

(import "env" "host_call_5" (func $host_call_5 (param i64 i64 i64 i64 i64 i64) (result i64)))
(import "env" "pvm_ptr" (func $pvm_ptr (param i64) (result i64)))

;; ecalli 100 = log; r7=level, r8=target_ptr, r9=target_len, r10=msg_ptr, r11=msg_len
(drop (call $host_call_5
  (i64.const 100)                                  ;; ecalli index
  (i64.const 3)                                    ;; r7: log level
  (call $pvm_ptr (i64.const 0))                    ;; r8: target PVM pointer
  (i64.const 8)                                    ;; r9: target length
  (call $pvm_ptr (i64.const 8))                    ;; r10: message PVM pointer
  (i64.const 15)))                                 ;; r11: message length

host_call_Nb — two-register output variants

Same as host_call_N but also captures r8 after the ecalli to a dedicated stack slot (R8_CAPTURE_SLOT_OFFSET relative to SP). Use the companion import host_call_r8() -> i64 (no arguments) to retrieve the captured value. The host_call_r8 call must be in the same function as the preceding host_call_Nb.

All *b variants (host_call_0b through host_call_6b) are supported.

Example:

(import "env" "host_call_2b" (func $host_call_2b (param i64 i64 i64) (result i64)))
(import "env" "host_call_r8" (func $host_call_r8 (result i64)))

;; Call ecalli 10, passing r7=100 and r8=200.
;; Store r7 return value, then retrieve r8.
(local $r7 i64)
(local $r8 i64)
(local.set $r7 (call $host_call_2b (i64.const 10) (i64.const 100) (i64.const 200)))
(local.set $r8 (call $host_call_r8))

pvm_ptr(wasm_addr) -> pvm_addr

Converts a WASM-space address to a PVM-space address by zero-extending to 64 bits and adding wasm_memory_base.
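
In arithmetic terms the conversion is a widening add. The sketch below takes the WASM address as a u32 so that the zero-extension is explicit; the Rust signature is an assumption for illustration and says nothing about how the compiler lowers the import.

```rust
/// Converts a WASM linear-memory address to a PVM address.
/// `wasm_memory_base` is the dynamically computed base of WASM memory.
pub fn pvm_ptr(wasm_addr: u32, wasm_memory_base: u64) -> u64 {
    // Zero-extend to 64 bits, then add the base.
    u64::from(wasm_addr) + wasm_memory_base
}
```

With the typical base of 0x33000, WASM address 8 maps to PVM address 0x33008.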

Other imports

The abort import emits Trap (unrecoverable error). All other unresolved imports cause a compilation error — they must be resolved via --imports or --adapter before compilation succeeds.


Memory Layout

PVM Address Space:
  0x00000 - 0x0FFFF   Reserved / guard (fault on access)
  0x10000 - 0x1FFFF   Read-only data (RO_DATA_BASE) — dispatch tables
  0x20000 - 0x2FFFF   Gap zone (unmapped, guard between RO and RW)
  0x30000 - 0x31FFF   Globals window (GLOBAL_MEMORY_BASE, 8KB cap; actual bytes used = globals_region_size(...))
  0x32000 - 0x320FF   Parameter overflow area (5th+ function arguments)
  0x32100+            Spilled locals (per-function metadata, typically unused)
  0x33000+             WASM linear memory (4KB-aligned, computed dynamically via `compute_wasm_memory_base`)
  ...                  (unmapped gap until stack)
  0xFEFE0000           STACK_SEGMENT_END (initial SP)
  0xFEFF0000           Arguments segment (input data, read-only)
  0xFFFF0000           EXIT_ADDRESS (jump here → HALT)

Key formulas (see memory_layout.rs):

  • Global address: 0x30000 + global_index * 4
  • Memory size global: 0x30000 + num_globals * 4
  • Spilled local: 0x32100 + func_idx * SPILLED_LOCALS_PER_FUNC + local_offset
  • WASM memory base: align_up(max(SPILLED_LOCALS_BASE + num_funcs * SPILLED_LOCALS_PER_FUNC, GLOBAL_MEMORY_BASE + globals_region_size(num_globals, num_passive_segments)), 4KB). The heap starts at the first 4KB boundary (the PVM page size) at or above the end of both the spilled-locals region and the globals/passive-length region. This is typically 0x33000 for programs with few globals.
  • Stack limit: 0xFEFE0000 - stack_size
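
The address formulas above can be sketched as follows. The values of SPILLED_LOCALS_PER_FUNC and globals_region_size(...) are not given on this page, so the sketch takes the two region end addresses as inputs instead of computing them; SPILLED_LOCALS_BASE = 0x32100 is read off the spilled-local formula, and PAGE_SIZE and the function names are illustrative.

```rust
pub const GLOBAL_MEMORY_BASE: u32 = 0x30000;
pub const SPILLED_LOCALS_BASE: u32 = 0x32100;
/// PVM page size (4KB); constant name assumed for this sketch.
pub const PAGE_SIZE: u32 = 0x1000;

/// Address of WASM global number `global_index` (4 bytes per global).
pub fn global_addr(global_index: u32) -> u32 {
    GLOBAL_MEMORY_BASE + global_index * 4
}

/// Address of the memory-size global, stored right after the real globals.
pub fn memory_size_global_addr(num_globals: u32) -> u32 {
    GLOBAL_MEMORY_BASE + num_globals * 4
}

/// Rounds `value` up to the next multiple of `align`.
pub fn align_up(value: u32, align: u32) -> u32 {
    (value + align - 1) / align * align
}

/// Base of WASM linear memory, given where the spilled-locals region and
/// the globals/passive-length region end (both computed by the caller).
pub fn wasm_memory_base(spilled_locals_end: u32, globals_end: u32) -> u32 {
    align_up(spilled_locals_end.max(globals_end), PAGE_SIZE)
}
```

For a program whose spilled-locals region ends at 0x32100 and whose globals end at 0x30010, this yields the typical base of 0x33000.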

RW data layout

SPI rw_data is defined as a contiguous dump of every byte from GLOBAL_MEMORY_BASE up to the last initialized byte of the WASM heap; the loader memcpys this region at 0x30000, so there is no sparse encoding or per-segment offsets inside the blob. That is why the zero stretch between the globals window and the first non-zero heap byte is encoded verbatim instead of being skipped.

build_rw_data() trims trailing zero bytes before SPI encoding. Heap pages are zero-initialized, so omitted trailing zeros are semantically equivalent.
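
The trimming step is simple enough to show in full. This is a sketch of the idea only; trim_trailing_zeros is a hypothetical helper, not the body of build_rw_data().

```rust
/// Returns `rw_data` without its trailing zero bytes.
/// Interior zeros are kept, because the blob is a contiguous memory dump.
pub fn trim_trailing_zeros(rw_data: &[u8]) -> &[u8] {
    let len = rw_data
        .iter()
        .rposition(|&byte| byte != 0) // index of the last non-zero byte
        .map_or(0, |index| index + 1);
    &rw_data[..len]
}
```

An all-zero region trims to an empty slice, which is safe for the reason given above: the pages it would have covered are zero-initialized anyway.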


Entry Function (SPI Convention)

The entry function is special — it follows SPI conventions rather than the normal calling convention.

Initial register state (set by the PVM runtime):

Register        Value        Purpose
──────────────  ───────────  ─────────────────────────────────
r0              0xFFFF0000   EXIT address; jump here to HALT
r1              0xFEFE0000   Stack pointer (STACK_SEGMENT_END)
r7              0xFEFF0000   Arguments pointer (PVM address)
r8              args.length  Arguments length in bytes
r2–r6, r9–r12   0            Available

Entry prologue differences from a normal function:

  1. No stack overflow check (main function starts with full stack)
  2. Allocates frame and stores SSA slots
  3. No callee-saved register saves (no caller to return to)
  4. Adjusts args_ptr: r7 = r7 - wasm_memory_base (convert PVM address to WASM address)
  5. Stores r7 and r8 to parameter slots

Entry return — unified packed i64 convention:

The entry function must return a single i64 value encoding a pointer and length:

  • Lower 32 bits = WASM pointer to result data
  • Upper 32 bits = result length in bytes
  • PVM output: r7 = (ret & 0xFFFFFFFF) + wasm_memory_base, r8 = r7 + (ret >> 32)

All entry functions end by jumping to EXIT_ADDRESS (0xFFFF0000).
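
The packed return convention can be sketched as a pair of helpers. The names are hypothetical, and the sketch assumes the shift in ret >> 32 is a logical (unsigned) shift, which is why the value is reinterpreted as u64 first.

```rust
/// Packs a WASM pointer and a byte length into the entry function's i64 result.
pub fn pack_result(wasm_ptr: u32, len: u32) -> i64 {
    ((u64::from(len) << 32) | u64::from(wasm_ptr)) as i64
}

/// Computes the (r7, r8) pair produced from the packed result:
/// r7 is the PVM start address, r8 the PVM end address of the result data.
pub fn unpack_result(ret: i64, wasm_memory_base: u64) -> (u64, u64) {
    let ret = ret as u64; // reinterpret so the shift below is unsigned
    let r7 = (ret & 0xFFFF_FFFF) + wasm_memory_base;
    let r8 = r7 + (ret >> 32);
    (r7, r8)
}
```

Note that r8 ends up holding an end address rather than the raw length: a 16-byte result at WASM address 0x100 with base 0x33000 gives r7 = 0x33100 and r8 = 0x33110.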

Start Function

If a WASM start function exists, the entry function calls it before processing arguments. r7/r8 are saved to the stack, the start function is called (no arguments), then r7/r8 are restored.


SPI/JAM Program Format

The compiled output is a JAM file in the SPI (Standard Program Interface) format:

Offset  Size    Field
──────  ──────  ─────────────────────
0       3       ro_data_len (u24 LE)
3       3       rw_data_len (u24 LE)
6       2       heap_pages  (u16 LE)
8       3       stack_size  (u24 LE)
11      N       ro_data     (dispatch table)
11+N    M       rw_data     (globals + WASM memory initial data)
11+N+M  4       code_len    (u32 LE)
15+N+M  K       code        (PVM program blob)

heap_pages is computed from the WASM module’s initial_pages (not max_pages). It represents the number of 4KB PVM pages pre-allocated as zero-initialized writable memory at program start. Additional memory beyond this is allocated on demand via sbrk/memory.grow. Programs declaring (memory 0) get a minimum of 16 WASM pages (1MB) to accommodate AssemblyScript runtime memory accesses.
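
To make the field layout concrete, here is a minimal encoder that follows the offset table above. It is a sketch written from that table, not the encoder in spi.rs; encode_spi and push_u24_le are hypothetical names.

```rust
/// Appends the low 24 bits of `value` in little-endian order.
fn push_u24_le(out: &mut Vec<u8>, value: u32) {
    assert!(value < (1 << 24), "value does not fit in 24 bits");
    out.extend_from_slice(&value.to_le_bytes()[..3]);
}

/// Serialises the SPI container: 11-byte header, ro_data, rw_data,
/// a u32 code length, then the PVM code blob.
pub fn encode_spi(
    ro_data: &[u8],
    rw_data: &[u8],
    heap_pages: u16,
    stack_size: u32,
    code: &[u8],
) -> Vec<u8> {
    let mut out = Vec::with_capacity(15 + ro_data.len() + rw_data.len() + code.len());
    push_u24_le(&mut out, ro_data.len() as u32); // offset 0
    push_u24_le(&mut out, rw_data.len() as u32); // offset 3
    out.extend_from_slice(&heap_pages.to_le_bytes()); // offset 6
    push_u24_le(&mut out, stack_size); // offset 8
    out.extend_from_slice(ro_data); // offset 11
    out.extend_from_slice(rw_data); // offset 11+N
    out.extend_from_slice(&(code.len() as u32).to_le_bytes()); // offset 11+N+M
    out.extend_from_slice(code); // offset 15+N+M
    out
}
```

The total size is always 15 + N + M + K bytes, which makes a convenient sanity check when inspecting a JAM file by hand.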

PVM Code Blob

Inside the code section, the PVM blob format is:

- jump_table_len  (varint u32)
- item_len        (u8, always 4)
- code_len        (varint u32)
- jump_table      (4 bytes per entry, code offsets)
- instructions    (PVM bytecode)
- mask            (bit-packed instruction start markers)

Entry Header

The first 10 bytes of code are the entry header:

[0–4]   Jump  <main_function_offset>        (5 bytes)
[5–9]   Jump  <secondary_entry_offset>      (5 bytes, or Trap + padding)

The secondary entry is for future use (e.g. is_authorized). If unused, it emits Trap followed by 4 Fallthrough instructions as padding.


Phi Node Handling

Phi nodes (SSA merge points) use a two-pass approach to avoid clobbering:

  1. Load pass: Load all incoming phi values into temp registers (t0, t1, t2, s0, s1)
  2. Store pass: Store all temps to their destination phi result slots

This supports up to 5 simultaneous phi values. The two-pass design prevents cycles where storing one phi value would overwrite a source needed by another phi.
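
The effect of the two passes is easiest to see on a swap, where two phis read each other's slots. The sketch below simulates stack slots and registers as plain arrays; copy_phis and PHI_TEMPS are hypothetical names, and slots are indexed by position rather than by byte offset.

```rust
/// Temp registers available for phi copies: t0, t1, t2, s0, s1.
pub const PHI_TEMPS: [u8; 5] = [2, 3, 4, 5, 6];

/// Performs phi copies given as (source_slot, destination_slot) pairs.
/// All sources are read before any destination is written.
pub fn copy_phis(slots: &mut [u64], copies: &[(usize, usize)]) -> Result<(), String> {
    if copies.len() > PHI_TEMPS.len() {
        return Err(format!(
            "{} phi values exceed the {} available temps",
            copies.len(),
            PHI_TEMPS.len()
        ));
    }
    let mut regs = [0u64; 13];
    // Load pass: every incoming value goes into its own temp register.
    for (&(src, _), &temp) in copies.iter().zip(PHI_TEMPS.iter()) {
        regs[temp as usize] = slots[src];
    }
    // Store pass: temps are written to the phi result slots.
    for (&(_, dst), &temp) in copies.iter().zip(PHI_TEMPS.iter()) {
        slots[dst] = regs[temp as usize];
    }
    Ok(())
}
```

Copying slot 0 to slot 1 and slot 1 to slot 0 in one call swaps the two values, whereas a naive load-store-load-store sequence would duplicate one of them.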


Design Trade-offs

Decision                                Rationale
──────────────────────────────────────  ─────────────────────────────────────────────────────
Stack-slot for every SSA value          Correctness-first baseline; linear-scan register allocator (for loop-containing functions) assigns high-use values to available callee-saved regs (r9-r12 when not used for this function’s incoming parameters), and per-block register cache eliminates most remaining redundant loads
Spill area below SP                     Frame grows up from SP, spill area grows down, so the two never overlap
Global PARAM_OVERFLOW_BASE              Avoids stack frame complexity for overflow params
Jump-table indices as return addresses  Required by PVM’s JUMP_IND semantics
Entry function has no stack check       Starts with full stack, nothing to overflow into
Unsigned stack limit comparison         LoadImm64 avoids sign-extension bugs with large addresses
unsafe forbidden                        Workspace-level deny(unsafe_code) lint

References

  • crates/wasm-pvm/src/abi.rs — Register and frame constants
  • crates/wasm-pvm/src/memory_layout.rs — Memory address constants
  • crates/wasm-pvm/src/llvm_backend/emitter.rs — PvmEmitter and value management
  • crates/wasm-pvm/src/llvm_backend/calls.rs — Calling convention implementation
  • crates/wasm-pvm/src/llvm_backend/control_flow.rs — Prologue/epilogue/return
  • crates/wasm-pvm/src/spi.rs — JAM/SPI format encoder
  • Technical Reference — Technical reference and debugging journal
  • Gray Paper — JAM/PVM specification