ABI & Calling Conventions

This page documents the register assignments, calling convention, stack frame layout, memory layout, and the SPI/JAM program format used by the WASM-to-PVM recompiler.

The canonical source for constants lives in crates/wasm-pvm/src/abi.rs and crates/wasm-pvm/src/memory_layout.rs.


Register Assignments

PVM provides 13 general-purpose 64-bit registers (r0–r12). The compiler assigns them as follows:

Register  Alias  Purpose                             Saved by
────────  ─────  ──────────────────────────────────  ────────
r0        ra     Return address (jump table index)   Callee
r1        sp     Stack pointer (grows downward)      Callee
r2        t0     Temp: load operand 1 / immediates   Caller
r3        t1     Temp: load operand 2                Caller
r4        t2     Temp: ALU result                    Caller
r5        s0     Scratch                             Caller
r6        s1     Scratch                             Caller
r7        a0     Return value / SPI args_ptr         Caller
r8        a1     SPI args_len / second result        Caller
r9        l0     Local 0 / param 0                   Callee
r10       l1     Local 1 / param 1                   Callee
r11       l2     Local 2 / param 2                   Callee
r12       l3     Local 3 / param 3                   Callee

Callee-saved (r0, r1, r9–r12): the callee must preserve these across calls. Caller-saved (r2–r8): the caller must assume these are clobbered by any call.
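
The split between callee-saved and caller-saved registers can be written as a pair of predicates. This is a minimal sketch: the constant and function names are illustrative and are not taken from abi.rs, which remains the canonical source.

```rust
// Register numbers by alias, mirroring the table above.
// Names are hypothetical; see abi.rs for the real constants.
pub const RA: u8 = 0; // return address (jump table index)
pub const SP: u8 = 1; // stack pointer
pub const T0: u8 = 2; // temp: load operand 1 / immediates
pub const T1: u8 = 3; // temp: load operand 2
pub const T2: u8 = 4; // temp: ALU result
pub const A0: u8 = 7; // return value / SPI args_ptr
pub const A1: u8 = 8; // SPI args_len / second result
pub const L0: u8 = 9; // first local / param register

/// True when the callee must preserve `reg` across a call (r0, r1, r9-r12).
pub fn is_callee_saved(reg: u8) -> bool {
    matches!(reg, 0 | 1 | 9..=12)
}

/// True when the caller must assume `reg` is clobbered by any call (r2-r8).
pub fn is_caller_saved(reg: u8) -> bool {
    matches!(reg, 2..=8)
}
```

Every valid register (0 through 12) satisfies exactly one of the two predicates; values above 12 satisfy neither.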


Stack Frame Layout

Every function allocates a stack frame. The stack grows downward (SP decreases).

                Higher addresses
          ┌─────────────────────────┐
          │   caller's frame ...    │
old SP →  ├─────────────────────────┤
          │  ...                    │  8 bytes per SSA value
          │  SSA value slot 1  +48  │  8 bytes
          │  SSA value slot 0  +40  │  8 bytes
          ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤  FRAME_HEADER_SIZE = 40
          │  Saved r12 (l3)   +32   │  8 bytes
          │  Saved r11 (l2)   +24   │  8 bytes
          │  Saved r10 (l1)   +16   │  8 bytes
          │  Saved r9  (l0)    +8   │  8 bytes
          │  Saved r0  (ra)    +0   │  8 bytes
new SP →  ├─────────────────────────┤
          │  (operand spill area)   │  SP - 0x100 .. SP
          └─────────────────────────┘
                Lower addresses

Frame size = FRAME_HEADER_SIZE (40) + num_ssa_values * 8

The operand spill area at SP + OPERAND_SPILL_BASE (i.e. SP - 0x100) is used for temporary storage during phi-node copies and indirect calls. The frame grows upward from SP (toward higher addresses), while the spill area is below SP, so the two regions never overlap regardless of frame size. However, a callee’s frame allocation must not reach into the caller’s spill area — this is protected by the stack overflow check which ensures SP - frame_size >= stack_limit.
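
The frame-size formula and the slot offsets follow directly from the layout. The sketch below restates them in Rust; SLOT_SIZE and the function names are hypothetical, while FRAME_HEADER_SIZE and OPERAND_SPILL_BASE use the values given in this section.

```rust
/// Bytes reserved for the saved ra and r9-r12 at the start of each frame.
pub const FRAME_HEADER_SIZE: i32 = 40;
/// Every SSA value gets one 8-byte slot (name is illustrative).
pub const SLOT_SIZE: i32 = 8;
/// Spill area offset relative to SP (negative: it sits below SP).
pub const OPERAND_SPILL_BASE: i32 = -0x100;

/// Total frame size for a function with `num_ssa_values` SSA values.
pub fn frame_size(num_ssa_values: u32) -> i32 {
    FRAME_HEADER_SIZE + num_ssa_values as i32 * SLOT_SIZE
}

/// SP-relative offset of the slot for SSA value `index`.
pub fn ssa_slot_offset(index: u32) -> i32 {
    FRAME_HEADER_SIZE + index as i32 * SLOT_SIZE
}
```

All slot offsets are non-negative and the spill base is negative, which is why the two regions cannot overlap within one frame.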

Stack-Slot Approach with Register Allocation

Every LLVM SSA value gets a dedicated 8-byte stack slot. The baseline instruction sequence is:

  1. Load operands from stack slots into temp registers (t0, t1)
  2. Execute ALU operation, result in t2
  3. Store t2 back to the result’s stack slot

A linear-scan register allocator (regalloc.rs) improves on this baseline, but only for functions that contain loop back-edges; loop-free functions skip allocation entirely. Candidate intervals come from use-def live-interval analysis and are filtered by a minimum-use threshold (MIN_USES_FOR_ALLOCATION, currently 3); a value does not have to span a loop to be eligible. Eligible values are assigned to the callee-saved registers r9-r12 that are not holding this function’s incoming parameters. In non-leaf functions, any of these registers needed for outgoing call arguments are also excluded from allocation. Clobber handling and reloads at call sites are performed by the emitter after each call, not by invalidation logic inside regalloc itself. Combined with the register cache, this eliminates most redundant memory traffic.

Per-Block Register Cache (Store-Load Forwarding)

PvmEmitter maintains a per-basic-block register cache (slot_cache: HashMap<i32, u8>, reg_to_slot: [Option<i32>; 13]) that tracks which stack slot values are currently live in registers. This eliminates redundant LoadIndU64 instructions:

  • Cache hit, same register: Skip entirely (0 instructions emitted)
  • Cache hit, different register: Emit AddImm64 dst, cached_reg, 0 (register copy)
  • Cache miss: Emit normal LoadIndU64, then record in cache

The cache is invalidated:

  • When a register is overwritten (auto-detected via Instruction::dest_reg())
  • At block boundaries (define_label() clears the entire cache)
  • After function calls (clear_reg_cache() after Fallthrough return points)
  • After ecalli host calls (clear_reg_cache() after Ecalli)

Impact: ~50% gas reduction, ~15-40% code size reduction across benchmarks.
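
The three lookup outcomes and the invalidation rule can be modelled in a few lines. This is a sketch of the idea only, not the PvmEmitter code: the RegCache type, the LoadAction enum, and the choice to leave the cache entry on the original register after a copy are all assumptions. Only the two field names and their types come from the description above.

```rust
use std::collections::HashMap;

/// What the emitter would do for "load stack slot into dst".
#[derive(Debug, PartialEq)]
pub enum LoadAction {
    Skip,              // cache hit, same register: emit nothing
    Copy { from: u8 }, // cache hit, other register: AddImm64 dst, from, 0
    Load,              // cache miss: LoadIndU64
}

#[derive(Default)]
pub struct RegCache {
    slot_cache: HashMap<i32, u8>,   // stack slot offset -> register
    reg_to_slot: [Option<i32>; 13], // register -> stack slot offset
}

impl RegCache {
    /// Decide how to materialise `slot` in `dst` and update the cache.
    pub fn load(&mut self, slot: i32, dst: u8) -> LoadAction {
        match self.slot_cache.get(&slot).copied() {
            Some(reg) if reg == dst => LoadAction::Skip,
            Some(reg) => {
                // dst is overwritten; the slot stays cached in `reg`.
                self.invalidate_reg(dst);
                LoadAction::Copy { from: reg }
            }
            None => {
                self.invalidate_reg(dst);
                self.slot_cache.insert(slot, dst);
                self.reg_to_slot[dst as usize] = Some(slot);
                LoadAction::Load
            }
        }
    }

    /// Forget whatever slot `reg` was holding (the register was overwritten).
    pub fn invalidate_reg(&mut self, reg: u8) {
        if let Some(slot) = self.reg_to_slot[reg as usize].take() {
            self.slot_cache.remove(&slot);
        }
    }

    /// Drop everything, as at block boundaries and after calls.
    pub fn clear(&mut self) {
        self.slot_cache.clear();
        self.reg_to_slot = [None; 13];
    }
}
```

Keeping both maps in sync is what makes invalidation by register cheap: overwriting a register needs one array lookup and at most one map removal.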


Calling Convention

Parameter Passing

Parameter  Location
─────────  ────────────────────────────────────────────────────────
1st–4th    r9–r12
5th+       param_overflow_base + (i-4)*8 in global memory (dynamic)

The param-overflow base is computed per-module by compute_param_overflow_base (see memory_layout.rs). It sits right after the globals/passive-length region, 8-byte aligned. The complementary helper compute_wasm_memory_base returns the start of WASM linear memory, which lands immediately after the overflow reservation when one is present. The 256-byte reservation is only emitted when any module type signature (local function or call_indirect target) has more than MAX_LOCAL_REGS params, tracked via WasmModule::needs_param_overflow. For a typical AS program the base lands between 0x30010 and 0x30020; the old fixed 0x32000 location is gone.
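
The parameter mapping can be sketched as a single function. Two points are assumptions here: the index i in the table is read as zero-based (so the 5th parameter lands at the overflow base itself), and MAX_LOCAL_REGS is taken to be 4 because four registers carry parameters. ParamLocation and FIRST_LOCAL_REG are illustrative names.

```rust
/// Number of parameters passed in registers (assumed to be 4: r9-r12).
pub const MAX_LOCAL_REGS: usize = 4;
/// First parameter register, r9 (hypothetical constant name).
pub const FIRST_LOCAL_REG: u8 = 9;

#[derive(Debug, PartialEq)]
pub enum ParamLocation {
    Register(u8),  // r9..r12
    Overflow(u32), // absolute PVM address in the overflow area
}

/// Where the caller places parameter `index` (zero-based, assumed).
pub fn param_location(index: usize, param_overflow_base: u32) -> ParamLocation {
    if index < MAX_LOCAL_REGS {
        ParamLocation::Register(FIRST_LOCAL_REG + index as u8)
    } else {
        let slot = (index - MAX_LOCAL_REGS) as u32;
        ParamLocation::Overflow(param_overflow_base + slot * 8)
    }
}
```

With a 256-byte reservation and 8 bytes per slot, this scheme has room for 32 overflow parameters.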

Return value: r7 (single i64).

Caller Sequence

1. Load arguments into r9–r12 (first 4)
2. Store overflow arguments to PARAM_OVERFLOW_BASE
3. LoadImm64  r0, <return_jump_table_index>
4. Jump       <callee_code_offset>
   ── callee executes ──
5. (fallthrough) Store r7 to result slot if function returns a value

Callee Prologue

1. Stack overflow check (skipped for entry function):
     LoadImm64  t1, stack_limit        ; unsigned comparison!
     AddImm64   t2, sp, -frame_size
     BranchGeU  t1, t2, continue
     Trap                              ; stack overflow → panic
2. Allocate frame:
     AddImm64   sp, sp, -frame_size
3. Save callee-saved registers:
     StoreIndU64  [sp+0],  r0
     StoreIndU64  [sp+8],  r9
     StoreIndU64  [sp+16], r10
     StoreIndU64  [sp+24], r11
     StoreIndU64  [sp+32], r12
4. Copy parameters to SSA value slots:
     - First 4 from r9–r12
     - 5th+ loaded from PARAM_OVERFLOW_BASE
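
The condition tested in step 1 is SP - frame_size >= stack_limit, evaluated on unsigned values. A sketch of that predicate follows; the function names are hypothetical, and treating an underflowing subtraction as an overflow is a choice made for this example rather than something the prologue encodes.

```rust
/// Initial stack pointer (top of the stack segment).
pub const STACK_SEGMENT_END: u64 = 0xFEFE_0000;

/// Lowest address the stack may reach for a given configured stack size.
pub fn stack_limit(stack_size: u64) -> u64 {
    STACK_SEGMENT_END - stack_size
}

/// True when a frame of `frame_size` bytes can be allocated below `sp`.
pub fn frame_fits(sp: u64, frame_size: u64, stack_limit: u64) -> bool {
    match sp.checked_sub(frame_size) {
        Some(new_sp) => new_sp >= stack_limit, // unsigned comparison
        None => false,                         // would wrap below zero
    }
}
```

The comparison has to be unsigned because stack addresses such as 0xFEFE0000 have the top bit of a 32-bit word set and would read as negative if interpreted as signed 32-bit values.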

Callee Epilogue (return)

1. Load return value into r7 (if returning a value)
2. Restore callee-saved registers:
     LoadIndU64  r9,  [sp+8]
     LoadIndU64  r10, [sp+16]
     LoadIndU64  r11, [sp+24]
     LoadIndU64  r12, [sp+32]
3. Restore return address:
     LoadIndU64  r0, [sp+0]
4. Deallocate frame:
     AddImm64   sp, sp, +frame_size
5. Return:
     JumpInd    r0, 0

Jump Table & Return Addresses

PVM’s JUMP_IND instruction uses a jump table — it is not a direct address jump:

JUMP_IND rA, offset
  target_address = jumpTable[(rA + offset) / 2 - 1]

Return addresses stored in r0 are therefore jump-table indices, not code offsets:

r0 = (jump_table_index + 1) * 2

The jump table is laid out as:

[ return_addr_0, return_addr_1, ...,   // for call return sites
  func_0_entry,  func_1_entry,  ... ]  // for indirect calls

Each entry is a 4-byte code offset (u32). Jump table entries for call_indirect encode function entry points used by the dispatch table.
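
The two formulas above are inverses of each other, which the following sketch makes explicit. The function names are illustrative, and returning None for zero, odd, or out-of-range values is an assumption of this example; the real PVM defines its own behaviour for invalid targets and for the special EXIT address.

```rust
/// Value stored in r0 for a call whose return site is at `jump_table_index`.
pub fn encode_return_address(jump_table_index: u32) -> u64 {
    (u64::from(jump_table_index) + 1) * 2
}

/// Resolve JUMP_IND rA, offset against a jump table of code offsets.
pub fn resolve_jump_ind(jump_table: &[u32], reg_value: u64, offset: u64) -> Option<u32> {
    let addr = reg_value.wrapping_add(offset);
    // Valid encoded addresses are non-zero and even (assumed here).
    if addr == 0 || addr % 2 != 0 {
        return None;
    }
    let index = (addr / 2 - 1) as usize;
    jump_table.get(index).copied()
}
```

For example, jump table index 0 is encoded as 2 and index 2 as 6, so a valid return address is never zero.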


Indirect Calls (call_indirect)

A dispatch table at RO_DATA_BASE (0x10000) maps WASM table indices to function entry points:

Dispatch table entry (8 bytes each):
  [0–3]  Jump address (u32, byte offset → jump table index)
  [4–7]  Type signature index (u32)

The indirect call sequence:

 1. Compute dispatch_addr = RO_DATA_BASE + (table_index << 3)
 2. Load type_idx from [dispatch_addr + 4]
 3. Compare type_idx with expected_type_idx
 4. Trap if mismatch (signature validation)
 5. Load jump_addr from [dispatch_addr + 0]
 6. LoadImmJumpInd  jump_addr, r0, <return_jump_table_index>, 0
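
Steps 1 to 5 amount to a bounds-free table lookup followed by a signature check. The sketch below models them over an in-memory copy of the dispatch table. Little-endian field encoding, the CallError type, and the explicit out-of-bounds result are assumptions made for this example; the generated code traps on a mismatch instead of returning an error.

```rust
pub const RO_DATA_BASE: u32 = 0x10000;

#[derive(Debug, PartialEq)]
pub enum CallError {
    OutOfBounds,       // table index past the end of the table (assumed handling)
    SignatureMismatch, // corresponds to the Trap in step 4
}

/// PVM address of the dispatch entry for `table_index` (8 bytes per entry).
pub fn dispatch_addr(table_index: u32) -> u32 {
    RO_DATA_BASE + (table_index << 3)
}

/// Build one 8-byte entry: jump address, then type signature index.
pub fn encode_entry(jump_addr: u32, type_idx: u32) -> [u8; 8] {
    let mut entry = [0u8; 8];
    entry[..4].copy_from_slice(&jump_addr.to_le_bytes());
    entry[4..].copy_from_slice(&type_idx.to_le_bytes());
    entry
}

/// Look up `table_index`, validate its signature, and return the jump address.
pub fn lookup(ro_data: &[u8], table_index: u32, expected_type_idx: u32) -> Result<u32, CallError> {
    let start = table_index as usize * 8;
    let entry = ro_data.get(start..start + 8).ok_or(CallError::OutOfBounds)?;
    let jump_addr = u32::from_le_bytes([entry[0], entry[1], entry[2], entry[3]]);
    let type_idx = u32::from_le_bytes([entry[4], entry[5], entry[6], entry[7]]);
    if type_idx != expected_type_idx {
        return Err(CallError::SignatureMismatch);
    }
    Ok(jump_addr)
}
```

Checking the type index before loading the jump address mirrors the order of the emitted sequence, so a mismatched call never reaches the jump.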

Import Calls

host_call_N(ecalli_index, r7, ..., r7+N-1) -> i64 (ecalli)

A family of typed host call imports where N (0–6) indicates the number of data arguments loaded into r7–r12. The first argument must be a compile-time constant (the ecalli index). All variants return r7 as an i64.

Import       Params                         Registers set
───────────  ─────────────────────────────  ─────────────
host_call_0  (i64)                          none
host_call_1  (i64 i64)                      r7
host_call_2  (i64 i64 i64)                  r7-r8
host_call_3  (i64 i64 i64 i64)              r7-r9
host_call_4  (i64 i64 i64 i64 i64)          r7-r10
host_call_5  (i64 i64 i64 i64 i64 i64)      r7-r11
host_call_6  (i64 i64 i64 i64 i64 i64 i64)  r7-r12

Example — JIP-1 log call with 5 register args:

(import "env" "host_call_5" (func $host_call_5 (param i64 i64 i64 i64 i64 i64) (result i64)))
(import "env" "pvm_ptr" (func $pvm_ptr (param i64) (result i64)))

;; ecalli 100 = log; r7=level, r8=target_ptr, r9=target_len, r10=msg_ptr, r11=msg_len
(drop (call $host_call_5
  (i64.const 100)                                  ;; ecalli index
  (i64.const 3)                                    ;; r7: log level
  (call $pvm_ptr (i64.const 0))                    ;; r8: target PVM pointer
  (i64.const 8)                                    ;; r9: target length
  (call $pvm_ptr (i64.const 8))                    ;; r10: message PVM pointer
  (i64.const 15)))                                 ;; r11: message length

host_call_Nb — two-register output variants

Same as host_call_N but also captures r8 after the ecalli to a dedicated stack slot (R8_CAPTURE_SLOT_OFFSET relative to SP). Use the companion import host_call_r8() -> i64 (no arguments) to retrieve the captured value. The host_call_r8 call must be in the same function as the preceding host_call_Nb.

All *b variants (host_call_0b through host_call_6b) are supported.

Example:

(import "env" "host_call_2b" (func $host_call_2b (param i64 i64 i64) (result i64)))
(import "env" "host_call_r8" (func $host_call_r8 (result i64)))

;; Call ecalli 10, passing r7=100 and r8=200.
;; Store r7 return value, then retrieve r8.
(local $r7 i64)
(local $r8 i64)
(local.set $r7 (call $host_call_2b (i64.const 10) (i64.const 100) (i64.const 200)))
(local.set $r8 (call $host_call_r8))

pvm_ptr(wasm_addr) -> pvm_addr

Converts a WASM-space address to a PVM-space address by zero-extending to 64 bits and adding wasm_memory_base.
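
In Rust terms the conversion is a widening followed by an addition. The sketch below also includes the inverse used when a PVM address has to be turned back into a WASM address; wasm_ptr is a hypothetical helper name, and the base is passed as a parameter because it is computed per module.

```rust
/// WASM address -> PVM address: zero-extend, then add the memory base.
pub fn pvm_ptr(wasm_addr: u32, wasm_memory_base: u64) -> u64 {
    u64::from(wasm_addr) + wasm_memory_base
}

/// PVM address -> WASM address (hypothetical inverse helper).
/// Returns None when the address lies below the WASM memory base.
pub fn wasm_ptr(pvm_addr: u64, wasm_memory_base: u64) -> Option<u64> {
    pvm_addr.checked_sub(wasm_memory_base)
}
```

Zero-extension matters for addresses with the top bit set: sign-extending 0xFFFFFFFF would produce a 64-bit value far outside the address space.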

Other imports

The abort import emits Trap (unrecoverable error). All other unresolved imports cause a compilation error — they must be resolved via --imports or --adapter before compilation succeeds.


Memory Layout

PVM Address Space:
  0x00000 - 0x0FFFF   Reserved / guard (fault on access)
  0x10000 - 0x1FFFF   Read-only data (RO_DATA_BASE) — dispatch tables
  0x20000 - 0x2FFFF   Gap zone (unmapped, guard between RO and RW)
  0x30000             Mem-size slot (4 bytes, only when memory.size/grow/init used)
  0x30000 / 0x30004+  User globals (per-global width: 4 B for i32/f32, 8 B for i64/f64,
                      packed in declaration order; offset by 4 when mem-size slot present)
  after globals       Passive data segment length slots (4 bytes each)
  after lengths       Parameter overflow area (256 bytes, 8-byte aligned; present only when
                      any module type signature has more than MAX_LOCAL_REGS params, covering
                      both local functions and call_indirect targets)
  region_end          WASM linear memory (sits immediately after last region — no 4KB alignment)
  ...                 (unmapped gap until stack)
  0xFEFE0000          STACK_SEGMENT_END (initial SP)
  0xFEFF0000          Arguments segment (input data, read-only)
  0xFFFF0000          EXIT_ADDRESS (jump here → HALT)

Key formulas (see memory_layout.rs):

  • Memory-size slot: 0x30000 — stable position, independent of num_globals. Emitted only when the module uses memory.size/memory.grow/memory.init.
  • Global address: precomputed at parse time as WasmModule::global_offsets[idx]. Each user global occupies global_storage_width(type) bytes — 4 B for i32/f32, 8 B for i64/f64 — packed in declaration order with no inter-global padding. (global i64 ...) round-trips through LoadU64/StoreU64 without truncation; (global i32 ...) keeps its 4-byte slot and uses LoadU32/StoreU32. The LLVM frontend declares each global with its matching int type (i32/i64) and zext/truncs at global.get/global.set so the i64 WASM stack representation stays uniform.
  • Passive segment length slot: 0x30000 + (has_mem_size ? 4 : 0) + sum(global_widths) + ordinal * 4 (lengths remain 4 bytes — they’re effective sizes, never i64).
  • WASM memory base: compute_wasm_memory_base(num_globals, num_passive_segments, has_mem_size_global, needs_param_overflow). Sits immediately after the last present region with no 4KB alignment — anan-as page-aligns the rw_data tail (heapZerosStart = heapStart + alignToPageSize(rwLength)) separately, so the base can land at any byte offset. When every region is empty (no globals, no mem-size, no passive, no overflow), the base collapses to GLOBAL_MEMORY_BASE itself.
  • Stack limit: 0xFEFE0000 - stack_size
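
The packing rule for globals and the passive-length slot formula can be sketched as follows. The ValType enum and the function names other than global_storage_width are illustrative, and the addresses here are absolute PVM addresses; whether WasmModule::global_offsets stores absolute or relative values is not specified above.

```rust
pub const GLOBAL_MEMORY_BASE: u32 = 0x30000;

#[derive(Clone, Copy)]
pub enum ValType { I32, I64, F32, F64 }

/// Storage width in bytes for one global of the given type.
pub fn global_storage_width(ty: ValType) -> u32 {
    match ty {
        ValType::I32 | ValType::F32 => 4,
        ValType::I64 | ValType::F64 => 8,
    }
}

/// Size of the optional mem-size slot that precedes the globals.
fn mem_size_bytes(has_mem_size: bool) -> u32 {
    if has_mem_size { 4 } else { 0 }
}

/// Address of every global, packed in declaration order with no padding.
pub fn global_addresses(types: &[ValType], has_mem_size: bool) -> Vec<u32> {
    let mut next = GLOBAL_MEMORY_BASE + mem_size_bytes(has_mem_size);
    types
        .iter()
        .map(|&ty| {
            let addr = next;
            next += global_storage_width(ty);
            addr
        })
        .collect()
}

/// Address of the 4-byte length slot for passive segment `ordinal`.
pub fn passive_length_slot(types: &[ValType], has_mem_size: bool, ordinal: u32) -> u32 {
    let widths: u32 = types.iter().map(|&ty| global_storage_width(ty)).sum();
    GLOBAL_MEMORY_BASE + mem_size_bytes(has_mem_size) + widths + ordinal * 4
}
```

Because there is no inter-global padding, an i64 global that follows the mem-size slot or an i32 global can start at an address that is only 4-byte aligned.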

RW data layout

SPI rw_data is a contiguous dump of every byte from GLOBAL_MEMORY_BASE up to the last initialized byte of the WASM heap. The loader copies this region to 0x30000 with a single memcpy, so the blob has no sparse encoding and no per-segment offsets. Because wasm_memory_base is placed tightly after the globals window (no 4KB alignment), the data-segment bytes start within a few bytes of rw_data[0]; the 4KB structural-padding page that the previous layout required for every memory-using program is eliminated.

build_rw_data() trims trailing zero bytes before SPI encoding. Heap pages are zero-initialized, so omitted trailing zeros are semantically equivalent.
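
Trimming trailing zeros is a one-liner over a byte slice. The function below is a sketch of the idea, not the body of build_rw_data(); its name is hypothetical.

```rust
/// Return `data` without its trailing zero bytes.
pub fn trim_trailing_zeros(data: &[u8]) -> &[u8] {
    // Index just past the last non-zero byte, or 0 if every byte is zero.
    let len = data.iter().rposition(|&b| b != 0).map_or(0, |i| i + 1);
    &data[..len]
}
```

Interior zeros are kept; only the suffix that the zero-initialized heap would reproduce anyway is dropped.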


Entry Function (SPI Convention)

The entry function is special — it follows SPI conventions rather than the normal calling convention.

Initial register state (set by the PVM runtime):

Register        Value        Purpose
──────────────  ───────────  ─────────────────────────────────
r0              0xFFFF0000   EXIT address (jump here to HALT)
r1              0xFEFE0000   Stack pointer (STACK_SEGMENT_END)
r7              0xFEFF0000   Arguments pointer (PVM address)
r8              args.length  Arguments length in bytes
r2–r6, r9–r12   0            Available

Entry prologue differences from a normal function:

  1. No stack overflow check (main function starts with full stack)
  2. Allocates frame and stores SSA slots
  3. No callee-saved register saves (no caller to return to)
  4. Adjusts args_ptr: r7 = r7 - wasm_memory_base (convert PVM address to WASM address)
  5. Stores r7 and r8 to parameter slots

Entry return — unified packed i64 convention:

The entry function must return a single i64 value encoding a pointer and length:

  • Lower 32 bits = WASM pointer to result data
  • Upper 32 bits = result length in bytes
  • PVM output: r7 = (ret & 0xFFFFFFFF) + wasm_memory_base, r8 = r7 + (ret >> 32)

All entry functions end by jumping to EXIT_ADDRESS (0xFFFF0000).
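
The packed return convention can be sketched as a pair of helpers. The function names are illustrative, and the return value is handled as u64 so that the shifts and masks are unsigned.

```rust
/// Pack a WASM pointer and a byte length into the entry return value.
pub fn pack_entry_result(ptr: u32, len: u32) -> u64 {
    (u64::from(len) << 32) | u64::from(ptr)
}

/// Derive the PVM output registers (r7, r8) from the packed value.
/// r7 is the start of the result in PVM space, r8 is one past its end.
pub fn unpack_entry_result(ret: u64, wasm_memory_base: u64) -> (u64, u64) {
    let r7 = (ret & 0xFFFF_FFFF) + wasm_memory_base;
    let r8 = r7 + (ret >> 32);
    (r7, r8)
}
```

Note that r8 ends up holding an end address rather than a length, since the length is added to the already translated r7.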

Start Function

If a WASM start function exists, the entry function calls it before processing arguments. r7/r8 are saved to the stack, the start function is called (no arguments), then r7/r8 are restored.


SPI/JAM Program Format

The compiled output is a JAM file in the SPI (Standard Program Interface) format:

Offset  Size    Field
──────  ──────  ─────────────────────
0       3       ro_data_len (u24 LE)
3       3       rw_data_len (u24 LE)
6       2       heap_pages  (u16 LE)
8       3       stack_size  (u24 LE)
11      N       ro_data     (dispatch table)
11+N    M       rw_data     (globals + WASM memory initial data)
11+N+M  4       code_len    (u32 LE)
15+N+M  K       code        (PVM program blob)

heap_pages is computed from the WASM module’s initial_pages (not max_pages). It represents the number of 4KB PVM pages pre-allocated as zero-initialized writable memory at program start. Additional memory beyond this is allocated on demand via sbrk/memory.grow. Programs declaring (memory 0) get a minimum of 16 WASM pages (1MB) to accommodate AssemblyScript runtime memory accesses.
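
The field table above translates into a short encoder. This is a sketch under the stated layout only, not the code in spi.rs: the function names are hypothetical, and returning None when a length does not fit its field is an assumption of this example.

```rust
/// Append `value` as a 3-byte little-endian integer; None if it does not fit.
fn push_u24_le(out: &mut Vec<u8>, value: usize) -> Option<()> {
    if value >= (1 << 24) {
        return None;
    }
    out.extend_from_slice(&(value as u32).to_le_bytes()[..3]);
    Some(())
}

/// Assemble an SPI blob from its sections, following the offset table.
pub fn encode_spi(
    ro_data: &[u8],
    rw_data: &[u8],
    heap_pages: u16,
    stack_size: u32,
    code: &[u8],
) -> Option<Vec<u8>> {
    if code.len() > u32::MAX as usize {
        return None;
    }
    let mut out = Vec::new();
    push_u24_le(&mut out, ro_data.len())?;              // offset 0
    push_u24_le(&mut out, rw_data.len())?;              // offset 3
    out.extend_from_slice(&heap_pages.to_le_bytes());   // offset 6
    push_u24_le(&mut out, stack_size as usize)?;        // offset 8
    out.extend_from_slice(ro_data);                     // offset 11
    out.extend_from_slice(rw_data);                     // offset 11+N
    out.extend_from_slice(&(code.len() as u32).to_le_bytes()); // offset 11+N+M
    out.extend_from_slice(code);                        // offset 15+N+M
    Some(out)
}
```

The fixed header is 11 bytes, so the total size is 15 plus the lengths of the three variable sections.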

PVM Code Blob

Inside the code section, the PVM blob format is:

- jump_table_len  (varint u32)
- item_len        (u8, always 4)
- code_len        (varint u32)
- jump_table      (4 bytes per entry, code offsets)
- instructions    (PVM bytecode)
- mask            (bit-packed instruction start markers)

Entry Header

The first 10 bytes of code are the entry header:

[0–4]   Jump  <main_function_offset>        (5 bytes)
[5–9]   Jump  <secondary_entry_offset>      (5 bytes, or Trap + padding)

The secondary entry is for future use (e.g. is_authorized). If unused, it emits Trap followed by 4 Fallthrough instructions as padding.


Phi Node Handling

Phi nodes (SSA merge points) use a two-pass approach to avoid clobbering:

  1. Load pass: Load all incoming phi values into temp registers (t0, t1, t2, s0, s1)
  2. Store pass: Store all temps to their destination phi result slots

This supports up to 5 simultaneous phi values. The two-pass design prevents cycles where storing one phi value would overwrite a source needed by another phi.
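
The effect of the two passes is easiest to see on a swap, where each phi reads the slot the other one writes. The sketch below simulates the passes over an array of slot values; the function name, the slot-index representation, and the error for more than five copies are assumptions of this example.

```rust
/// Number of temp registers available for phi copies (t0, t1, t2, s0, s1).
pub const MAX_PHI_TEMPS: usize = 5;

/// Apply parallel copies `(src_slot, dst_slot)` using a load pass and a store pass.
pub fn apply_phi_copies(
    slots: &mut [u64],
    copies: &[(usize, usize)],
) -> Result<(), &'static str> {
    if copies.len() > MAX_PHI_TEMPS {
        return Err("more phi values than temp registers");
    }
    let mut temps = [0u64; MAX_PHI_TEMPS];
    // Load pass: read every source before any destination is written.
    for (temp, &(src, _)) in temps.iter_mut().zip(copies) {
        *temp = slots[src];
    }
    // Store pass: write the buffered values to their destinations.
    for (temp, &(_, dst)) in temps.iter().zip(copies) {
        slots[dst] = *temp;
    }
    Ok(())
}
```

A single-pass version that stored each value immediately would turn the swap of slots holding 10 and 20 into two copies of the same value.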


Design Trade-offs

Decision: rationale

  • Stack-slot for every SSA value: Correctness-first baseline; linear-scan register allocator (for loop-containing functions) assigns high-use values to available callee-saved regs (r9-r12 when not used for this function’s incoming parameters), and per-block register cache eliminates most remaining redundant loads
  • Spill area below SP: Frame grows up from SP, spill area grows down, so the two never overlap
  • Fixed-address overflow region (computed per-module): Avoids stack frame complexity for overflow params; reserved only when some signature needs it (see needs_param_overflow)
  • Jump-table indices as return addresses: Required by PVM’s JUMP_IND semantics
  • Entry function has no stack check: Starts with full stack, nothing to overflow into
  • Unsigned stack limit comparison: LoadImm64 avoids sign-extension bugs with large addresses
  • unsafe forbidden: Workspace-level deny(unsafe_code) lint

References

  • crates/wasm-pvm/src/abi.rs — Register and frame constants
  • crates/wasm-pvm/src/memory_layout.rs — Memory address constants
  • crates/wasm-pvm/src/llvm_backend/emitter.rs — PvmEmitter and value management
  • crates/wasm-pvm/src/llvm_backend/calls.rs — Calling convention implementation
  • crates/wasm-pvm/src/llvm_backend/control_flow.rs — Prologue/epilogue/return
  • crates/wasm-pvm/src/spi.rs — JAM/SPI format encoder
  • Technical Reference — Technical reference and debugging journal
  • Gray Paper — JAM/PVM specification