ABI & Calling Conventions
Register assignments, calling convention, stack frame layout, memory layout, and the SPI/JAM program format used by the WASM-to-PVM recompiler.
The canonical source for constants lives in crates/wasm-pvm/src/abi.rs and crates/wasm-pvm/src/memory_layout.rs.
Register Assignments
PVM provides 13 general-purpose 64-bit registers (r0–r12). The compiler assigns them as follows:
| Register | Alias | Purpose | Saved by |
|---|---|---|---|
| r0 | ra | Return address (jump table index) | Callee |
| r1 | sp | Stack pointer (grows downward) | Callee |
| r2 | t0 | Temp: load operand 1 / immediates | Caller |
| r3 | t1 | Temp: load operand 2 | Caller |
| r4 | t2 | Temp: ALU result | Caller |
| r5 | s0 | Scratch | Caller |
| r6 | s1 | Scratch | Caller |
| r7 | a0 | Return value / SPI args_ptr | Caller |
| r8 | a1 | SPI args_len / second result | Caller |
| r9 | l0 | Local 0 / param 0 | Callee |
| r10 | l1 | Local 1 / param 1 | Callee |
| r11 | l2 | Local 2 / param 2 | Callee |
| r12 | l3 | Local 3 / param 3 | Callee |
Callee-saved (r0, r1, r9–r12): the callee must preserve these across calls. Caller-saved (r2–r8): the caller must assume these are clobbered by any call.
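The table can be mirrored in code roughly as follows. This is a minimal sketch, not the contents of abi.rs: the `Reg` enum and its method names are hypothetical, and only the register numbers, aliases, and saved-by classification come from the table above.

```rust
/// Register aliases from the table above; discriminants are PVM register numbers.
/// The type and method names are illustrative assumptions, not the abi.rs API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Reg {
    Ra = 0,
    Sp = 1,
    T0 = 2,
    T1 = 3,
    T2 = 4,
    S0 = 5,
    S1 = 6,
    A0 = 7,
    A1 = 8,
    L0 = 9,
    L1 = 10,
    L2 = 11,
    L3 = 12,
}

impl Reg {
    /// PVM register number (0..=12).
    pub fn index(self) -> u8 {
        self as u8
    }

    /// True for r0, r1 and r9-r12, which the callee must preserve.
    pub fn is_callee_saved(self) -> bool {
        matches!(
            self,
            Reg::Ra | Reg::Sp | Reg::L0 | Reg::L1 | Reg::L2 | Reg::L3
        )
    }
}
```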
Stack Frame Layout
Every function allocates a stack frame. The stack grows downward (SP decreases).
Higher addresses
┌─────────────────────────┐
│ caller's frame ... │
old SP → ├─────────────────────────┤
│ ... │ 8 bytes per SSA value
│ SSA value slot 1 +48 │ 8 bytes
│ SSA value slot 0 +40 │ 8 bytes
├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤ FRAME_HEADER_SIZE = 40
│ Saved r12 (l3) +32 │ 8 bytes
│ Saved r11 (l2) +24 │ 8 bytes
│ Saved r10 (l1) +16 │ 8 bytes
│ Saved r9 (l0) +8 │ 8 bytes
│ Saved r0 (ra) +0 │ 8 bytes
new SP → ├─────────────────────────┤
│ (operand spill area) │ SP - 0x100 .. SP
└─────────────────────────┘
Lower addresses
Offsets are relative to the new SP, so the saved registers sit at the lowest addresses of the frame.
Frame size = FRAME_HEADER_SIZE (40) + num_ssa_values * 8
The operand spill area at SP + OPERAND_SPILL_BASE (i.e. SP - 0x100) is used for
temporary storage during phi-node copies and indirect calls. The frame grows upward
from SP (toward higher addresses), while the spill area is below SP, so the two
regions never overlap regardless of frame size. However, a callee’s frame allocation
must not reach into the caller’s spill area — this is protected by the stack overflow
check which ensures SP - frame_size >= stack_limit.
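The frame arithmetic above can be sketched in a few lines. The constant values are the ones quoted in this section; `SLOT_SIZE` and the function names are hypothetical, and offsets are assumed to be measured from the post-allocation SP, as in the prologue's `[sp+0]` stores.

```rust
/// Bytes reserved for saved r0 and r9-r12 (5 registers * 8 bytes).
pub const FRAME_HEADER_SIZE: i32 = 40;
/// Size of one SSA value slot. The name is an assumption for this sketch.
pub const SLOT_SIZE: i32 = 8;
/// Offset of the operand spill area relative to SP (below the frame).
pub const OPERAND_SPILL_BASE: i32 = -0x100;

/// Total frame size for a function with `num_ssa_values` SSA values.
pub fn frame_size(num_ssa_values: i32) -> i32 {
    FRAME_HEADER_SIZE + num_ssa_values * SLOT_SIZE
}

/// SP-relative offset of SSA value slot `slot_index`.
pub fn ssa_slot_offset(slot_index: i32) -> i32 {
    FRAME_HEADER_SIZE + slot_index * SLOT_SIZE
}
```

Because every slot offset is non-negative and `OPERAND_SPILL_BASE` is negative, a slot can never alias the spill area of the same function.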
Stack-Slot Approach with Register Allocation
Every LLVM SSA value gets a dedicated 8-byte stack slot. The baseline instruction sequence is:
- Load operands from stack slots into temp registers (t0, t1)
- Execute ALU operation, result in t2
- Store t2 back to the result’s stack slot
A linear-scan register allocator (regalloc.rs) improves on this baseline, but only for
functions that contain loop back-edges; loop-free functions skip allocation entirely.
Candidate intervals are built from use-def live-interval analysis and filtered by a
minimum-use threshold (MIN_USES_FOR_ALLOCATION, currently 3). A value does not have to
span a loop to be eligible. The allocator assigns eligible values to the callee-saved
registers that are free: r9-r12, minus any that hold this function’s incoming
parameters. In non-leaf functions, the registers from r9 upward that are needed for
outgoing call arguments are also reserved. Clobber handling and reloads around call
sites are performed by the emitter after each call, not by invalidation logic inside
regalloc itself. Combined with the register cache, this eliminates most redundant
memory traffic.
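The eligibility filter described above might look like the sketch below. It covers only candidate selection, not the linear scan itself. The `Interval` struct, its fields, and the function name are hypothetical, and treating the threshold as "uses >= 3" is an assumption about how regalloc.rs compares against MIN_USES_FOR_ALLOCATION.

```rust
/// Threshold quoted in the text above.
pub const MIN_USES_FOR_ALLOCATION: usize = 3;

/// Hypothetical live interval for one SSA value.
pub struct Interval {
    pub value_id: u32,
    pub start: u32,
    pub end: u32,
    pub uses: usize,
}

/// Returns the ids of values worth handing to the allocator.
/// Loop-free functions produce no candidates at all.
pub fn allocation_candidates(intervals: &[Interval], has_back_edge: bool) -> Vec<u32> {
    if !has_back_edge {
        return Vec::new();
    }
    intervals
        .iter()
        // Assumption: a value is eligible once it reaches the threshold.
        .filter(|iv| iv.uses >= MIN_USES_FOR_ALLOCATION)
        .map(|iv| iv.value_id)
        .collect()
}
```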
Per-Block Register Cache (Store-Load Forwarding)
PvmEmitter maintains a per-basic-block register cache (slot_cache: HashMap<i32, u8>,
reg_to_slot: [Option<i32>; 13]) that tracks which stack slot values are currently live
in registers. This eliminates redundant LoadIndU64 instructions:
- Cache hit, same register: Skip entirely (0 instructions emitted)
- Cache hit, different register: Emit AddImm64 dst, cached_reg, 0 (register copy)
- Cache miss: Emit normal LoadIndU64, then record in cache
The cache is invalidated:
- When a register is overwritten (auto-detected via Instruction::dest_reg())
- At block boundaries (define_label() clears the entire cache)
- After function calls (clear_reg_cache() after Fallthrough return points)
- After ecalli host calls (clear_reg_cache() after Ecalli)
Impact: ~50% gas reduction, ~15-40% code size reduction across benchmarks.
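The hit/miss decision can be modelled with the two fields named above. This is a simplified sketch, not PvmEmitter's code: `RegCache`, `LoadAction`, and the method names are hypothetical, and the choice to leave the cache entry on the original register after a copy is an assumption made to keep the model small.

```rust
use std::collections::HashMap;

/// What the emitter would do to get a slot value into a register.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadAction {
    /// Value is already in the destination register: emit nothing.
    Skip,
    /// Value is in another register: emit `AddImm64 dst, from, 0`.
    Copy { from: u8 },
    /// Value is not cached: emit `LoadIndU64`.
    Load,
}

#[derive(Default)]
pub struct RegCache {
    slot_cache: HashMap<i32, u8>,
    reg_to_slot: [Option<i32>; 13],
}

impl RegCache {
    /// Decide how to materialize `slot` in `dst` and update the cache.
    pub fn load(&mut self, dst: u8, slot: i32) -> LoadAction {
        match self.slot_cache.get(&slot).copied() {
            Some(reg) if reg == dst => LoadAction::Skip,
            Some(reg) => {
                // dst is overwritten by the copy, so its old mapping dies.
                self.invalidate_reg(dst);
                LoadAction::Copy { from: reg }
            }
            None => {
                self.invalidate_reg(dst);
                self.slot_cache.insert(slot, dst);
                self.reg_to_slot[dst as usize] = Some(slot);
                LoadAction::Load
            }
        }
    }

    /// Drop whatever slot `reg` was caching (register overwritten).
    pub fn invalidate_reg(&mut self, reg: u8) {
        if let Some(slot) = self.reg_to_slot[reg as usize].take() {
            if self.slot_cache.get(&slot) == Some(&reg) {
                self.slot_cache.remove(&slot);
            }
        }
    }

    /// Drop everything (block boundary, call return, ecalli).
    pub fn clear(&mut self) {
        self.slot_cache.clear();
        self.reg_to_slot = [None; 13];
    }
}
```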
Calling Convention
Parameter Passing
| Parameter | Location |
|---|---|
| 1st–4th | r9–r12 |
| 5th+ | Global memory at PARAM_OVERFLOW_BASE + (i-4)*8, where PARAM_OVERFLOW_BASE = 0x32000 and i is the zero-based parameter index |
Return value: r7 (single i64).
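As a sketch of the table above, the location of a parameter can be computed from its index. The function and enum names are hypothetical, and the index is assumed to be zero-based, so that the 5th parameter (index 4) lands exactly on PARAM_OVERFLOW_BASE.

```rust
pub const PARAM_OVERFLOW_BASE: u32 = 0x32000;
/// r9 holds the first parameter. The constant name is an assumption.
pub const FIRST_PARAM_REG: u8 = 9;
/// Number of parameters passed in registers. The name is an assumption.
pub const MAX_REG_PARAMS: usize = 4;

#[derive(Debug, PartialEq, Eq)]
pub enum ParamLocation {
    /// Passed in the given PVM register (r9-r12).
    Register(u8),
    /// Passed in global memory at the given PVM address.
    Overflow(u32),
}

/// Location of the parameter with zero-based index `index`.
pub fn param_location(index: usize) -> ParamLocation {
    if index < MAX_REG_PARAMS {
        ParamLocation::Register(FIRST_PARAM_REG + index as u8)
    } else {
        let slot = (index - MAX_REG_PARAMS) as u32;
        ParamLocation::Overflow(PARAM_OVERFLOW_BASE + slot * 8)
    }
}
```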
Caller Sequence
1. Load arguments into r9–r12 (first 4)
2. Store overflow arguments to PARAM_OVERFLOW_BASE
3. LoadImm64 r0, <return_jump_table_index>
4. Jump <callee_code_offset>
── callee executes ──
5. (fallthrough) Store r7 to result slot if function returns a value
Callee Prologue
1. Stack overflow check (skipped for entry function):
LoadImm64 t1, stack_limit ; unsigned comparison!
AddImm64 t2, sp, -frame_size
BranchGeU t1, t2, continue
Trap ; stack overflow → panic
2. Allocate frame:
AddImm64 sp, sp, -frame_size
3. Save callee-saved registers:
StoreIndU64 [sp+0], r0
StoreIndU64 [sp+8], r9
StoreIndU64 [sp+16], r10
StoreIndU64 [sp+24], r11
StoreIndU64 [sp+32], r12
4. Copy parameters to SSA value slots:
- First 4 from r9–r12
- 5th+ loaded from PARAM_OVERFLOW_BASE
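The invariant enforced by step 1 (SP - frame_size >= stack_limit, compared as unsigned values) can be modelled as below. This is a sketch of the condition only, not of the emitted instruction sequence; the function name is hypothetical, and `checked_sub` stands in for the wrap-around that a real 64-bit subtraction would have.

```rust
/// Returns the new SP if the frame fits, or `None` where the prologue
/// would execute `Trap`. All values are unsigned PVM addresses.
pub fn try_allocate_frame(sp: u64, frame_size: u64, stack_limit: u64) -> Option<u64> {
    // A subtraction that would wrap below zero is treated as an overflow.
    let new_sp = sp.checked_sub(frame_size)?;
    if new_sp >= stack_limit {
        Some(new_sp)
    } else {
        None
    }
}
```

For example, with a stack limit of 0xFEFD0000 and SP at 0xFEFE0000, a 64-byte frame yields a new SP of 0xFEFDFFC0.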
Callee Epilogue (return)
1. Load return value into r7 (if returning a value)
2. Restore callee-saved registers:
LoadIndU64 r9, [sp+8]
LoadIndU64 r10, [sp+16]
LoadIndU64 r11, [sp+24]
LoadIndU64 r12, [sp+32]
3. Restore return address:
LoadIndU64 r0, [sp+0]
4. Deallocate frame:
AddImm64 sp, sp, +frame_size
5. Return:
JumpInd r0, 0
Jump Table & Return Addresses
PVM’s JUMP_IND instruction uses a jump table — it is not a direct address jump:
JUMP_IND rA, offset
target_address = jumpTable[(rA + offset) / 2 - 1]
Return addresses stored in r0 are therefore jump-table indices, not code offsets:
r0 = (jump_table_index + 1) * 2
The jump table is laid out as:
[ return_addr_0, return_addr_1, ..., // for call return sites
func_0_entry, func_1_entry, ... ] // for indirect calls
Each entry is a 4-byte code offset (u32). Jump table entries for call_indirect
encode function entry points used by the dispatch table.
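The two formulas above are inverses of each other, which the following sketch makes explicit. The function names are hypothetical, and the model ignores any additional validity rules the PVM applies to jump addresses; it only implements the index arithmetic stated in this section.

```rust
/// Value to place in r0 so that `JUMP_IND r0, 0` reaches `jump_table_index`.
pub fn encode_return_address(jump_table_index: u32) -> u32 {
    (jump_table_index + 1) * 2
}

/// Resolves `JUMP_IND rA, offset` to a code offset, following
/// `jumpTable[(rA + offset) / 2 - 1]`. Returns `None` when the computed
/// index falls outside the table.
pub fn resolve_jump_ind(jump_table: &[u32], reg_value: u32, offset: u32) -> Option<u32> {
    let addr = reg_value.checked_add(offset)?;
    let index = (addr / 2).checked_sub(1)?;
    jump_table.get(index as usize).copied()
}
```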
Indirect Calls (call_indirect)
A dispatch table at RO_DATA_BASE (0x10000) maps WASM table indices to
function entry points:
Dispatch table entry (8 bytes each):
[0–3] Jump address (u32, byte offset → jump table index)
[4–7] Type signature index (u32)
The indirect call sequence:
1. Compute dispatch_addr = RO_DATA_BASE + (table_index << 3)
2. Load type_idx from [dispatch_addr + 4]
3. Compare type_idx with expected_type_idx
4. Trap if mismatch (signature validation)
5. Load jump_addr from [dispatch_addr + 0]
6. LoadImmJumpInd jump_addr, r0, <return_jump_table_index>, 0
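Steps 1 to 5 amount to a bounds-checked table lookup with a type check. The sketch below runs that lookup over a byte slice standing in for the RO data segment. The function names are hypothetical, fields are assumed to be stored little-endian, and the out-of-range case returning `None` is a property of this model rather than a statement about the generated code.

```rust
pub const RO_DATA_BASE: u32 = 0x10000;

/// PVM address of the dispatch entry for `table_index` (8 bytes per entry).
pub fn dispatch_addr(table_index: u32) -> u32 {
    RO_DATA_BASE + (table_index << 3)
}

/// Looks up `table_index` in `ro_data` (the bytes mapped at RO_DATA_BASE).
/// Returns the jump address if the entry exists and its type signature
/// index matches, or `None` where the generated code would trap.
pub fn lookup(ro_data: &[u8], table_index: u32, expected_type_idx: u32) -> Option<u32> {
    let start = (table_index as usize).checked_mul(8)?;
    let end = start.checked_add(8)?;
    let e = ro_data.get(start..end)?;
    // Assumption: both fields are little-endian u32 values.
    let jump_addr = u32::from_le_bytes([e[0], e[1], e[2], e[3]]);
    let type_idx = u32::from_le_bytes([e[4], e[5], e[6], e[7]]);
    if type_idx == expected_type_idx {
        Some(jump_addr)
    } else {
        None
    }
}
```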
Import Calls
host_call_N(ecalli_index, r7, ..., r7+N-1) -> i64 → ecalli
A family of typed host call imports where N (0–6) indicates the number of data
arguments loaded into r7–r12. The first argument must be a compile-time constant
(the ecalli index). All variants return r7 as an i64.
| Import | Params | Registers set |
|---|---|---|
| host_call_0 | (i64) | none |
| host_call_1 | (i64 i64) | r7 |
| host_call_2 | (i64 i64 i64) | r7-r8 |
| host_call_3 | (i64 i64 i64 i64) | r7-r9 |
| host_call_4 | (i64 i64 i64 i64 i64) | r7-r10 |
| host_call_5 | (i64 i64 i64 i64 i64 i64) | r7-r11 |
| host_call_6 | (i64 i64 i64 i64 i64 i64 i64) | r7-r12 |
Example — JIP-1 log call with 5 register args:
(import "env" "host_call_5" (func $host_call_5 (param i64 i64 i64 i64 i64 i64) (result i64)))
(import "env" "pvm_ptr" (func $pvm_ptr (param i64) (result i64)))
;; ecalli 100 = log; r7=level, r8=target_ptr, r9=target_len, r10=msg_ptr, r11=msg_len
(drop (call $host_call_5
(i64.const 100) ;; ecalli index
(i64.const 3) ;; r7: log level
(call $pvm_ptr (i64.const 0)) ;; r8: target PVM pointer
(i64.const 8) ;; r9: target length
(call $pvm_ptr (i64.const 8)) ;; r10: message PVM pointer
(i64.const 15))) ;; r11: message length
host_call_Nb — two-register output variants
Same as host_call_N but also captures r8 after the ecalli to a dedicated
stack slot (R8_CAPTURE_SLOT_OFFSET relative to SP). Use the companion import
host_call_r8() -> i64 (no arguments) to retrieve the captured value. The
host_call_r8 call must be in the same function as the preceding host_call_Nb.
All *b variants (host_call_0b through host_call_6b) are supported.
Example:
(import "env" "host_call_2b" (func $host_call_2b (param i64 i64 i64) (result i64)))
(import "env" "host_call_r8" (func $host_call_r8 (result i64)))
;; Call ecalli 10, passing r7=100 and r8=200.
;; Store r7 return value, then retrieve r8.
(local $r7 i64)
(local $r8 i64)
(local.set $r7 (call $host_call_2b (i64.const 10) (i64.const 100) (i64.const 200)))
(local.set $r8 (call $host_call_r8))
pvm_ptr(wasm_addr) -> pvm_addr
Converts a WASM-space address to a PVM-space address by zero-extending to 64 bits
and adding wasm_memory_base.
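In arithmetic terms the conversion is a single widening add. The sketch below models it on plain integers; taking the WASM address as a `u32` is an assumption about what "zero-extending" applies to, and `wasm_ptr` is a hypothetical helper showing the inverse used by the entry prologue.

```rust
/// WASM address -> PVM address: zero-extend, then add the memory base.
pub fn pvm_ptr(wasm_addr: u32, wasm_memory_base: u64) -> u64 {
    u64::from(wasm_addr) + wasm_memory_base
}

/// PVM address -> WASM address (hypothetical inverse helper).
/// Returns `None` if the address lies below the WASM memory base.
pub fn wasm_ptr(pvm_addr: u64, wasm_memory_base: u64) -> Option<u64> {
    pvm_addr.checked_sub(wasm_memory_base)
}
```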
Other imports
The abort import emits Trap (unrecoverable error). All other unresolved
imports cause a compilation error — they must be resolved via --imports or
--adapter before compilation succeeds.
Memory Layout
PVM Address Space:
0x00000 - 0x0FFFF Reserved / guard (fault on access)
0x10000 - 0x1FFFF Read-only data (RO_DATA_BASE) — dispatch tables
0x20000 - 0x2FFFF Gap zone (unmapped, guard between RO and RW)
0x30000 - 0x31FFF Globals window (GLOBAL_MEMORY_BASE, 8KB cap; actual bytes used = globals_region_size(...))
0x32000 - 0x320FF Parameter overflow area (5th+ function arguments)
0x32100+ Spilled locals (per-function metadata, typically unused)
0x33000+ WASM linear memory (4KB-aligned, computed dynamically via `compute_wasm_memory_base`)
... (unmapped gap until stack)
0xFEFE0000 STACK_SEGMENT_END (initial SP)
0xFEFF0000 Arguments segment (input data, read-only)
0xFFFF0000 EXIT_ADDRESS (jump here → HALT)
Key formulas (see memory_layout.rs):
- Global address: 0x30000 + global_index * 4
- Memory size global: 0x30000 + num_globals * 4
- Spilled local: 0x32100 + func_idx * SPILLED_LOCALS_PER_FUNC + local_offset
- WASM memory base: align_up(max(SPILLED_LOCALS_BASE + num_funcs * SPILLED_LOCALS_PER_FUNC, GLOBAL_MEMORY_BASE + globals_region_size(num_globals, num_passive_segments)), 4KB). The heap starts after whichever of the spilled-locals region and the globals/passive-length region ends later, aligned up to the PVM page size (4KB). This is typically 0x33000 for programs with few globals.
- Stack limit: 0xFEFE0000 - stack_size
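The formulas above translate directly into code. In this sketch the addresses are the ones listed in the memory map, while SPILLED_LOCALS_PER_FUNC and the result of globals_region_size(...) are passed in as parameters because their values are not given here; the function signatures are assumptions, not the memory_layout.rs API.

```rust
pub const GLOBAL_MEMORY_BASE: u64 = 0x30000;
pub const SPILLED_LOCALS_BASE: u64 = 0x32100;
pub const STACK_SEGMENT_END: u64 = 0xFEFE_0000;
pub const PAGE_SIZE: u64 = 4096;

/// Rounds `value` up to the next multiple of `align` (align > 0).
pub fn align_up(value: u64, align: u64) -> u64 {
    (value + align - 1) / align * align
}

/// Address of global number `global_index` (4 bytes per global).
pub fn global_addr(global_index: u64) -> u64 {
    GLOBAL_MEMORY_BASE + global_index * 4
}

/// Address of the memory-size global, stored right after the globals.
pub fn memory_size_global_addr(num_globals: u64) -> u64 {
    GLOBAL_MEMORY_BASE + num_globals * 4
}

/// Base of WASM linear memory. `spilled_locals_per_func` and
/// `globals_region_size` are supplied by the caller in this sketch.
pub fn wasm_memory_base(
    num_funcs: u64,
    spilled_locals_per_func: u64,
    globals_region_size: u64,
) -> u64 {
    let spill_end = SPILLED_LOCALS_BASE + num_funcs * spilled_locals_per_func;
    let globals_end = GLOBAL_MEMORY_BASE + globals_region_size;
    align_up(spill_end.max(globals_end), PAGE_SIZE)
}

/// Lowest address the stack may reach.
pub fn stack_limit(stack_size: u64) -> u64 {
    STACK_SEGMENT_END - stack_size
}
```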
RW data layout
SPI rw_data is defined as a contiguous dump of every byte from GLOBAL_MEMORY_BASE up to the last initialized byte of the WASM heap; the loader memcpys this region at 0x30000, so there is no sparse encoding or per-segment offsets inside the blob. That is why the zero stretch between the globals window and the first non-zero heap byte is encoded verbatim instead of being skipped.
build_rw_data() trims trailing zero bytes before SPI encoding. Heap pages are zero-initialized, so omitted trailing zeros are semantically equivalent.
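Trimming trailing zeros while keeping interior zero runs can be sketched as follows. This illustrates the idea only and is not the body of build_rw_data(); the function name is hypothetical.

```rust
/// Returns `data` without its trailing zero bytes. Zero bytes that are
/// followed by any non-zero byte are kept, since the blob is contiguous.
pub fn trim_trailing_zeros(data: &[u8]) -> &[u8] {
    let end = data
        .iter()
        .rposition(|&b| b != 0)
        .map_or(0, |last_nonzero| last_nonzero + 1);
    &data[..end]
}
```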
Entry Function (SPI Convention)
The entry function is special — it follows SPI conventions rather than the normal calling convention.
Initial register state (set by the PVM runtime):
| Register | Value | Purpose |
|---|---|---|
| r0 | 0xFFFF0000 | EXIT address — jump here to HALT |
| r1 | 0xFEFE0000 | Stack pointer (STACK_SEGMENT_END) |
| r7 | 0xFEFF0000 | Arguments pointer (PVM address) |
| r8 | args.length | Arguments length in bytes |
| r2–r6, r9–r12 | 0 | Available |
Entry prologue differences from a normal function:
- No stack overflow check (main function starts with full stack)
- Allocates frame and stores SSA slots
- No callee-saved register saves (no caller to return to)
- Adjusts args_ptr: r7 = r7 - wasm_memory_base (convert PVM address to WASM address)
- Stores r7 and r8 to parameter slots
Entry return — unified packed i64 convention:
The entry function must return a single i64 value encoding a pointer and length:
- Lower 32 bits = WASM pointer to result data
- Upper 32 bits = result length in bytes
- PVM output: r7 = (ret & 0xFFFFFFFF) + wasm_memory_base, r8 = r7 + (ret >> 32)
All entry functions end by jumping to EXIT_ADDRESS (0xFFFF0000).
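The packed return convention can be illustrated with a pack/unpack pair. The function names are hypothetical, and the value is handled as an unsigned 64-bit bit pattern so that the shift is logical; that choice is an assumption of this sketch.

```rust
/// Packs a WASM pointer and a length into the entry function's return value.
pub fn pack_entry_result(wasm_ptr: u32, len: u32) -> u64 {
    (u64::from(len) << 32) | u64::from(wasm_ptr)
}

/// Computes the (r7, r8) pair from a packed return value:
/// r7 is the PVM start address of the result, r8 the PVM end address.
pub fn unpack_entry_result(ret: u64, wasm_memory_base: u64) -> (u64, u64) {
    let wasm_ptr = ret & 0xFFFF_FFFF;
    let len = ret >> 32;
    let r7 = wasm_ptr + wasm_memory_base;
    let r8 = r7 + len;
    (r7, r8)
}
```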
Start Function
If a WASM start function exists, the entry function calls it before processing arguments. r7/r8 are saved to the stack, the start function is called (no arguments), then r7/r8 are restored.
SPI/JAM Program Format
The compiled output is a JAM file in the SPI (Standard Program Interface) format:
Offset Size Field
────── ────── ─────────────────────
0 3 ro_data_len (u24 LE)
3 3 rw_data_len (u24 LE)
6 2 heap_pages (u16 LE)
8 3 stack_size (u24 LE)
11 N ro_data (dispatch table)
11+N M rw_data (globals + WASM memory initial data)
11+N+M 4 code_len (u32 LE)
15+N+M K code (PVM program blob)
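A minimal encoder for this layout might look like the sketch below. It follows the field table above byte for byte, but it is not the code in spi.rs: the function names and the choice to return `None` when a length does not fit its field are assumptions.

```rust
/// Appends the low 3 bytes of `value` in little-endian order.
fn push_u24_le(out: &mut Vec<u8>, value: u32) {
    out.extend_from_slice(&value.to_le_bytes()[..3]);
}

/// Serializes the fields in the order given by the layout table.
/// Returns `None` if a length does not fit its field width.
pub fn encode_spi(
    ro_data: &[u8],
    rw_data: &[u8],
    heap_pages: u16,
    stack_size: u32,
    code: &[u8],
) -> Option<Vec<u8>> {
    const U24_MAX: usize = 0xFF_FFFF;
    if ro_data.len() > U24_MAX
        || rw_data.len() > U24_MAX
        || stack_size as usize > U24_MAX
        || code.len() > u32::MAX as usize
    {
        return None;
    }
    let mut out = Vec::with_capacity(15 + ro_data.len() + rw_data.len() + code.len());
    push_u24_le(&mut out, ro_data.len() as u32); // offset 0
    push_u24_le(&mut out, rw_data.len() as u32); // offset 3
    out.extend_from_slice(&heap_pages.to_le_bytes()); // offset 6
    push_u24_le(&mut out, stack_size); // offset 8
    out.extend_from_slice(ro_data); // offset 11
    out.extend_from_slice(rw_data); // offset 11+N
    out.extend_from_slice(&(code.len() as u32).to_le_bytes()); // offset 11+N+M
    out.extend_from_slice(code); // offset 15+N+M
    Some(out)
}
```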
heap_pages is computed from the WASM module’s initial_pages (not max_pages).
It represents the number of 4KB PVM pages pre-allocated as zero-initialized writable memory
at program start. Additional memory beyond this is allocated on demand via sbrk/memory.grow.
Programs declaring (memory 0) get a minimum of 16 WASM pages (1MB) to accommodate
AssemblyScript runtime memory accesses.
PVM Code Blob
Inside the code section, the PVM blob format is:
- jump_table_len (varint u32)
- item_len (u8, always 4)
- code_len (varint u32)
- jump_table (4 bytes per entry, code offsets)
- instructions (PVM bytecode)
- mask (bit-packed instruction start markers)
Entry Header
The first 10 bytes of code are the entry header:
[0–4] Jump <main_function_offset> (5 bytes)
[5–9] Jump <secondary_entry_offset> (5 bytes, or Trap + padding)
The secondary entry is for future use (e.g. is_authorized). If unused, it emits
Trap followed by 4 Fallthrough instructions as padding.
Phi Node Handling
Phi nodes (SSA merge points) use a two-pass approach to avoid clobbering:
- Load pass: Load all incoming phi values into temp registers (t0, t1, t2, s0, s1)
- Store pass: Store all temps to their destination phi result slots
This supports up to 5 simultaneous phi values. The two-pass design prevents cycles where storing one phi value would overwrite a source needed by another phi.
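The effect of the two passes is easiest to see on a swap. The sketch below models stack slots as an array and the five temp registers as a small buffer; the function name, the `(source, destination)` move representation, and the error on more than five moves are assumptions of this model.

```rust
/// Number of temp registers available for phi copies (t0, t1, t2, s0, s1).
pub const MAX_PHI_TEMPS: usize = 5;

/// Applies parallel phi moves `(src_slot, dst_slot)` to `slots`.
/// Indices must be in bounds for `slots`; out-of-range indices panic.
pub fn copy_phis(slots: &mut [u64], moves: &[(usize, usize)]) -> Result<(), String> {
    if moves.len() > MAX_PHI_TEMPS {
        return Err(format!("{} phi moves exceed {} temps", moves.len(), MAX_PHI_TEMPS));
    }
    let mut temps = [0u64; MAX_PHI_TEMPS];
    // Load pass: read every source before any destination is written.
    for (i, &(src, _)) in moves.iter().enumerate() {
        temps[i] = slots[src];
    }
    // Store pass: write all destinations from the temps.
    for (i, &(_, dst)) in moves.iter().enumerate() {
        slots[dst] = temps[i];
    }
    Ok(())
}
```

With moves `[(0, 1), (1, 0)]`, a single-pass copy would duplicate one value, whereas the two-pass version swaps the two slots.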
Design Trade-offs
| Decision | Rationale |
|---|---|
| Stack-slot for every SSA value | Correctness-first baseline; linear-scan register allocator (for loop-containing functions) assigns high-use values to available callee-saved regs (r9-r12 when not used for this function’s incoming parameters), and per-block register cache eliminates most remaining redundant loads |
| Spill area below SP | Frame grows up from SP, spill area grows down — no overlap |
| Global PARAM_OVERFLOW_BASE | Avoids stack frame complexity for overflow params |
| Jump-table indices as return addresses | Required by PVM’s JUMP_IND semantics |
| Entry function has no stack check | Starts with full stack, nothing to overflow into |
| Unsigned stack limit comparison | LoadImm64 avoids sign-extension bugs with large addresses |
| unsafe forbidden | Workspace-level deny(unsafe_code) lint |
References
- crates/wasm-pvm/src/abi.rs: Register and frame constants
- crates/wasm-pvm/src/memory_layout.rs: Memory address constants
- crates/wasm-pvm/src/llvm_backend/emitter.rs: PvmEmitter and value management
- crates/wasm-pvm/src/llvm_backend/calls.rs: Calling convention implementation
- crates/wasm-pvm/src/llvm_backend/control_flow.rs: Prologue/epilogue/return
- crates/wasm-pvm/src/spi.rs: JAM/SPI format encoder
- Technical Reference: Technical reference and debugging journal
- Gray Paper: JAM/PVM specification