Introduction
pvm-decompiler is a decompiler for PVM (Polkadot Virtual Machine) bytecode. It takes compiled .pvm binary files and produces readable pseudo-code output.
What It Does
The tool performs several analysis steps on PVM binaries:
- Decode the binary (SPI or raw ProgramBlob format)
- Build a control flow graph (CFG) and detect function boundaries
- Analyze dataflow and variable liveness
- Lift register operations into higher-level expressions
- Recover control structures like if/else, while loops, and switch/case
- Emit readable pseudo-code with inferred variable types
Output Modes
The decompiler supports several output modes:
| Mode | Flag | Description |
|---|---|---|
| Pseudo-code | (default) | Structured pseudo-code with type annotations |
| Verbose | --verbose | Pseudo-code plus CFG, dataflow, and structural analysis details |
| Debug | --debug | Raw decoded instructions and all diagnostics |
| LLVM IR | --llvm | Low-level LLVM intermediate representation |
| C code | --decompile | Full LLVM pipeline producing C output |
| LLM refined | --refine | Pseudo-code with variable names improved by an LLM |
Supported Input Formats
- SPI-wrapped PVM binaries (most common)
- Raw ProgramBlob binaries
- Binaries with a metadata prefix (auto-stripped before decode)
How to Read This Documentation
- Start with Quick Start to build and run the tool
- Look at Examples to see real decompilation results
- Check Backend Comparison to understand the LLVM C output backend
- See CLI Reference for all available flags
Quick Start
Build
cargo build --release
The binary is at ./target/release/pvm-decompiler.
Basic Usage
Decompile a PVM binary to pseudo-code:
./target/release/pvm-decompiler examples/compiled/fibonacci.pvm
Output:
fn main(r1: u64, r7: u64, r8: u64) {
let ptr_0_80
let ptr_0_88
let ptr_0_96
let ptr_0_56 = u32[r7]
ptr_0_80 = 0
ptr_0_88 = 1
ptr_0_96 = 0
while (ptr_0_80 <u ptr_0_56) {
ptr_0_80 = ptr_0_80 + 1
ptr_0_88 = ptr_0_96 + ptr_0_88
ptr_0_96 = ptr_0_88
}
u32[0x20000] = ptr_0_96
halt()
}
Verbose Mode
Use --verbose to see the analysis details (CFG, dataflow, structural analysis) together with the pseudo-code:
./target/release/pvm-decompiler --verbose examples/compiled/fibonacci.pvm
This prints information about detected functions, def-use chains, lifted variables, detected loops, and the final pseudo-code.
Debug Mode
Use --debug to see the raw decoded PVM instructions:
./target/release/pvm-decompiler --debug examples/compiled/fibonacci.pvm
This shows details such as the container format, the jump table, and each individual instruction with its program counter.
LLM Refinement
If you set the OPENROUTER_API_KEY environment variable, you can ask an LLM to improve the variable names in the output:
export OPENROUTER_API_KEY="your-key-here"
./target/release/pvm-decompiler --refine examples/compiled/fibonacci.pvm
The LLM renames variables like ptr_0_80 to more meaningful names like loop_counter based on how they are used.
LLVM C Output
For a full decompilation to C code through the LLVM pipeline:
./target/release/pvm-decompiler --decompile --backend=builtin examples/compiled/fibonacci.pvm
See Backend Comparison for details about available backends.
Examples
This section walks through several real PVM programs. For each example we show:
- The original source code (WAT or AssemblyScript)
- Basic metadata about the compiled binary
- The decompiled pseudo-code output
- Where available, the LLM-refined output with better variable names
The examples go from simple to more complex:
- Branch Table – a small WAT program with switch/case style branching
- Fibonacci (WAT) – classic fibonacci in WebAssembly text format
- Fibonacci (AssemblyScript) – same algorithm compiled from AssemblyScript, shows how a higher-level language compiles differently
- Control Flow – a larger example with if/else, while, nested for loops, and break
- JAM Fuzzy Service – a real-world Rust JAM service (~142 KB, 63 functions, no source available)
- Ananas – a real-world AssemblyScript JAM service (~442 KB, 189 functions, source on GitHub)
The next examples show more complex patterns:
- Functions (AssemblyScript) – multiple helper functions (add, factorial, square-in-loop), all inlined by the compiler
- Linked List (AssemblyScript) – heap-allocated linked list with recursive traversal
- Game of Life (AssemblyScript) – Conway’s Game of Life on a 16x16 grid, aggressive inlining
- Host Call Log (WAT) – minimal host call example using ecalli for logging
- Fibonacci (as-lan) – fibonacci from a full AssemblyScript framework (~38 KB, 18 functions)
Each example can be reproduced by running the decompiler on files from the examples/compiled/ directory.
Branch Table
A small WAT program that uses br_table for indexed branching. The decompiler recovers this as a switch/case statement.
Source
File: examples/sources/br-table.wat
(module
(memory 1)
(func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
(local $index i32)
(local $result i32)
(local.set $index (i32.load (local.get $args_ptr)))
(block $case3
(block $case2
(block $case1
(block $case0
(br_table $case0 $case1 $case2 $case3 (local.get $index))
)
(local.set $result (i32.const 100))
(br $case3)
)
(local.set $result (i32.const 200))
(br $case3)
)
(local.set $result (i32.const 300))
(br $case3)
)
(if (i32.eq (local.get $result) (i32.const 0))
(then
(local.set $result (i32.const 999))
)
)
(i32.store (i32.const 0) (local.get $result))
(i64.const 17179869184) ;; ptr=0, len=4
)
)
The program reads an index from memory, branches to one of four cases (setting result to 100, 200, 300, or 0), then falls back to 999 if the result is still zero. Finally it writes the result to memory. The return value is a packed i64: lower 32 bits = result pointer, upper 32 bits = result length.
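The packed return value can be checked with a little bit arithmetic. The following is a minimal sketch, not code from pvm-decompiler; the helper names pack_result and unpack_result are made up for illustration.

```rust
/// Pack a result pointer (low 32 bits) and a result length (high 32 bits)
/// into a single i64, following the layout described above.
/// (Hypothetical helper, not part of pvm-decompiler.)
fn pack_result(ptr: u32, len: u32) -> i64 {
    ((len as i64) << 32) | (ptr as i64)
}

/// Split a packed i64 back into (ptr, len).
fn unpack_result(packed: i64) -> (u32, u32) {
    let bits = packed as u64;
    ((bits & 0xFFFF_FFFF) as u32, (bits >> 32) as u32)
}
```

With these definitions, pack_result(0, 4) yields 17179869184, the constant at the end of the WAT source, and unpack_result recovers ptr=0, len=4 from it.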
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/br-table.pvm |
| Size | 206 bytes |
| Format | SPI |
| Functions | 1 |
Decompiled Output
./target/release/pvm-decompiler examples/compiled/br-table.pvm
fn main(r1: u64, r7: u64) {
let ptr_0_72
// @0006
let var_1 = u32[r7]
switch (var_1) {
case 0:
// @007b
ptr_0_72 = 100
case 1:
// @006f
ptr_0_72 = 200
case 2:
// @0063
ptr_0_72 = 300
default:
// @0032
ptr_0_72 = 999
break;
}
// @003a
u32[0x3000] = ptr_0_72
halt()
}
What to notice:
- The br_table is recovered as a clean switch statement with three explicit cases and a default.
- The variable ptr_0_72 holds the intermediate result from each case.
- The fallback check (if result == 0 then 999) has been folded into the default case by the compiler.
- Memory write u32[0x3000] corresponds to the i32.store at offset 0 plus the PVM memory base address.
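To make the folding of the fallback check concrete, here is a small sketch of the WAT program's logic written as an ordinary function. It is an illustration of the source semantics only; the function name branch_table is an assumption, and memory access is left out.

```rust
/// Mirrors the WAT control flow: indices 0..=2 select a constant, and any
/// other index takes the br_table default ($case3), which leaves the result
/// at 0 so that the trailing check replaces it with 999.
fn branch_table(index: u32) -> u32 {
    let mut result = match index {
        0 => 100,
        1 => 200,
        2 => 300,
        _ => 0, // default target: no assignment happens in the WAT source
    };
    if result == 0 {
        result = 999;
    }
    result
}
```

Because the default path is the only one that can reach the check with a zero result, a compiler is free to assign 999 directly in that path, which is the shape the decompiled switch shows.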
Fibonacci (WAT)
A fibonacci implementation in WebAssembly text format. It reads n from memory, computes fib(n), and writes the result back.
Source
File: examples/sources/fibonacci.wat
(module
(memory 1)
(func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
(local $n i32)
(local $i i32)
;; Read n from args
(local.set $n (i32.load (local.get $args_ptr)))
;; Initialize: a=0, b=1, i=0
(local.set $args_ptr (i32.const 0)) ;; reuse as $a
(local.set $args_len (i32.const 1)) ;; reuse as $b
(local.set $i (i32.const 0))
(block $break
(loop $continue
(br_if $break (i32.ge_u (local.get $i) (local.get $n)))
;; a, b = b, a+b
(local.get $args_len)
(i32.add (local.get $args_ptr) (local.get $args_len))
(local.set $args_len)
(local.set $args_ptr)
(local.set $i (i32.add (local.get $i) (i32.const 1)))
(br $continue)
)
)
(i32.store (i32.const 0) (local.get $args_ptr))
(i64.const 17179869184) ;; ptr=0, len=4
)
)
The source reuses $args_ptr and $args_len as the fibonacci accumulators a and b after reading the input. This is a common trick in hand-written WAT to avoid extra locals. The return value is a packed i64: lower 32 bits = result pointer, upper 32 bits = result length.
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/fibonacci.pvm |
| Size | 266 bytes |
| Format | SPI |
| Functions | 1 |
Decompiled Output
./target/release/pvm-decompiler examples/compiled/fibonacci.pvm
fn main(r1: u64, r7: u64) {
let ptr_0_72
let ptr_0_80
let ptr_0_88
// @0000
// @0006
let ptr_0_56 = u32[r7]
ptr_0_72 = 0
ptr_0_80 = 1
ptr_0_88 = 0
while (ptr_0_72 <u ptr_0_56) {
// @0075
ptr_0_72 = ptr_0_72 + 1
ptr_0_80 = ptr_0_88 + ptr_0_80 << 32 >>u 32
ptr_0_88 = ptr_0_80
}
// @0033
u32[0x3000] = ptr_0_88
halt()
}
What to notice:
- The loop/block pair from WAT is recovered as a while loop.
- ptr_0_56 holds the input n, read from memory at r7.
- ptr_0_72 is the loop counter i.
- ptr_0_80 and ptr_0_88 correspond to the fibonacci accumulators b and a.
- The swap logic a, b = b, a+b is visible in the loop body.
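The << 32 >>u 32 suffix in the loop body is not explained in the output itself. A plausible reading, assuming the shift pair applies to the whole sum and that registers are 64 bits wide, is that it truncates the result to 32 bits, which is what a wrapping i32.add needs. The sketch below uses hypothetical helper names to show that reading.

```rust
/// Shift left then logically right by 32: keeps only the low 32 bits of a
/// 64-bit register (zero-extending truncation).
fn trunc32(x: u64) -> u64 {
    (x << 32) >> 32
}

/// One `a, b = b, a+b` step from the WAT source, held in 64-bit registers
/// and truncated the way a wrapping 32-bit add would be.
fn fib_step(a: u64, b: u64) -> (u64, u64) {
    (b, trunc32(a.wrapping_add(b)))
}

/// Apply the step n times starting from a=0, b=1 and return a.
fn fib_wat(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = fib_step(a, b);
        a = next.0;
        b = next.1;
    }
    a
}
```

Under this reading the shift pair only matters once the sum exceeds 32 bits; for small n it is a no-op.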
Fibonacci (AssemblyScript)
The same fibonacci algorithm, but written in AssemblyScript. Comparing this with the WAT version shows how a higher-level source language produces different binary structure.
Source
File: examples/sources/as-fibonacci.ts
export let result_ptr: i32 = 0;
export let result_len: i32 = 0;
export function main(args_ptr: i32, args_len: i32): void {
const RESULT_HEAP = heap.alloc(256);
let n = load<i32>(args_ptr);
let a: i32 = 0;
let b: i32 = 1;
while (n > 0) {
b = a + b;
a = b - a;
n = n - 1;
}
store<i32>(RESULT_HEAP, a);
result_ptr = RESULT_HEAP as i32;
result_len = 4;
}
Compared to the WAT version, the AssemblyScript source uses heap.alloc() for the output buffer and exports result_ptr/result_len as globals. The compiler generates a two-function binary (entry wrapper + actual logic).
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/as-fibonacci.pvm |
| Size | 1338 bytes |
| Format | SPI |
| Functions | 2 |
| Instructions | 334 |
| Jump table entries | 3 |
| Code size | 1118 bytes |
The binary is about 4x larger than the WAT version. The AssemblyScript compiler adds runtime support code, a heap allocator call, and a separate entry function.
Decompiled Output
./target/release/pvm-decompiler examples/compiled/as-fibonacci.pvm
fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
func_1(r1 - 16)
}
fn func_1(r1: u64) {
let ptr_0_40
let ptr_0_520
let ptr_0_528
let ptr_0_536
let ptr_0_88
ptr_0_40 = u64[r1] - 0x50000
ptr_0_88 = heap_alloc(272)
ptr_0_520 = 0
ptr_0_528 = 1
ptr_0_536 = *ptr_0_40
while (ptr_0_536 >s 0) {
let var_136 = ptr_0_528 + ptr_0_520
ptr_0_520 = var_136 - ptr_0_520
ptr_0_528 = var_136
ptr_0_536 = ptr_0_536 - 1
}
*ptr_0_88 = ptr_0_520
RESULT_PTR = ptr_0_88
RESULT_LEN = 4
halt()
}
What to notice:
- The decompiler detects two functions: a thin main wrapper and the actual func_1.
- heap_alloc(272) corresponds to the source heap.alloc(256) – the AssemblyScript runtime adds a small header to each allocation.
- The fibonacci loop is clean: ptr_0_520 is a, ptr_0_528 is b, and ptr_0_536 is the countdown n.
- RESULT_PTR and RESULT_LEN are recognized as global exports.
- The >s operator means “signed greater than”, matching the source n > 0.
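The source loop updates the pair with b = a + b; a = b - a instead of using a temporary, and the decompiled body keeps that shape (var_136 - ptr_0_520). A minimal sketch of why this is the same a, b = b, a+b step as in the WAT version follows; the function names are illustrative, and wrapping arithmetic is assumed to mirror i32 overflow behaviour.

```rust
/// One iteration of the AssemblyScript loop body: b = a + b; a = b - a.
fn as_step(a: i32, b: i32) -> (i32, i32) {
    let new_b = a.wrapping_add(b);
    let new_a = new_b.wrapping_sub(a); // (a + b) - a, i.e. the old b
    (new_a, new_b)
}

/// Countdown loop as in the source: runs while n > 0 (signed comparison).
fn as_fib(mut n: i32) -> i32 {
    let (mut a, mut b) = (0i32, 1i32);
    while n > 0 {
        let next = as_step(a, b);
        a = next.0;
        b = next.1;
        n -= 1;
    }
    a
}
```

A negative n never enters the loop because the comparison is signed, which is the behaviour the >s operator in the output encodes.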
Refined Output (LLM)
./target/release/pvm-decompiler --refine examples/compiled/as-fibonacci.pvm
fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
func_1(r1 - 16)
}
fn func_1(r1: u64) {
let input_data_ptr
let fib_next
let fib_current
let loop_counter
let output_buffer
input_data_ptr = u64[r1] - 0x50000
output_buffer = heap_alloc(272)
fib_current = 0
fib_next = 1
loop_counter = *input_data_ptr
while (loop_counter >s 0) {
let next_val = fib_next + fib_current
fib_current = next_val - fib_current
fib_next = next_val
loop_counter = loop_counter - 1
}
*output_buffer = fib_current
RESULT_PTR = output_buffer
RESULT_LEN = 4
halt()
}
The LLM correctly identifies the fibonacci pattern and gives meaningful names: fib_current, fib_next, loop_counter, and output_buffer.
Comparison with WAT Version
| Aspect | WAT | AssemblyScript |
|---|---|---|
| Binary size | ~335 bytes | 1338 bytes |
| Functions | 1 | 2 |
| Instructions | ~70 | 334 |
| Memory model | Direct store to address 0 | heap_alloc + globals |
| Loop style | i < n (unsigned) | n > 0 (signed countdown) |
The WAT version is smaller because it is hand-written and avoids runtime overhead. The AssemblyScript version includes compiler-generated boilerplate but the core algorithm is still clearly visible in the decompiled output.
Control Flow
A larger AssemblyScript example that exercises multiple control flow patterns: if/else, while, nested for loops, and break.
Source
File: examples/sources/as-tests-control-flow.ts
let RESULT_HEAP: usize = 0;
export let result_ptr: i32 = 0;
export let result_len: i32 = 0;
function writeResult(val: i32): void {
store<i32>(RESULT_HEAP, val);
result_ptr = RESULT_HEAP as i32;
result_len = 4;
}
export function main(args_ptr: i32, args_len: i32): void {
RESULT_HEAP = heap.alloc(256);
const input = load<i32>(args_ptr);
let result = 0;
// If/Else
if (input > 10) {
result = 1;
} else {
result = 2;
}
// While loop
let i = 0;
while (i < input) {
result += 1;
i++;
}
// Nested loop with break
for (let j = 0; j < 5; j++) {
for (let k = 0; k < 5; k++) {
if (k > 2) break;
result++;
}
}
writeResult(result);
}
This program does three things in sequence:
- Sets result to 1 or 2 depending on whether input > 10
- Adds input to result via a while loop
- Adds to result in a 5x5 nested loop, but the inner loop breaks when k > 2 (so effectively 5x3 = 15 iterations)
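The three steps can be traced with a direct port of the source logic. This is a sketch for checking expected values by hand; the function name control_flow_result is an assumption, and heap and result-buffer handling are omitted.

```rust
/// Port of the three stages in the AssemblyScript source, returning the
/// value that would be passed to writeResult.
fn control_flow_result(input: i32) -> i32 {
    // Stage 1: if/else
    let mut result = if input > 10 { 1 } else { 2 };

    // Stage 2: while loop adds 1 per iteration, `input` times (if positive)
    let mut i = 0;
    while i < input {
        result += 1;
        i += 1;
    }

    // Stage 3: 5x5 nested loop whose inner loop breaks once k > 2
    for _j in 0..5 {
        for k in 0..5 {
            if k > 2 {
                break;
            }
            result += 1;
        }
    }
    result
}
```

For an input of 5 this gives 2 + 5 + 15 = 22, and for 11 it gives 1 + 11 + 15 = 27.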
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/as-tests-control-flow.pvm |
| Format | SPI |
| Functions | 2 |
Decompiled Output
./target/release/pvm-decompiler examples/compiled/as-tests-control-flow.pvm
fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
func_1(r1 - 16)
}
fn func_1(r1: u64) {
let ptr_0_40
let ptr_0_512
let ptr_0_568
let ptr_0_576
let ptr_0_680
let ptr_0_688
let ptr_0_760
let ptr_0_768
let ptr_0_88
ptr_0_40 = u64[r1] - 0x50000
ptr_0_88 = heap_alloc(272)
RESULT_PTR = ptr_0_88
let ptr_0_464 = *ptr_0_40
ptr_0_512 = 2
if (*ptr_0_40 <=s 10) {
ptr_0_568 = 0
ptr_0_576 = ptr_0_512
goto block_0376;
} else {
}
ptr_0_512 = 1
ptr_0_568 = 0
ptr_0_576 = ptr_0_512
block_0376:
while (ptr_0_568 <s ptr_0_464) {
ptr_0_568 = ptr_0_568 + 1
ptr_0_576 = ptr_0_576 + 1
}
ptr_0_680 = 0
ptr_0_688 = ptr_0_576
while (ptr_0_680 <s 5) {
ptr_0_760 = 0
ptr_0_768 = ptr_0_688
while (ptr_0_760 <s 5 & ptr_0_760 <=s 2) {
ptr_0_760 = ptr_0_760 + 1
ptr_0_768 = ptr_0_768 + 1
}
ptr_0_680 = ptr_0_680 + 1
ptr_0_688 = ptr_0_768
}
u32[RESULT_PTR + 0x50000] = ptr_0_688
RESULT_LEN = 4
HEAP_PTR = 4
halt()
}
What to notice:
- If/else recovery: The if (*ptr_0_40 <=s 10) branch corresponds to the source if (input > 10) (the condition is inverted because the compiler swapped the true/false branches). ptr_0_512 starts as 2 (the else case) and gets overwritten to 1 if the condition falls through.
- While loop: The while (ptr_0_568 <s ptr_0_464) loop is a direct match to while (i < input). The variable ptr_0_576 accumulates the result.
- Nested loops with break: The outer while (ptr_0_680 <s 5) is the for j loop. The inner while (ptr_0_760 <s 5 & ptr_0_760 <=s 2) combines the loop condition k < 5 with the break condition k > 2 into a single compound condition. This is how the decompiler represents early exits from loops.
- Inlined function: The writeResult helper is inlined by the compiler, so it appears as direct assignments to RESULT_PTR, RESULT_LEN, and a memory store at the end.
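The compound loop condition is equivalent to the original break as long as the break is the first thing in the loop body. The sketch below compares the two forms; both function names and the limit/cutoff parameters are introduced here for illustration only.

```rust
/// Loop with an explicit early exit, as in the source: break once k > cutoff.
fn count_with_break(limit: i32, cutoff: i32) -> i32 {
    let mut count = 0;
    let mut k = 0;
    while k < limit {
        if k > cutoff {
            break;
        }
        count += 1;
        k += 1;
    }
    count
}

/// Same loop with the negated break condition merged into the loop
/// condition, as in the decompiled `k <s 5 & k <=s 2`.
fn count_compound(limit: i32, cutoff: i32) -> i32 {
    let mut count = 0;
    let mut k = 0;
    while k < limit && k <= cutoff {
        count += 1;
        k += 1;
    }
    count
}
```

With limit 5 and cutoff 2, both versions run the body three times, matching the 5x3 iteration count noted for the source.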
Reading Tips
When analyzing decompiled PVM output, keep these patterns in mind:
| Pattern in output | Meaning |
|---|---|
| u32[addr] or u64[addr] | Memory load/store at the given address |
| *ptr | Pointer dereference (load from computed address) |
| >s, <s, <=s | Signed comparison operators |
| <u, >=u | Unsigned comparison operators |
| heap_alloc(n) | Runtime heap allocation of n bytes |
| RESULT_PTR, RESULT_LEN | Recognized global exports |
| halt() | Program termination (ecalli) |
| goto block_XXXX | Jump to a labeled block (unstructured control flow) |
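The difference between the signed and unsigned operators only shows up when the top bit of a value is set. As a sketch, assuming comparisons act on full 64-bit register values (the helper names are hypothetical):

```rust
/// `a <u b`: compare the raw 64-bit register values.
fn lt_unsigned(a: u64, b: u64) -> bool {
    a < b
}

/// `a <s b`: reinterpret both registers as two's-complement i64 first.
fn lt_signed(a: u64, b: u64) -> bool {
    (a as i64) < (b as i64)
}
```

For example, a register holding all ones is -1 when read as signed, so it compares below 0 with <s, but it is the largest possible value with <u.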
JAM Fuzzy Service
This is a real-world JAM service binary compiled from Rust. No source code is available for this one – it is included as a stress test for the decompiler on a production-size program.
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/jam-fuzzy-service.pvm |
| Size | 145,725 bytes (~142 KB) |
| Format | SPI |
| Functions | 63 |
| Output lines | ~10,900 |
| Jump table entries | 962 |
This is significantly larger than the toy examples. The binary contains 63 detected functions and almost a thousand jump table entries, which suggests the original code branches heavily – typical for a service that handles many message types.
Decompiled Output (excerpt)
./target/release/pvm-decompiler examples/compiled/jam-fuzzy-service.pvm
The full output is about 10,900 lines. Here is the main function signature and a representative fragment showing nested conditionals with host calls:
fn main(r0: u64, r1: u64, r2: u64, r3: u64, r4: u64, r5: u64, r6: u64, r7: u64, r8: u64) {
let cond_128: bool
let ptr_0: ptr
let ptr_1073: ptr
let ptr_1073_0
...
A deeper fragment showing service logic:
if (0x4F87 >>u ptr_675_0 & 1 != 0) {
if (fetch() >=u var_16944) {
if (var_16944 != -1) {
u64[ptr_1071 + 72] = 0
u64[ptr_1071 + 48] = 1
u64[ptr_1071 + 56] = 8
u64[ptr_1071 + 64] = 0
r8 = 0x17F10
r7 = ptr_1071 + 40
goto block_12850;
} else {
ptr_1071_32->field_0 = -0x8000000000000000
}
} else {
if (var_16944 <s 0) {
r7 = 0x17D28
goto block_12480;
} else {
if (var_16944 == 0) {
let var_16960 = 0
u64[ptr_1071 + 0] = 1
goto block_155d7;
} else {
...
}
}
}
}
What to notice:
- The decompiler recovers deeply nested if/else trees from flat branch chains.
- fetch() is a recognized PVM host call (ecalli).
- Field access patterns like ptr_1071_32->field_0 show the decompiler trying to infer struct-like memory layouts.
- Many goto targets remain because the control flow is too complex for full structuring. This is expected for large real-world binaries.
- The 63 detected functions give a rough sense of the original module structure.
Detected Functions
The decompiler identifies 63 functions. The first few:
fn main(r0: u64, r1: u64, r2: u64, r3: u64, r4: u64, r5: u64, r6: u64, r7: u64, r8: u64)
fn func_0(r1: u64, r5: u64)
fn func_1(r0: u64, r1: u64, r5: u64, r6: u64, r7: u64, r8: u64)
fn func_2(r1: u64, r5: u64)
fn func_3(r1: u64, r5: u64, r6: u64)
fn func_4(r7: u64, r8: u64)
fn func_5(r7: u64, r8: u64)
fn func_6(r0: u64, r1: u64, r5: u64, r6: u64, r7: u64, r8: u64)
...
The varying function signatures (different register sets) reflect the Rust compiler’s calling conventions at the PVM level.
Ananas
Ananas is a JAM service written in AssemblyScript. The source code is available at github.com/tomusdrw/anan-as. It is the largest example in this repository and is useful for testing the decompiler on a complex, real-world AssemblyScript binary.
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/ananas.pvm |
| Size | 452,760 bytes (~442 KB) |
| Format | SPI |
| Functions | 189 |
| Output lines | ~11,300 |
| Jump table entries | 1,066 |
This is the largest binary in the examples directory – about 3x the size of the JAM fuzzy service. The AssemblyScript compiler generates 189 functions, which includes runtime support (garbage collector, memory allocator, string handling) in addition to the actual service logic.
Decompiled Output (excerpt)
./target/release/pvm-decompiler examples/compiled/ananas.pvm
The entry point and initialization:
fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
let ptr_0: ptr
let ptr_0_104
let ptr_0_208
let ptr_0_240
let ptr_0_248
let ptr_0_320
let ptr_0_40
...
if (0xFEFD0000 >=u r1 - 16 - 256) {
ptr_1088 = ptr_992 - 256
}
if (0xFEFD0000 >=u ptr_1088 - 3760) {
RESULT_PTR = 0x4DFC
var_883 = 1856
var_884 = 0
var_885 = 0
goto block_399c;
}
return
A fragment showing the memory allocator logic (typical AssemblyScript runtime):
var_167 = u32[52]
var_168 = ptr_0_528 + var_167
if (var_168 <u 1024) {
r4 = -1
} else {
u32[52] = var_168
let ptr_6 = sbrk(16 << var_167 - var_168)
goto block_3897;
}
A fragment showing bitwise operations for data processing:
let var_97 = 32 >>u (32 << (0xFFFFFFF0 & 32 >>u (32 << 15 + ...)))
ptr_0_320 = var_97
if (var_69 <u var_97 <u 0) {
goto block_38ea;
}
What to notice:
- The stack guard check 0xFEFD0000 >=u r1 - 16 - 256 at the top is inserted by the AssemblyScript compiler to detect stack overflow.
- sbrk() calls are recognized as the PVM memory growth host function. The allocator pattern (check available space, grow if needed) is clearly visible.
- RESULT_PTR is recognized as the standard JAM service output global.
- The 189 functions include many that are part of the AssemblyScript runtime, not user code. Functions like memory copy, string operations, and GC routines make up a significant portion of the output.
Detected Functions
A selection of the 189 detected functions:
fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64)
fn func_1(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64, r12: u64)
fn func_2(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64, r12: u64)
fn func_3(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64, r12: u64)
fn func_4(r1: u64, r7: u64)
fn func_5(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64, r12: u64)
...
fn func_21(r1: u64)
fn func_23(r1: u64, r7: u64)
...
Most functions share the signature (r0, r1, r9, r10, r11, r12) which reflects the AssemblyScript compiler’s standard calling convention for PVM targets. Functions with fewer parameters (like func_4(r1, r7) and func_21(r1)) are likely utility or helper functions.
Comparison with JAM Fuzzy Service
| Aspect | JAM Fuzzy Service | Ananas |
|---|---|---|
| Source language | Rust | AssemblyScript |
| Binary size | 142 KB | 442 KB |
| Functions | 63 | 189 |
| Output lines | ~10,900 | ~11,300 |
| Jump table entries | 962 | 1,066 |
| Runtime overhead | Minimal | Large (AS runtime included) |
Although the Ananas binary is about 3x larger, its decompiled output is only slightly longer than that of the JAM fuzzy service. This is because much of the binary size comes from data sections and runtime code that decompiles into repetitive patterns. The Rust binary is more compact but its logic is denser.
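The size and output-length ratios behind this comparison can be worked out from the metadata tables. A trivial sketch (the ratio helper is made up for illustration; the inputs are the byte sizes and approximate line counts listed above):

```rust
/// Ratio of two measurements as a floating-point factor.
fn ratio(larger: u64, smaller: u64) -> f64 {
    larger as f64 / smaller as f64
}
```

Using the table values, ratio(452_760, 145_725) is roughly 3.1 for binary size, while ratio(11_300, 10_900) is roughly 1.04 for output lines.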
Functions (AssemblyScript)
An AssemblyScript program with multiple helper functions: a three-argument add, a recursive factorial, and a square function called in a loop.
Source
File: examples/sources/as-tests-functions.ts
// Memory addresses
let RESULT_HEAP: usize = 0;
function writeResult(val: i32): i64 {
store<i32>(RESULT_HEAP, val);
return (RESULT_HEAP as i64) | ((4 as i64) << 32);
}
// Function with multiple args
function add3(a: i32, b: i32, c: i32): i32 {
return a + b + c;
}
// Recursive function
function factorial(n: i32): i32 {
if (n <= 1) return 1;
return n * factorial(n - 1);
}
// Function calls in loop
function square(n: i32): i32 {
return n * n;
}
export function main(args_ptr: i32, args_len: i32): i64 {
RESULT_HEAP = heap.alloc(256);
const n = load<i32>(args_ptr); // Input 5
let res = add3(n, 2, 3); // 5 + 2 + 3 = 10
res += factorial(n); // 10 + 120 = 130
let sumSquares = 0;
for (let i = 0; i < 3; i++) {
sumSquares += square(i); // 0 + 1 + 4 = 5
}
res += sumSquares; // 130 + 5 = 135
return writeResult(res);
}
The program computes add3(5, 2, 3) + factorial(5) + (0^2 + 1^2 + 2^2) = 10 + 120 + 5 = 135.
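The arithmetic can be reproduced with a direct port of the helpers. This sketch mirrors the AssemblyScript source for small inputs only; compute is a hypothetical name for the body of main without the memory handling.

```rust
fn add3(a: i32, b: i32, c: i32) -> i32 {
    a + b + c
}

/// Recursive factorial; only meaningful for small n (no overflow handling).
fn factorial(n: i32) -> i32 {
    if n <= 1 {
        1
    } else {
        n * factorial(n - 1)
    }
}

fn square(n: i32) -> i32 {
    n * n
}

/// The computation performed by main for an input n.
fn compute(n: i32) -> i32 {
    let mut res = add3(n, 2, 3);
    res += factorial(n);
    // square(0) + square(1) + square(2)
    let sum_squares: i32 = (0..3).map(square).sum();
    res + sum_squares
}
```

Calling compute(5) returns 135, matching the breakdown above.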
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/as-tests-functions.pvm |
| Size | 986 bytes |
| Format | SPI |
| Functions | 3 |
Decompiled Output
./target/release/pvm-decompiler examples/compiled/as-tests-functions.pvm
fn func_2(r1: u64, r7: u64) {
// @0139
u64[r1 + 224] = r7
u64[r1 + 264] = 0
u64[r1 + 272] = 0
while (u64[r1 + 272] <s 3) {
// @01b2
let var_8 = u64[r1 + 264] + u64[r1 + 272] * u64[r1 + 272]
u64[r1 + 296] = var_8
u64[r1 + 264] = var_8
u64[r1 + 272] = u64[r1 + 272] + 1
}
// @01e2
let var_16 = u64[r1 + 224]
u64[r1 + 328] = var_16
let var_20 = u64[r1 + 216] + 5 + var_16
u64[r1 + 336] = var_20
let var_21 = RESULT_PTR
u64[r1 + 344] = var_21
let var_28 = var_20 + u64[r1 + 264] << 32 >>u 32
u64[r1 + 360] = var_28
u32[var_21 + 0x33000] = var_28
halt()
}
(Showing func_2 which contains the interesting computation; main and func_1 handle entry and heap allocation boilerplate.)
What to notice:
- Inlined helpers: The add3, factorial, and square functions are inlined by the AssemblyScript compiler. The decompiler sees only the resulting combined computation.
- Square-in-loop: The while (u64[r1 + 272] <s 3) loop corresponds to the for (let i = 0; i < 3; i++) loop calling square(i). The expression u64[r1 + 272] * u64[r1 + 272] is the inlined i * i.
- Stack-frame layout: Variables are stored at stack offsets (r1 + 224, r1 + 264, etc.) rather than in registers, reflecting the AssemblyScript compiler’s frame-based calling convention.
- Result encoding: The final u32[var_21 + 0x33000] = var_28 writes the result to the heap, followed by halt().
Linked List (AssemblyScript)
An AssemblyScript program that creates a three-node linked list and sums its values recursively.
Source
File: examples/sources/as-tests-linked-list.ts
// Memory addresses
let RESULT_HEAP: usize = 0;
let NODE_HEAP: usize = 0;
function writeResult(val: i32): i64 {
store<i32>(RESULT_HEAP, val);
return (RESULT_HEAP as i64) | ((4 as i64) << 32);
}
// Node structure: [value: i32, next: i32] (8 bytes)
function createNode(ptr: i32, val: i32, next: i32): void {
store<i32>(ptr, val);
store<i32>(ptr + 4, next);
}
function sumList(head: i32): i32 {
if (head == 0) return 0;
const val = load<i32>(head);
const next = load<i32>(head + 4);
// Recursive sum
return val + sumList(next);
}
export function main(args_ptr: i32, args_len: i32): i64 {
RESULT_HEAP = heap.alloc(256);
NODE_HEAP = heap.alloc(32); // 3 nodes * 8 bytes each = 24 bytes
// Create list: 10 -> 20 -> 30 -> null
createNode(NODE_HEAP, 10, NODE_HEAP + 8);
createNode(NODE_HEAP + 8, 20, NODE_HEAP + 16);
createNode(NODE_HEAP + 16, 30, 0);
const sum = sumList(NODE_HEAP); // 60
return writeResult(sum);
}
The program builds a linked list 10 -> 20 -> 30 -> null, then recursively sums the values to produce 60.
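The [value: i32, next: i32] node layout can be modelled on a flat byte buffer. The sketch below is an illustration under stated assumptions: a 64-byte buffer stands in for linear memory, the node base of 8 is arbitrary (it only has to be non-zero, since 0 means null), and values are stored little-endian as in Wasm.

```rust
/// Write one node as [value: i32, next: i32] at byte offset `ptr`.
fn create_node(mem: &mut [u8], ptr: usize, val: i32, next: i32) {
    mem[ptr..ptr + 4].copy_from_slice(&val.to_le_bytes());
    mem[ptr + 4..ptr + 8].copy_from_slice(&next.to_le_bytes());
}

/// Read a little-endian i32 at byte offset `ptr`.
fn load_i32(mem: &[u8], ptr: usize) -> i32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&mem[ptr..ptr + 4]);
    i32::from_le_bytes(bytes)
}

/// Recursive sum, following `next` until the null pointer (0).
fn sum_list(mem: &[u8], head: i32) -> i32 {
    if head == 0 {
        return 0;
    }
    let val = load_i32(mem, head as usize);
    let next = load_i32(mem, head as usize + 4);
    val + sum_list(mem, next)
}

/// Build 10 -> 20 -> 30 -> null and sum it.
fn build_and_sum() -> i32 {
    let mut mem = vec![0u8; 64];
    let base: usize = 8; // hypothetical stand-in for NODE_HEAP
    create_node(&mut mem, base, 10, (base + 8) as i32);
    create_node(&mut mem, base + 8, 20, (base + 16) as i32);
    create_node(&mut mem, base + 16, 30, 0);
    sum_list(&mem, base as i32)
}
```

Here build_and_sum() returns 60, and the two stores in create_node sit 4 bytes apart, the same spacing that appears in the decompiled helper.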
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/as-tests-linked-list.pvm |
| Size | 1945 bytes |
| Format | SPI |
| Functions | 5 |
Decompiled Output
./target/release/pvm-decompiler examples/compiled/as-tests-linked-list.pvm
The output is large (501 lines) due to heap allocation boilerplate. Here are the most interesting fragments:
fn func_2(r7: u64) {
// @035f
u32[RESULT_PTR + 0x33000] = r7
halt()
}
func_2 is the writeResult helper – it writes the result to the heap and halts.
fn func_4(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64) {
if (0xFEFD0000 >=u r1 - 72) {
// @056e
u32[r9 + 0x33000] = r10
u32[r9 + 0x33004] = r11
call_indirect(r0)
}
}
func_4 is the createNode helper – it stores value and next at adjacent 4-byte offsets, matching the [value: i32, next: i32] node layout.
What to notice:
- Pointer arithmetic: The node structure is visible as two consecutive u32 stores at offsets +0x33000 and +0x33004 (4 bytes apart).
- Recursive traversal: The sumList function compiles into func_3, which uses call_indirect(r0) to call back into itself – the recursive sumList(next) call.
- Two heap allocations: func_1 (the main logic) performs two heap_alloc calls, matching heap.alloc(256) and heap.alloc(32) from the source.
- Heap boilerplate: Much of the output is the sbrk-based heap allocator pattern. The design principle of this decompiler favors showing high-level intent, but the allocator code is not yet collapsed for this example.
Game of Life (AssemblyScript)
A Conway’s Game of Life implementation on a 16x16 toroidal grid. It seeds glider, blinker, and toad patterns, then steps the simulation.
Source
File: examples/sources/as-life.ts
const WIDTH: i32 = 16;
const HEIGHT: i32 = 16;
const CELL_COUNT: i32 = WIDTH * HEIGHT;
let BUF_A: u32 = 0;
let BUF_B: u32 = 0;
let OUTPUT_BASE: u32 = 0;
@inline
function idx(x: i32, y: i32): u32 {
return (y * WIDTH + x) as u32;
}
@inline
function get(base: u32, x: i32, y: i32): u32 {
return load<u8>(base + idx(x, y)) as u32;
}
@inline
function set(base: u32, x: i32, y: i32, v: u32): void {
store<u8>(base + idx(x, y), v as u8);
}
function step_once(src: u32, dst: u32): void {
for (let y = 0; y < HEIGHT; ++y) {
for (let x = 0; x < WIDTH; ++x) {
// count 8 neighbors with toroidal wrapping
// apply B3/S23 rule
}
}
}
export function main(args_ptr: i32, args_len: i32): i64 {
const base = heap.alloc(CELL_COUNT * 2 + 8 + CELL_COUNT) as u32;
// ...seed, step, encode result...
}
(Source abbreviated for readability; see examples/sources/as-life.ts for the full listing.)
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/as-life.pvm |
| Size | 2298 bytes |
| Format | SPI |
| Functions | 2 |
Decompiled Output
./target/release/pvm-decompiler examples/compiled/as-life.pvm
The output is 215 lines. Key fragments from func_1:
while (ptr_0_296 <s 256) {
// @01d7
u8[var_96 + 0x33000] = 0
ptr_0_296 = ptr_0_296 + 1
}
This is the clear() function inlined – it zeroes out 256 bytes (the 16x16 grid).
u8[r2 + 0x33000] = 1
u8[r2 + 0x33000] = 1
// ... (14 stores total)
These are the seed_world() calls inlined – each set(base, x, y, 1) becomes a direct byte store. The glider, blinker, and toad patterns total 14 alive cells.
while (ptr_0_688 <s 256) {
u8[HEAP_PTR + 8 + ptr_0_688 << 32 >>u 32 + 0x33000] =
u8[ptr_0_688 + ptr_0_608 << 32 >>u 32 + 0x33000]
ptr_0_688 = ptr_0_688 + 1
}
This is encode_result() – copying the cell buffer into the output area (skipping the 8-byte width/height header).
What to notice:
- Aggressive inlining: All helper functions (idx, get, set, clear, seed_world, encode_result) are @inline or inlined by the compiler, producing a single large func_1.
- Constant-folded seeds: The seed pattern stores are fully unrolled – no loops, just 14 direct u8 stores.
- Loop structure: The while (... <s 256) loops correspond to iterating over CELL_COUNT (16 * 16 = 256) cells.
- Double buffering: The simulation swaps between BUF_A and BUF_B using RESULT_LEN and RESULT_PTR as buffer base pointers.
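The constants that show up in the decompiled fragments (256 cells, 14 seed stores) follow from the grid parameters. The sketch below works them out and shows one way to express toroidal indexing and the B3/S23 rule. It is not a port of as-life.ts: the abbreviated source does not show where wrapping happens, so folding it into idx is an assumption, as are the names next_state, alloc_size, and seed_total.

```rust
const WIDTH: i32 = 16;
const HEIGHT: i32 = 16;
const CELL_COUNT: i32 = WIDTH * HEIGHT;

/// Cell index with toroidal wrapping folded in (assumption: the real source
/// may wrap coordinates in step_once instead).
fn idx(x: i32, y: i32) -> usize {
    (y.rem_euclid(HEIGHT) * WIDTH + x.rem_euclid(WIDTH)) as usize
}

/// B3/S23: a dead cell with 3 neighbors is born, a live cell with 2 or 3 survives.
fn next_state(alive: bool, neighbors: u32) -> bool {
    matches!((alive, neighbors), (true, 2) | (true, 3) | (false, 3))
}

/// Size passed to heap.alloc in the source: CELL_COUNT * 2 + 8 + CELL_COUNT.
fn alloc_size() -> i32 {
    CELL_COUNT * 2 + 8 + CELL_COUNT
}

/// Live-cell counts of the standard seed patterns.
const SEED_CELLS: [(&str, u32); 3] = [("glider", 5), ("blinker", 3), ("toad", 6)];

fn seed_total() -> u32 {
    SEED_CELLS.iter().map(|(_, n)| *n).sum()
}
```

The seed total of 5 + 3 + 6 = 14 matches the 14 unrolled byte stores, and CELL_COUNT = 256 matches the loop bounds in the output.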
Host Call Log (WAT)
A minimal WAT program that demonstrates PVM host calls. It invokes ecalli 100 (the log host call) to print “Hello from PVM!” and returns 42.
Source
File: examples/sources/host-call-log.wat
(module
(import "env" "pvm_ptr" (func $pvm_ptr (param i64) (result i64)))
(import "env" "host_call_5" (func $host_call_5 (param i64 i64 i64 i64 i64 i64) (result i64)))
(memory (export "memory") 1)
;; "test-log" at offset 0 (8 bytes)
(data (i32.const 0) "test-log")
;; "Hello from PVM!" at offset 8 (15 bytes)
(data (i32.const 8) "Hello from PVM!")
(func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
;; ecalli 100 = log host call
;; r7 = level (3 = INFO)
;; r8 = target_ptr (PVM address of "test-log")
;; r9 = target_len (8)
;; r10 = msg_ptr (PVM address of "Hello from PVM!")
;; r11 = msg_len (15)
(drop (call $host_call_5
(i64.const 100)
(i64.const 3)
(call $pvm_ptr (i64.const 0))
(i64.const 8)
(call $pvm_ptr (i64.const 8))
(i64.const 15)))
;; Return result: store 42 at offset 24, return (ptr=24, len=4)
(i32.store (i32.const 24) (i32.const 42))
(i64.const 17179869208)))
The program uses two imported helpers:
- pvm_ptr – translates a Wasm linear-memory offset to a PVM address
- host_call_5 – dispatches to a numbered host function with 5 data arguments (here, ecalli 100 for logging). The _5 suffix indicates the number of data registers (r7-r11) passed to the host call.
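The offsets and lengths passed to the host call, and the packed return constant, can be cross-checked against the data segments. This is a sketch for verification only; data_layout and decode_return are hypothetical helpers, and the offsets are Wasm linear-memory offsets before pvm_ptr translation.

```rust
const TARGET: &str = "test-log";
const MESSAGE: &str = "Hello from PVM!";
const RESULT_OFFSET: u32 = 24;

/// Offsets and lengths of the two data segments:
/// (target_off, target_len, msg_off, msg_len).
fn data_layout() -> (u32, u32, u32, u32) {
    let target_len = TARGET.len() as u32;
    // The message starts right after the target string.
    (0, target_len, target_len, MESSAGE.len() as u32)
}

/// Decode the packed i64 return constant into (ptr, len).
fn decode_return(packed: i64) -> (u32, u32) {
    let bits = packed as u64;
    ((bits & 0xFFFF_FFFF) as u32, (bits >> 32) as u32)
}
```

The layout comes out as (0, 8, 8, 15), so the message ends at byte 23 and the result slot at offset 24 does not overlap it; decode_return(17179869208) gives (24, 4).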
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/host-call-log.pvm |
| Size | 12486 bytes |
| Format | SPI |
| Functions | 1 |
Decompiled Output
./target/release/pvm-decompiler examples/compiled/host-call-log.pvm
fn main(r0: u64, r1: u64, r3: u64, r5: u64, r6: u64, r7: u64, r12: u64) {
let var_28
// @0006
log()
u32[var_28 + 0x33000] = 42
halt()
}
What to notice:
- Host call recognition: The ecalli 100 instruction is recognized and rendered as log(). The decompiler collapses the multi-register setup (level, target pointer/length, message pointer/length) into a single named call.
- Compact output: Despite the binary being ~12 KB (due to the pvm_ptr helper and runtime support code being compiled in), the decompiler produces just 7 lines of pseudo-code for the main function.
- Result encoding: u32[var_28 + 0x33000] = 42 stores the return value, followed by halt().
- Binary size vs. complexity: The 12 KB binary size comes from the pvm_ptr address-translation helper and memory setup code compiled from the imports, not from the application logic itself.
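To make the collapsed register setup concrete, the sketch below lays out the five data arguments of the log call on the registers named in the WAT comments. It assumes data arguments occupy consecutive registers starting at r7. The source only states this for host_call_5, so applying the pattern to other argument counts is an assumption, and the pvm_ptr(...) strings are placeholders for translated addresses rather than real values.

```python
# Register names r7..r11 follow the comments in the WAT source.
# Extending the pattern to other argument counts is an assumption.
FIRST_DATA_REGISTER = 7
LAST_REGISTER = 12  # the builtin C example in this documentation declares r0..r12


def assign_data_registers(args):
    """Map host-call data arguments onto consecutive registers from r7."""
    if FIRST_DATA_REGISTER + len(args) - 1 > LAST_REGISTER:
        raise ValueError("too many data arguments for r7..r12")
    return {f"r{FIRST_DATA_REGISTER + i}": arg for i, arg in enumerate(args)}


# The setup that the decompiler renders as a single log() call.
log_call = assign_data_registers([
    3,             # level (3 = INFO)
    "pvm_ptr(0)",  # target_ptr: address of "test-log"
    8,             # target_len
    "pvm_ptr(8)",  # msg_ptr: address of "Hello from PVM!"
    15,            # msg_len
])

if __name__ == "__main__":
    for register, value in log_call.items():
        print(register, "=", value)
```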
Fibonacci (as-lan)
A Fibonacci implementation compiled through the as-lan AssemblyScript framework. Unlike the hand-written WAT fibonacci, this version comes from a full framework with logging, string formatting, and runtime support – producing a much larger binary.
Source
The source is a full as-lan project. The core fibonacci logic (abbreviated):
function fibonacci(n: i32): i32 {
if (n <= 1) return n;
let a = 0, b = 1;
for (let i = 2; i <= n; i++) {
const tmp = a + b;
a = b;
b = tmp;
}
return b;
}
See examples/sources/aslan-fib.jam.wat for the full compiled WAT (the original AssemblyScript source is in the as-lan project).
Compiled Metadata
| Field | Value |
|---|---|
| File | examples/compiled/aslan-fib.pvm |
| Size | 39296 bytes (~38 KB) |
| Format | SPI |
| Functions | 18 |
Decompiled Output
./target/release/pvm-decompiler examples/compiled/aslan-fib.pvm
The output is 654 lines across 18 functions. The main function:
fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
let var_0: u64
// @0000
// @000a
var_0 = 2; jump 18414
if (0xFEFD0000 >=u r1 - 16 - 256) {
if (var_2481 == 0) {
if (var_2487 == 0) {
// ... nested initialization checks ...
u32[0x30000] = 5356
r9 = 4
r10 = 5
r0 = 190; jump -11326
}
}
}
// ...
}
What to notice:
- Framework overhead: 18 functions and 39 KB of binary for a fibonacci – the as-lan framework includes runtime support for string handling, logging, memory management, and the ecalli dispatch table.
- Nested guard checks: The main function’s deeply nested if (var_XXXX == 0) checks are the framework’s initialization sequence, checking whether various runtime components need setup.
- Scale comparison: Compare with the hand-written WAT fibonacci (335 bytes, 1 function, 7 lines of output) to see how framework overhead affects binary size and decompilation complexity.
- Indirect calls: The call_indirect and jump patterns show the framework’s dispatch mechanism routing to different initialization and computation paths.
Backend Comparison
The --decompile flag runs the full LLVM pipeline: PVM bytecode is lifted to LLVM IR, then decompiled to C code. Several backends are available for the final decompilation step.
Available Backends
| Backend | Flag | Status |
|---|---|---|
| builtin | --backend=builtin | Works locally, no extra dependencies |
| retdec | --backend=retdec | Requires RetDec installation |
| rellic | --backend=rellic | Requires Rellic installation |
| rellic-docker | --backend=rellic-docker | Requires Docker with Rellic image |
| llvm-cbe | --backend=llvm-cbe | Requires LLVM C Backend Emitter |
For most users, builtin is the easiest option since it needs no external tools.
Example: builtin Backend
Using simple-add.pvm, a minimal hand-crafted PVM binary (21 bytes, 6 instructions):
./target/release/pvm-decompiler --decompile --backend=builtin examples/compiled/simple-add.pvm
int64_t main(void) {
int64_t r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12;
r0 = r1 = r2 = r3 = r4 = r5 = r6 = r7 = r8 = r9 = r10 = r11 = r12 = 0;
goto bb_0000;
bb_0000:
r0 = 42;
r1 = 100;
r2 = %t5;
goto bb_000f;
bb_000f:
return %t6;
}
The C output preserves the basic block structure from the LLVM IR. Names like %t5 and %t6 are LLVM temporaries that the backend has not yet resolved into concrete expressions, so output that still contains them is not valid C and needs manual cleanup before it will compile. This is expected for the builtin backend – it prioritizes correctness over readability.
Example: br-table Through builtin
A larger example showing how the builtin backend handles branching:
./target/release/pvm-decompiler --decompile --backend=builtin examples/compiled/br-table.pvm
The output is a C function with labeled basic blocks (bb_0000, bb_000a, etc.), goto statements for control flow, and if/else for conditional branches. The switch table from the source becomes a chain of conditional jumps.
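To picture what "a chain of conditional jumps" means here, the sketch below contrasts a table-style dispatch with the equivalent chain of comparisons. The case values and results are made up for illustration and are not taken from br-table.pvm.

```python
# Hypothetical case values; not taken from br-table.pvm.
TABLE = {0: 10, 1: 20, 2: 30}
DEFAULT = -1


def dispatch_table(index: int) -> int:
    """Switch-style dispatch: a single lookup selects the result."""
    return TABLE.get(index, DEFAULT)


def dispatch_chain(index: int) -> int:
    """The same logic as a chain of conditional branches, tested in order."""
    if index == 0:
        return 10
    if index == 1:
        return 20
    if index == 2:
        return 30
    return DEFAULT


if __name__ == "__main__":
    for i in range(-1, 4):
        print(i, dispatch_table(i), dispatch_chain(i))
```

Both functions return the same result for every input. The chain form is longer and tests cases one at a time, which is the shape to expect in the goto-based C output.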
When to Use Which Output
| Goal | Recommended mode |
|---|---|
| Quick understanding of program logic | Default pseudo-code (no flags) |
| Better variable names for review | --refine |
| Integration with C toolchains | --decompile --backend=builtin |
| Deep analysis of the binary | --verbose or --debug |
| Generating LLVM IR for custom pipelines | --llvm |
The default pseudo-code mode is usually the most readable. Use --decompile when you need actual C code, for example to compile and test the decompiled output.
CLI Reference
Usage
pvm-decompiler [OPTIONS] <file.pvm>
The tool takes one PVM binary file as input and writes the result to stdout. Progress and diagnostic messages go to stderr.
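Because the result goes to stdout and diagnostics go to stderr, a script can capture the two streams separately. The following is a minimal sketch, assuming the binary path from the Quick Start build. The function names are illustrative, and rejecting --backend without --decompile is this sketch's own policy, not documented tool behavior.

```python
import subprocess

DECOMPILER = "./target/release/pvm-decompiler"  # path from the Quick Start build
BACKENDS = {"builtin", "retdec", "rellic", "rellic-docker", "llvm-cbe"}


def build_command(input_file, flags=(), backend=None):
    """Assemble an argument list using only flags listed in this reference."""
    cmd = [DECOMPILER, *flags]
    if backend is not None:
        if backend not in BACKENDS:
            raise ValueError(f"unknown backend: {backend}")
        if "--decompile" not in flags:
            # Sketch policy: --backend is documented as used with --decompile.
            raise ValueError("--backend is only used together with --decompile")
        cmd.append(f"--backend={backend}")
    cmd.append(input_file)
    return cmd


def run_decompiler(input_file, flags=(), backend=None):
    """Run the tool; return (exit code, stdout result, stderr diagnostics)."""
    proc = subprocess.run(
        build_command(input_file, flags, backend),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    print(build_command("program.pvm", ["--decompile"], "builtin"))
```

Keeping the streams apart means the decompiled text can be written to a file unchanged while progress messages are logged or discarded.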
Options
| Flag | Description |
|---|---|
| (no flags) | Emit structured pseudo-code (default mode) |
| -v, --verbose | Show CFG, dataflow, and structural analysis alongside pseudo-code |
| --debug | Show raw decoded instructions and all diagnostics |
| --llvm | Emit LLVM IR instead of pseudo-code |
| --decompile | Full LLVM pipeline: lift to IR, then decompile to C code |
| --refine | Pass output through an LLM to improve variable names (requires OPENROUTER_API_KEY) |
| --backend=X | Choose decompilation backend (used with --decompile) |
| -V, --version | Show version |
| -h, --help | Show help |
Backends
Used with --decompile to select the C code generation backend:
| Value | Description |
|---|---|
| builtin | Built-in backend, no dependencies needed |
| retdec | Uses RetDec decompiler |
| rellic | Uses Rellic (Trail of Bits) |
| rellic-docker | Uses Rellic via Docker container |
| llvm-cbe | Uses LLVM C Backend Emitter |
Environment Variables
| Variable | Description |
|---|---|
| OPENROUTER_API_KEY | API key for LLM refinement (--refine flag) |
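A wrapper script may want to add --refine only when the key is present. The sketch below shows one way to do that. How the tool itself reacts to a missing key is not described here, so the check is a precaution on the caller's side, and the function name is illustrative.

```python
import os


def refine_flags(env=None):
    """Return ["--refine"] if OPENROUTER_API_KEY is set and non-empty, else []."""
    env = os.environ if env is None else env
    return ["--refine"] if env.get("OPENROUTER_API_KEY") else []


if __name__ == "__main__":
    # Prints ['--refine'] only when the variable is set in the current shell.
    print(refine_flags())
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise without touching process state.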
Examples
# Basic decompilation
pvm-decompiler program.pvm
# See raw instructions
pvm-decompiler --debug program.pvm
# Verbose analysis + pseudo-code
pvm-decompiler --verbose program.pvm
# LLVM IR output
pvm-decompiler --llvm program.pvm
# C code via builtin backend
pvm-decompiler --decompile --backend=builtin program.pvm
# Pseudo-code with LLM-improved names
pvm-decompiler --refine program.pvm
# C code with LLM-improved names
pvm-decompiler --decompile --refine program.pvm
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid input, decode failure, etc.) |
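When decompiling several files in a batch, the exit code is enough to separate successes from failures. This is a small sketch built only on the two codes listed above. The helper names are illustrative, and any other code is reported as undocumented rather than interpreted.

```python
# Meanings copied from the exit code table; other codes are not documented.
EXIT_MEANINGS = {
    0: "success",
    1: "error (invalid input, decode failure, etc.)",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into the meaning documented above."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def summarize(results):
    """Split a {file: exit_code} mapping into (succeeded, failed) file lists."""
    succeeded = sorted(name for name, code in results.items() if code == 0)
    failed = sorted(name for name, code in results.items() if code != 0)
    return succeeded, failed


if __name__ == "__main__":
    print(describe_exit(1))
    print(summarize({"a.pvm": 0, "b.pvm": 1}))
```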