Introduction

pvm-decompiler is a decompiler for PVM (Polkadot Virtual Machine) bytecode. It takes compiled .pvm binary files and produces readable pseudo-code output.

What It Does

The tool performs several analysis steps on PVM binaries:

  1. Decode the binary (SPI or raw ProgramBlob format)
  2. Build a control flow graph (CFG) and detect function boundaries
  3. Analyze dataflow and variable liveness
  4. Lift register operations into higher-level expressions
  5. Recover control structures like if/else, while loops, and switch/case
  6. Emit readable pseudo-code with inferred variable types
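Step 2 can be pictured with a toy basic-block splitter. This is only a sketch in Python over a made-up instruction format `(pc, opcode, target)`. The names `find_leaders` and `split_blocks` are hypothetical and do not come from pvm-decompiler, and real PVM decoding is more involved.

```python
# Toy instruction format (hypothetical, not the PVM encoding):
# (pc, opcode, jump_target_or_None)
def find_leaders(instructions):
    """Return the sorted program counters that start a basic block."""
    leaders = {instructions[0][0]}          # the first instruction starts a block
    for i, (pc, op, target) in enumerate(instructions):
        if op in ("jump", "branch"):
            leaders.add(target)             # a jump target starts a block
            if i + 1 < len(instructions):   # so does the instruction after a jump
                leaders.add(instructions[i + 1][0])
    return sorted(leaders)


def split_blocks(instructions):
    """Group instructions into basic blocks keyed by their leader pc."""
    leaders = set(find_leaders(instructions))
    blocks, current = {}, None
    for ins in instructions:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)
    return blocks


program = [
    (0, "load", None),
    (1, "branch", 4),   # conditional: falls through to 2 or jumps to 4
    (2, "add", None),
    (3, "jump", 1),     # loop back edge
    (4, "store", None),
]
blocks = split_blocks(program)
```

The edges between these blocks (fall-through and jump targets) would then form the CFG that the later analysis steps work on.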

Output Modes

The decompiler supports several output modes:

Mode          Flag          Description
Pseudo-code   (default)     Structured pseudo-code with type annotations
Verbose       --verbose     Pseudo-code plus CFG, dataflow, and structural analysis details
Debug         --debug       Raw decoded instructions and all diagnostics
LLVM IR       --llvm        Low-level LLVM intermediate representation
C code        --decompile   Full LLVM pipeline producing C output
LLM refined   --refine      Pseudo-code with variable names improved by an LLM
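For scripting, the table can be turned into a small command builder. The sketch below is an assumption-laden helper, not part of the tool: the mode keys (`"pseudo"`, `"c"`, and so on) and the function name `build_command` are invented for illustration, and only the flags themselves are taken from the table. Whether flags can be combined is not stated here, so the helper passes at most one.

```python
import shlex

# Flags copied from the table above; the default mode takes no flag.
# The dictionary keys are hypothetical labels, not CLI options.
MODE_FLAGS = {
    "pseudo": None,
    "verbose": "--verbose",
    "debug": "--debug",
    "llvm": "--llvm",
    "c": "--decompile",
    "refine": "--refine",
}


def build_command(input_path, mode="pseudo",
                  binary="./target/release/pvm-decompiler"):
    """Return the argv list for one decompiler run in the given mode."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode}")
    argv = [binary]
    flag = MODE_FLAGS[mode]
    if flag:
        argv.append(flag)
    argv.append(input_path)
    return argv


print(shlex.join(build_command("examples/compiled/fibonacci.pvm", "verbose")))
```

The returned list can be handed to `subprocess.run` as-is.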

Supported Input Formats

  • SPI-wrapped PVM binaries (most common)
  • Raw ProgramBlob binaries
  • Binaries with a metadata prefix (auto-stripped before decode)

How to Read This Documentation

Quick Start

Build

cargo build --release

The binary is at ./target/release/pvm-decompiler.

Basic Usage

Decompile a PVM binary to pseudo-code:

./target/release/pvm-decompiler examples/compiled/fibonacci.pvm

Output:

fn main(r1: u64, r7: u64, r8: u64) {
    let ptr_0_80
    let ptr_0_88
    let ptr_0_96

    let ptr_0_56 = u32[r7]
    ptr_0_80 = 0
    ptr_0_88 = 1
    ptr_0_96 = 0

    while (ptr_0_80 <u ptr_0_56) {
        ptr_0_80 = ptr_0_80 + 1
        ptr_0_88 = ptr_0_96 + ptr_0_88
        ptr_0_96 = ptr_0_88
    }

    u32[0x20000] = ptr_0_96
    halt()
}

Verbose Mode

Use --verbose to see the analysis details (CFG, dataflow, structural analysis) together with the pseudo-code:

./target/release/pvm-decompiler --verbose examples/compiled/fibonacci.pvm

This prints information about detected functions, def-use chains, lifted variables, detected loops, and the final pseudo-code.

Debug Mode

Use --debug to see the raw decoded PVM instructions:

./target/release/pvm-decompiler --debug examples/compiled/fibonacci.pvm

This shows the container format, the jump tables, and each decoded instruction with its program counter.

LLM Refinement

If you set the OPENROUTER_API_KEY environment variable, you can ask an LLM to improve the variable names in the output:

export OPENROUTER_API_KEY="your-key-here"
./target/release/pvm-decompiler --refine examples/compiled/fibonacci.pvm

The LLM renames variables like ptr_0_80 to more meaningful names like loop_counter based on how they are used.

LLVM C Output

For a full decompilation to C code through the LLVM pipeline:

./target/release/pvm-decompiler --decompile --backend=builtin examples/compiled/fibonacci.pvm

See Backend Comparison for details about available backends.

Examples

This section walks through several real PVM programs. For each example we show:

  1. The original source code (WAT or AssemblyScript)
  2. Basic metadata about the compiled binary
  3. The decompiled pseudo-code output
  4. Where available, the LLM-refined output with better variable names

The examples go from simple to more complex:

  • Branch Table – a small WAT program with switch/case style branching
  • Fibonacci (WAT) – classic fibonacci in WebAssembly text format
  • Fibonacci (AssemblyScript) – same algorithm compiled from AssemblyScript, shows how a higher-level language compiles differently
  • Control Flow – a larger example with if/else, while, nested for loops, and break
  • JAM Fuzzy Service – a real-world Rust JAM service (~142 KB, 63 functions, no source available)
  • Ananas – a real-world AssemblyScript JAM service (~442 KB, 189 functions, source on GitHub)

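The next examples show more complex patterns:

  • Functions (AssemblyScript) – helper functions, recursion, and calls inside a loop
  • Linked List (AssemblyScript) – a three-node linked list summed recursively
  • Game of Life (AssemblyScript) – Conway’s Game of Life on a 16x16 toroidal grid
  • Host Call Log (WAT) – a minimal program that invokes the log host call
  • Fibonacci (as-lan) – fibonacci compiled through the as-lan AssemblyScript framework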

Each example can be reproduced by running the decompiler on files from the examples/compiled/ directory.

Branch Table

A small WAT program that uses br_table for indexed branching. The decompiler recovers this as a switch/case statement.

Source

File: examples/sources/br-table.wat

(module
  (memory 1)
  (func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
    (local $index i32)
    (local $result i32)

    (local.set $index (i32.load (local.get $args_ptr)))

    (block $case3
      (block $case2
        (block $case1
          (block $case0
            (br_table $case0 $case1 $case2 $case3 (local.get $index))
          )
          (local.set $result (i32.const 100))
          (br $case3)
        )
        (local.set $result (i32.const 200))
        (br $case3)
      )
      (local.set $result (i32.const 300))
      (br $case3)
    )

    (if (i32.eq (local.get $result) (i32.const 0))
      (then
        (local.set $result (i32.const 999))
      )
    )

    (i32.store (i32.const 0) (local.get $result))
    (i64.const 17179869184)  ;; ptr=0, len=4
  )
)

The program reads an index from memory and branches on it: indices 0, 1, and 2 set result to 100, 200, and 300, while any other index takes the default target and leaves result at 0. A final check then replaces a zero result with 999, and the result is written to memory. The return value is a packed i64: lower 32 bits = result pointer, upper 32 bits = result length.
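The behaviour of the WAT source can be modelled in a few lines of Python. This is a sketch of the source semantics only (the function names are made up here), useful for predicting what the decompiled switch should compute for a given index.

```python
def branch_table(index):
    """Mirror the control flow of br-table.wat for a non-negative index."""
    results = [100, 200, 300]          # $case0, $case1, $case2
    # br_table sends any index past the listed labels to the default ($case3),
    # which leaves result at its initial value 0.
    result = results[index] if index < len(results) else 0
    if result == 0:                    # fallback check after the blocks
        result = 999
    return result


def pack_result(ptr, length):
    """Pack (ptr, len) into one i64: low 32 bits = ptr, high 32 bits = len."""
    return (length << 32) | ptr
```

`pack_result(0, 4)` reproduces the constant 17179869184 that the WAT function returns.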

Compiled Metadata

Field       Value
File        examples/compiled/br-table.pvm
Size        206 bytes
Format      SPI
Functions   1

Decompiled Output

./target/release/pvm-decompiler examples/compiled/br-table.pvm

fn main(r1: u64, r7: u64) {
    let ptr_0_72

    // @0006
    let var_1 = u32[r7]

    switch (var_1) {
        case 0:
            // @007b
            ptr_0_72 = 100

        case 1:
            // @006f
            ptr_0_72 = 200

        case 2:
            // @0063
            ptr_0_72 = 300

        default:
            // @0032
            ptr_0_72 = 999

            break;
    }

    // @003a
    u32[0x3000] = ptr_0_72
    halt()
}

What to notice:

  • The br_table is recovered as a clean switch statement with three explicit cases plus a default.
  • The variable ptr_0_72 holds the intermediate result from each case.
  • The fallback check (if result == 0 then 999) has been folded into the default case by the compiler.
  • Memory write u32[0x3000] corresponds to the i32.store at offset 0 plus the PVM memory base address.

Fibonacci (WAT)

A fibonacci implementation in WebAssembly text format. It reads n from memory, computes fib(n), and writes the result back.

Source

File: examples/sources/fibonacci.wat

(module
  (memory 1)

  (func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
    (local $n i32)
    (local $i i32)

    ;; Read n from args
    (local.set $n (i32.load (local.get $args_ptr)))

    ;; Initialize: a=0, b=1, i=0
    (local.set $args_ptr (i32.const 0))  ;; reuse as $a
    (local.set $args_len (i32.const 1))  ;; reuse as $b
    (local.set $i (i32.const 0))

    (block $break
      (loop $continue
        (br_if $break (i32.ge_u (local.get $i) (local.get $n)))

        ;; a, b = b, a+b
        (local.get $args_len)
        (i32.add (local.get $args_ptr) (local.get $args_len))
        (local.set $args_len)
        (local.set $args_ptr)

        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $continue)
      )
    )

    (i32.store (i32.const 0) (local.get $args_ptr))
    (i64.const 17179869184)  ;; ptr=0, len=4
  )
)

The source reuses $args_ptr and $args_len as the fibonacci accumulators a and b after reading the input. This is a common trick in hand-written WAT to avoid extra locals. The return value is a packed i64: lower 32 bits = result pointer, upper 32 bits = result length.
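A Python model of the WAT source makes the expected results easy to check. It is a sketch of the source semantics, not of the decompiler; `fib_wat` and `unpack_result` are names chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF


def fib_wat(n):
    """Follow fibonacci.wat: a, b = b, a + b, repeated n times, in 32-bit arithmetic."""
    a, b = 0, 1                      # $args_ptr and $args_len after reuse
    i = 0
    while i < n:                     # br_if $break fires once i >= n (unsigned)
        a, b = b, (a + b) & MASK32   # i32.add wraps at 32 bits
        i += 1
    return a                         # the value stored at address 0


def unpack_result(packed):
    """Split the packed i64 return value into (ptr, len)."""
    return packed & MASK32, packed >> 32
```

Unpacking the constant 17179869184 from the source gives ptr=0 and len=4, in line with the comment in the WAT listing.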

Compiled Metadata

Field       Value
File        examples/compiled/fibonacci.pvm
Size        266 bytes
Format      SPI
Functions   1

Decompiled Output

./target/release/pvm-decompiler examples/compiled/fibonacci.pvm

fn main(r1: u64, r7: u64) {
    let ptr_0_72
    let ptr_0_80
    let ptr_0_88

    // @0000

    // @0006
    let ptr_0_56 = u32[r7]
    ptr_0_72 = 0
    ptr_0_80 = 1
    ptr_0_88 = 0

    while (ptr_0_72 <u ptr_0_56) {
        // @0075
        ptr_0_72 = ptr_0_72 + 1
        ptr_0_80 = ptr_0_88 + ptr_0_80 << 32 >>u 32
        ptr_0_88 = ptr_0_80
    }

    // @0033
    u32[0x3000] = ptr_0_88
    halt()
}

What to notice:

  • The loop/block pair from WAT is recovered as a while loop.
  • ptr_0_56 holds the input n, read from memory at r7.
  • ptr_0_72 is the loop counter i.
  • ptr_0_80 and ptr_0_88 correspond to the fibonacci accumulators b and a.
  • The swap logic a, b = b, a+b is visible in the loop body.
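The loop body also contains the sequence << 32 >>u 32. The operator precedence of the pseudo-code is not documented here, so the following is an assumption: the shifts apply to the whole sum and clear the upper 32 bits of a 64-bit register, which would match the 32-bit i32.add in the source. A minimal Python sketch of that reading (all helper names are made up):

```python
MASK64 = (1 << 64) - 1


def shl64(x, n):
    return (x << n) & MASK64        # 64-bit register: bits shifted past bit 63 are lost


def shr_u64(x, n):
    return (x & MASK64) >> n        # logical (unsigned) right shift


def zero_extend_32(x):
    """Model `x << 32 >>u 32` on a 64-bit register."""
    return shr_u64(shl64(x, 32), 32)
```

Under this assumption the shift pair is equivalent to masking with 0xFFFFFFFF.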

Fibonacci (AssemblyScript)

The same fibonacci algorithm, but written in AssemblyScript. Comparing this with the WAT version shows how a higher-level source language produces a different binary structure.

Source

File: examples/sources/as-fibonacci.ts

export let result_ptr: i32 = 0;
export let result_len: i32 = 0;

export function main(args_ptr: i32, args_len: i32): void {
  const RESULT_HEAP = heap.alloc(256);
  let n = load<i32>(args_ptr);
  let a: i32 = 0;
  let b: i32 = 1;

  while (n > 0) {
    b = a + b;
    a = b - a;
    n = n - 1;
  }

  store<i32>(RESULT_HEAP, a);

  result_ptr = RESULT_HEAP as i32;
  result_len = 4;
}

Compared to the WAT version, the AssemblyScript source uses heap.alloc() for the output buffer and exports result_ptr/result_len as globals. The compiler generates a two-function binary (entry wrapper + actual logic).

Compiled Metadata

Field                Value
File                 examples/compiled/as-fibonacci.pvm
Size                 1338 bytes
Format               SPI
Functions            2
Instructions         334
Jump table entries   3
Code size            1118 bytes

The binary is about 4x larger than the WAT version. The AssemblyScript compiler adds runtime support code, a heap allocator call, and a separate entry function.

Decompiled Output

./target/release/pvm-decompiler examples/compiled/as-fibonacci.pvm

fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
    func_1(r1 - 16)
}

fn func_1(r1: u64) {
    let ptr_0_40
    let ptr_0_520
    let ptr_0_528
    let ptr_0_536
    let ptr_0_88

    ptr_0_40 = u64[r1] - 0x50000
    ptr_0_88 = heap_alloc(272)
    ptr_0_520 = 0
    ptr_0_528 = 1
    ptr_0_536 = *ptr_0_40

    while (ptr_0_536 >s 0) {
        let var_136 = ptr_0_528 + ptr_0_520
        ptr_0_520 = var_136 - ptr_0_520
        ptr_0_528 = var_136
        ptr_0_536 = ptr_0_536 - 1
    }

    *ptr_0_88 = ptr_0_520
    RESULT_PTR = ptr_0_88
    RESULT_LEN = 4
    halt()
}

What to notice:

  • The decompiler detects two functions: a thin main wrapper and the actual func_1.
  • heap_alloc(272) corresponds to the source heap.alloc(256) – the extra 16 bytes are the small header the AssemblyScript runtime adds to each allocation.
  • The fibonacci loop is clean: ptr_0_520 is a, ptr_0_528 is b, and ptr_0_536 is the countdown n.
  • RESULT_PTR and RESULT_LEN are recognized as global exports.
  • The >s operator means “signed greater than”, matching the source n > 0.
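The decompiled loop uses an add-then-subtract update rather than a tuple swap. The sketch below replays it in Python with plain integers (ignoring 32-bit wrap-around, which is a simplification) and compares it against the textbook formulation. Both function names are invented for this example.

```python
def fib_as_decompiled(n):
    """Replay the decompiled loop of func_1 with plain integers."""
    a, b = 0, 1                 # ptr_0_520, ptr_0_528
    while n > 0:                # ptr_0_536 >s 0
        total = b + a           # var_136
        a = total - a           # equals the old b
        b = total
        n -= 1                  # countdown
    return a


def fib_reference(n):
    """Textbook a, b = b, a + b formulation."""
    a, b = 0, 1
    for _ in range(max(n, 0)):
        a, b = b, a + b
    return a
```

Both functions agree for every n, including negative inputs, where the signed comparison keeps the loop from running at all.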

Refined Output (LLM)

./target/release/pvm-decompiler --refine examples/compiled/as-fibonacci.pvm

fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
    func_1(r1 - 16)
}

fn func_1(r1: u64) {
    let input_data_ptr
    let fib_next
    let fib_current
    let loop_counter
    let output_buffer

    input_data_ptr = u64[r1] - 0x50000
    output_buffer = heap_alloc(272)
    fib_current = 0
    fib_next = 1
    loop_counter = *input_data_ptr

    while (loop_counter >s 0) {
        let next_val = fib_next + fib_current
        fib_current = next_val - fib_current
        fib_next = next_val
        loop_counter = loop_counter - 1
    }

    *output_buffer = fib_current
    RESULT_PTR = output_buffer
    RESULT_LEN = 4
    halt()
}

The LLM correctly identifies the fibonacci pattern and gives meaningful names: fib_current, fib_next, loop_counter, and output_buffer.

Comparison with WAT Version

Aspect         WAT                         AssemblyScript
Binary size    ~335 bytes                  1338 bytes
Functions      1                           2
Instructions   ~70                         334
Memory model   Direct store to address 0   heap_alloc + globals
Loop style     i < n (unsigned)            n > 0 (signed countdown)

The WAT version is smaller because it is hand-written and avoids runtime overhead. The AssemblyScript version includes compiler-generated boilerplate but the core algorithm is still clearly visible in the decompiled output.

Control Flow

A larger AssemblyScript example that exercises multiple control flow patterns: if/else, while, nested for loops, and break.

Source

File: examples/sources/as-tests-control-flow.ts

let RESULT_HEAP: usize = 0;

export let result_ptr: i32 = 0;
export let result_len: i32 = 0;

function writeResult(val: i32): void {
  store<i32>(RESULT_HEAP, val);
  result_ptr = RESULT_HEAP as i32;
  result_len = 4;
}

export function main(args_ptr: i32, args_len: i32): void {
  RESULT_HEAP = heap.alloc(256);
  const input = load<i32>(args_ptr);
  let result = 0;

  // If/Else
  if (input > 10) {
    result = 1;
  } else {
    result = 2;
  }

  // While loop
  let i = 0;
  while (i < input) {
    result += 1;
    i++;
  }

  // Nested loop with break
  for (let j = 0; j < 5; j++) {
    for (let k = 0; k < 5; k++) {
      if (k > 2) break;
      result++;
    }
  }

  writeResult(result);
}

This program does three things in sequence:

  1. Sets result to 1 or 2 depending on whether input > 10
  2. Adds input to result via a while loop
  3. Adds to result in a 5x5 nested loop, but the inner loop breaks when k > 2 (so effectively 5x3 = 15 iterations)
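The three steps can be checked with a direct Python transcription of the source. This models the AssemblyScript program, not the decompiler output, and the function name is chosen here for illustration.

```python
def control_flow(input_value):
    """Return the value that main() passes to writeResult for a given input."""
    result = 1 if input_value > 10 else 2     # if/else

    i = 0
    while i < input_value:                    # adds max(input_value, 0)
        result += 1
        i += 1

    for j in range(5):                        # nested loop with break
        for k in range(5):
            if k > 2:
                break
            result += 1                       # runs for k = 0, 1, 2
    return result
```

For an input of 5 the result is 2 + 5 + 15 = 22, and for 11 it is 1 + 11 + 15 = 27.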

Compiled Metadata

Field       Value
File        examples/compiled/as-tests-control-flow.pvm
Format      SPI
Functions   2

Decompiled Output

./target/release/pvm-decompiler examples/compiled/as-tests-control-flow.pvm

fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
    func_1(r1 - 16)
}

fn func_1(r1: u64) {
    let ptr_0_40
    let ptr_0_512
    let ptr_0_568
    let ptr_0_576
    let ptr_0_680
    let ptr_0_688
    let ptr_0_760
    let ptr_0_768
    let ptr_0_88

    ptr_0_40 = u64[r1] - 0x50000
    ptr_0_88 = heap_alloc(272)

    RESULT_PTR = ptr_0_88
    let ptr_0_464 = *ptr_0_40
    ptr_0_512 = 2

    if (*ptr_0_40 <=s 10) {
        ptr_0_568 = 0
        ptr_0_576 = ptr_0_512
        goto block_0376;
    } else {
    }

    ptr_0_512 = 1

    ptr_0_568 = 0
    ptr_0_576 = ptr_0_512

    block_0376:
    while (ptr_0_568 <s ptr_0_464) {
        ptr_0_568 = ptr_0_568 + 1
        ptr_0_576 = ptr_0_576 + 1
    }

    ptr_0_680 = 0
    ptr_0_688 = ptr_0_576

    while (ptr_0_680 <s 5) {
        ptr_0_760 = 0
        ptr_0_768 = ptr_0_688

        while (ptr_0_760 <s 5 & ptr_0_760 <=s 2) {
            ptr_0_760 = ptr_0_760 + 1
            ptr_0_768 = ptr_0_768 + 1
        }

        ptr_0_680 = ptr_0_680 + 1
        ptr_0_688 = ptr_0_768
    }

    u32[RESULT_PTR + 0x50000] = ptr_0_688
    RESULT_LEN = 4
    HEAP_PTR = 4
    halt()
}

What to notice:

  • If/else recovery: The if (*ptr_0_40 <=s 10) branch corresponds to the source if (input > 10) (the condition is inverted because the compiler swapped the true/false branches). ptr_0_512 starts as 2 (the else case) and gets overwritten to 1 if the condition falls through.

  • While loop: The while (ptr_0_568 <s ptr_0_464) loop is a direct match to while (i < input). The variable ptr_0_576 accumulates the result.

  • Nested loops with break: The outer while (ptr_0_680 <s 5) is the for j loop. The inner while (ptr_0_760 <s 5 & ptr_0_760 <=s 2) combines the loop condition k < 5 with the break condition k > 2 into a single compound condition. This is how the decompiler represents early exits from loops.

  • Inlined function: The writeResult helper is inlined by the compiler, so it appears as direct assignments to RESULT_PTR, RESULT_LEN, and a memory store at the end.
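The compound loop condition can be checked against the original break form. The following Python sketch runs the inner loop both ways; the function names are hypothetical, and `and` stands in for the & in the pseudo-code.

```python
def inner_with_break():
    """Source form: loop on k < 5, leave early when k > 2."""
    count, k = 0, 0
    while k < 5:
        if k > 2:
            break
        count += 1
        k += 1
    return count, k


def inner_compound():
    """Decompiled form: k <s 5 & k <=s 2 as a single condition."""
    count, k = 0, 0
    while k < 5 and k <= 2:
        count += 1
        k += 1
    return count, k
```

Both versions execute the body three times and leave k at 3, so folding the break into the loop condition preserves the behaviour.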

Reading Tips

When analyzing decompiled PVM output, keep these patterns in mind:

Pattern in output        Meaning
u32[addr] or u64[addr]   Memory load/store at the given address
*ptr                     Pointer dereference (load from computed address)
>s, <s, <=s              Signed comparison operators
<u, >=u                  Unsigned comparison operators
heap_alloc(n)            Runtime heap allocation of n bytes
RESULT_PTR, RESULT_LEN   Recognized global exports
halt()                   Program termination (ecalli)
goto block_XXXX          Jump to a labeled block (unstructured control flow)
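The difference between the signed and unsigned operators only shows up when the top bit is set. A short Python sketch, assuming 64-bit registers (as the u64 parameter types suggest) and standard two's-complement interpretation; the helper names are invented here.

```python
MASK64 = (1 << 64) - 1


def as_signed64(x):
    """Reinterpret a 64-bit register value as a two's-complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def lt_u(a, b):      # a <u b
    return (a & MASK64) < (b & MASK64)


def lt_s(a, b):      # a <s b
    return as_signed64(a) < as_signed64(b)


minus_one = MASK64   # the bit pattern of -1 in a 64-bit register
```

With these definitions, `lt_s(minus_one, 0)` is true while `lt_u(minus_one, 0)` is false, because the same bit pattern is the largest possible unsigned value.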

JAM Fuzzy Service

This is a real-world JAM service binary compiled from Rust. No source code is available for this one – it is included as a stress test for the decompiler on a production-size program.

Compiled Metadata

Field                Value
File                 examples/compiled/jam-fuzzy-service.pvm
Size                 145,725 bytes (~142 KB)
Format               SPI
Functions            63
Output lines         ~10,900
Jump table entries   962

This is significantly larger than the toy examples. The binary contains 63 detected functions and almost a thousand jump table entries, which suggests the original code branches heavily – typical for a service that handles many message types.

Decompiled Output (excerpt)

./target/release/pvm-decompiler examples/compiled/jam-fuzzy-service.pvm

The full output is about 10,900 lines. Here is the main function signature and a representative fragment showing nested conditionals with host calls:

fn main(r0: u64, r1: u64, r2: u64, r3: u64, r4: u64, r5: u64, r6: u64, r7: u64, r8: u64) {
    let cond_128: bool
    let ptr_0: ptr
    let ptr_1073: ptr
    let ptr_1073_0
    ...

A deeper fragment showing service logic:

    if (0x4F87 >>u ptr_675_0 & 1 != 0) {
        if (fetch() >=u var_16944) {
            if (var_16944 != -1) {
                u64[ptr_1071 + 72] = 0
                u64[ptr_1071 + 48] = 1
                u64[ptr_1071 + 56] = 8
                u64[ptr_1071 + 64] = 0
                r8 = 0x17F10
                r7 = ptr_1071 + 40
                goto block_12850;
            } else {
                ptr_1071_32->field_0 = -0x8000000000000000
            }
        } else {
            if (var_16944 <s 0) {
                r7 = 0x17D28
                goto block_12480;
            } else {
                if (var_16944 == 0) {
                    let var_16960 = 0
                    u64[ptr_1071 + 0] = 1
                    goto block_155d7;
                } else {
                    ...
                }
            }
        }
    }

What to notice:

  • The decompiler recovers deeply nested if/else trees from flat branch chains.
  • fetch() is a recognized PVM host call (ecalli).
  • Field access patterns like ptr_1071_32->field_0 show the decompiler trying to infer struct-like memory layouts.
  • Many goto targets remain because the control flow is too complex for full structuring. This is expected for large real-world binaries.
  • The 63 detected functions give a rough sense of the original module structure.
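The outermost condition in the fragment, 0x4F87 >>u ptr_675_0 & 1 != 0, looks like a bitmask membership test. That reading is an assumption: it relies on grouping the expression as ((0x4F87 >> x) & 1) != 0, and nothing here says what the index values represent. Under that assumption, a short Python sketch lists which indices pass the test:

```python
MASK = 0x4F87


def is_member(index):
    """Evaluate ((MASK >> index) & 1) != 0 for a small non-negative index."""
    return ((MASK >> index) & 1) != 0


members = [i for i in range(16) if is_member(i)]
print(members)
```

The set bits are 0, 1, 2, 7, 8, 9, 10, 11, and 14. Compilers often emit this shape for a match over a sparse set of small integer values.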

Detected Functions

The decompiler identifies 63 functions. The first few:

fn main(r0: u64, r1: u64, r2: u64, r3: u64, r4: u64, r5: u64, r6: u64, r7: u64, r8: u64)
fn func_0(r1: u64, r5: u64)
fn func_1(r0: u64, r1: u64, r5: u64, r6: u64, r7: u64, r8: u64)
fn func_2(r1: u64, r5: u64)
fn func_3(r1: u64, r5: u64, r6: u64)
fn func_4(r7: u64, r8: u64)
fn func_5(r7: u64, r8: u64)
fn func_6(r0: u64, r1: u64, r5: u64, r6: u64, r7: u64, r8: u64)
...

The varying function signatures (different register sets) reflect the Rust compiler’s calling conventions at the PVM level.

Ananas

Ananas is a JAM service written in AssemblyScript. The source code is available at github.com/tomusdrw/anan-as. It is the largest example in this repository and is useful for testing the decompiler on a complex, real-world AssemblyScript binary.

Compiled Metadata

Field                Value
File                 examples/compiled/ananas.pvm
Size                 452,760 bytes (~442 KB)
Format               SPI
Functions            189
Output lines         ~11,300
Jump table entries   1,066

This is the largest binary in the examples directory – about 3x the size of the JAM fuzzy service. The AssemblyScript compiler generates 189 functions, which includes runtime support (garbage collector, memory allocator, string handling) in addition to the actual service logic.

Decompiled Output (excerpt)

./target/release/pvm-decompiler examples/compiled/ananas.pvm

The entry point and initialization:

fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
    let ptr_0: ptr
    let ptr_0_104
    let ptr_0_208
    let ptr_0_240
    let ptr_0_248
    let ptr_0_320
    let ptr_0_40
    ...

    if (0xFEFD0000 >=u r1 - 16 - 256) {
        ptr_1088 = ptr_992 - 256
    }

    if (0xFEFD0000 >=u ptr_1088 - 3760) {
        RESULT_PTR = 0x4DFC
        var_883 = 1856
        var_884 = 0
        var_885 = 0
        goto block_399c;
    }

    return

A fragment showing the memory allocator logic (typical AssemblyScript runtime):

    var_167 = u32[52]
    var_168 = ptr_0_528 + var_167

    if (var_168 <u 1024) {
        r4 = -1
    } else {
        u32[52] = var_168
        let ptr_6 = sbrk(16 << var_167 - var_168)
        goto block_3897;
    }

A fragment showing bitwise operations for data processing:

    let var_97 = 32 >>u (32 << (0xFFFFFFF0 & 32 >>u (32 << 15 + ...)))
    ptr_0_320 = var_97

    if (var_69 <u var_97 <u 0) {
        goto block_38ea;
    }

What to notice:

  • The stack guard check 0xFEFD0000 >=u r1 - 16 - 256 at the top is inserted by the AssemblyScript compiler to detect stack overflow.
  • sbrk() calls are recognized as the PVM memory growth host function. The allocator pattern (check available space, grow if needed) is clearly visible.
  • RESULT_PTR is recognized as the standard JAM service output global.
  • The 189 functions include many that are part of the AssemblyScript runtime, not user code. Functions like memory copy, string operations, and GC routines make up a significant portion of the output.

Detected Functions

A selection of the 189 detected functions:

fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64)
fn func_1(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64, r12: u64)
fn func_2(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64, r12: u64)
fn func_3(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64, r12: u64)
fn func_4(r1: u64, r7: u64)
fn func_5(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64, r12: u64)
...
fn func_21(r1: u64)
fn func_23(r1: u64, r7: u64)
...

Most functions share the signature (r0, r1, r9, r10, r11, r12) which reflects the AssemblyScript compiler’s standard calling convention for PVM targets. Functions with fewer parameters (like func_4(r1, r7) and func_21(r1)) are likely utility or helper functions.

Comparison with JAM Fuzzy Service

Aspect               JAM Fuzzy Service   Ananas
Source language      Rust                AssemblyScript
Binary size          142 KB              442 KB
Functions            63                  189
Output lines         ~10,900             ~11,300
Jump table entries   962                 1,066
Runtime overhead     Minimal             Large (AS runtime included)

Despite being 3x larger in binary size, the ananas output is only slightly longer than the JAM fuzzy service. This is because much of the binary size comes from data sections and runtime code that decompiles into repetitive patterns. The Rust binary is more compact but its logic is denser.
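The ratios behind this comparison can be recomputed from the metadata tables. The Python snippet below uses only the byte and line counts quoted in this document; the line counts are the approximate figures given above.

```python
fuzzy_bytes, ananas_bytes = 145_725, 452_760
fuzzy_lines, ananas_lines = 10_900, 11_300   # approximate output line counts

fuzzy_kib = fuzzy_bytes / 1024
ananas_kib = ananas_bytes / 1024
size_ratio = ananas_bytes / fuzzy_bytes      # how much larger the ananas binary is
line_ratio = ananas_lines / fuzzy_lines      # how much longer its output is

print(f"{fuzzy_kib:.1f} KiB vs {ananas_kib:.1f} KiB")
print(f"size ratio {size_ratio:.2f}x, output ratio {line_ratio:.2f}x")
```

The binary is roughly 3.1 times larger, while the decompiled output grows by less than 5%.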

Functions (AssemblyScript)

An AssemblyScript program with multiple helper functions: a three-argument add, a recursive factorial, and a square function called in a loop.

Source

File: examples/sources/as-tests-functions.ts

// Memory addresses
let RESULT_HEAP: usize = 0;

function writeResult(val: i32): i64 {
  store<i32>(RESULT_HEAP, val);
  return (RESULT_HEAP as i64) | ((4 as i64) << 32);
}

// Function with multiple args
function add3(a: i32, b: i32, c: i32): i32 {
  return a + b + c;
}

// Recursive function
function factorial(n: i32): i32 {
  if (n <= 1) return 1;
  return n * factorial(n - 1);
}

// Function calls in loop
function square(n: i32): i32 {
  return n * n;
}

export function main(args_ptr: i32, args_len: i32): i64 {
  RESULT_HEAP = heap.alloc(256);
  const n = load<i32>(args_ptr); // Input 5

  let res = add3(n, 2, 3); // 5 + 2 + 3 = 10

  res += factorial(n); // 10 + 120 = 130

  let sumSquares = 0;
  for (let i = 0; i < 3; i++) {
    sumSquares += square(i); // 0 + 1 + 4 = 5
  }

  res += sumSquares; // 130 + 5 = 135

  return writeResult(res);
}

The program computes add3(5, 2, 3) + factorial(5) + (0^2 + 1^2 + 2^2) = 10 + 120 + 5 = 135.
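The arithmetic can be replayed with a Python transcription of the source functions (32-bit overflow is ignored, which does not matter for small inputs):

```python
def add3(a, b, c):
    return a + b + c


def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)


def square(n):
    return n * n


def main(n):
    res = add3(n, 2, 3)                        # n + 2 + 3
    res += factorial(n)                        # n!
    res += sum(square(i) for i in range(3))    # 0 + 1 + 4 = 5
    return res
```

`main(5)` evaluates to 135, the value the program writes to the result buffer for an input of 5.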

Compiled Metadata

Field       Value
File        examples/compiled/as-tests-functions.pvm
Size        986 bytes
Format      SPI
Functions   3

Decompiled Output

./target/release/pvm-decompiler examples/compiled/as-tests-functions.pvm

fn func_2(r1: u64, r7: u64) {
    // @0139
    u64[r1 + 224] = r7
    u64[r1 + 264] = 0
    u64[r1 + 272] = 0

    while (u64[r1 + 272] <s 3) {
        // @01b2
        let var_8 = u64[r1 + 264] + u64[r1 + 272] * u64[r1 + 272]
        u64[r1 + 296] = var_8
        u64[r1 + 264] = var_8
        u64[r1 + 272] = u64[r1 + 272] + 1
    }

    // @01e2
    let var_16 = u64[r1 + 224]
    u64[r1 + 328] = var_16
    let var_20 = u64[r1 + 216] + 5 + var_16
    u64[r1 + 336] = var_20
    let var_21 = RESULT_PTR
    u64[r1 + 344] = var_21
    let var_28 = var_20 + u64[r1 + 264] << 32 >>u 32
    u64[r1 + 360] = var_28
    u32[var_21 + 0x33000] = var_28
    halt()
}

(Showing func_2 which contains the interesting computation; main and func_1 handle entry and heap allocation boilerplate.)

What to notice:

  • Inlined helpers: The add3, factorial, and square functions are inlined by the AssemblyScript compiler. The decompiler sees only the resulting combined computation.
  • Square-in-loop: The while (u64[r1 + 272] <s 3) loop corresponds to the for (let i = 0; i < 3; i++) loop calling square(i). The expression u64[r1 + 272] * u64[r1 + 272] is the inlined i * i.
  • Stack-frame layout: Variables are stored at stack offsets (r1 + 224, r1 + 264, etc.) rather than in registers, reflecting the AssemblyScript compiler’s frame-based calling convention.
  • Result encoding: The final u32[var_21 + 0x33000] = var_28 writes the result to the heap, followed by halt().

Linked List (AssemblyScript)

An AssemblyScript program that creates a three-node linked list and sums its values recursively.

Source

File: examples/sources/as-tests-linked-list.ts

// Memory addresses
let RESULT_HEAP: usize = 0;
let NODE_HEAP: usize = 0;

function writeResult(val: i32): i64 {
  store<i32>(RESULT_HEAP, val);
  return (RESULT_HEAP as i64) | ((4 as i64) << 32);
}

// Node structure: [value: i32, next: i32] (8 bytes)

function createNode(ptr: i32, val: i32, next: i32): void {
  store<i32>(ptr, val);
  store<i32>(ptr + 4, next);
}

function sumList(head: i32): i32 {
  if (head == 0) return 0;

  const val = load<i32>(head);
  const next = load<i32>(head + 4);

  // Recursive sum
  return val + sumList(next);
}

export function main(args_ptr: i32, args_len: i32): i64 {
  RESULT_HEAP = heap.alloc(256);
  NODE_HEAP = heap.alloc(32); // 3 nodes * 8 bytes each = 24 bytes
  // Create list: 10 -> 20 -> 30 -> null

  createNode(NODE_HEAP, 10, NODE_HEAP + 8);
  createNode(NODE_HEAP + 8, 20, NODE_HEAP + 16);
  createNode(NODE_HEAP + 16, 30, 0);

  const sum = sumList(NODE_HEAP); // 60

  return writeResult(sum);
}

The program builds a linked list 10 -> 20 -> 30 -> null, then recursively sums the values to produce 60.
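The node layout and the recursive sum can be modelled in Python with a bytearray standing in for linear memory. This is a sketch of the source semantics: the base address 8 and the 64-byte memory size are arbitrary choices for the example, not values from the compiled binary.

```python
import struct

memory = bytearray(64)            # stand-in for Wasm linear memory


def store_i32(addr, value):
    struct.pack_into("<i", memory, addr, value)   # little-endian i32


def load_i32(addr):
    return struct.unpack_from("<i", memory, addr)[0]


def create_node(ptr, val, next_ptr):
    store_i32(ptr, val)           # value at offset 0
    store_i32(ptr + 4, next_ptr)  # next pointer at offset 4


def sum_list(head):
    if head == 0:                 # 0 acts as the null pointer
        return 0
    return load_i32(head) + sum_list(load_i32(head + 4))


NODE_HEAP = 8                     # hypothetical base address; must be non-zero
create_node(NODE_HEAP, 10, NODE_HEAP + 8)
create_node(NODE_HEAP + 8, 20, NODE_HEAP + 16)
create_node(NODE_HEAP + 16, 30, 0)
```

`sum_list(NODE_HEAP)` walks the three nodes and returns 60.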

Compiled Metadata

Field       Value
File        examples/compiled/as-tests-linked-list.pvm
Size        1945 bytes
Format      SPI
Functions   5

Decompiled Output

./target/release/pvm-decompiler examples/compiled/as-tests-linked-list.pvm

The output is large (501 lines) due to heap allocation boilerplate. Here are the most interesting fragments:

fn func_2(r7: u64) {
    // @035f
    u32[RESULT_PTR + 0x33000] = r7
    halt()
}

func_2 is the writeResult helper – it writes the result to the heap and halts.

fn func_4(r0: u64, r1: u64, r9: u64, r10: u64, r11: u64) {
    if (0xFEFD0000 >=u r1 - 72) {
        // @056e
        u32[r9 + 0x33000] = r10
        u32[r9 + 0x33004] = r11
        call_indirect(r0)
    }
}

func_4 is the createNode helper – it stores value and next at adjacent 4-byte offsets, matching the [value: i32, next: i32] node layout.

What to notice:

  • Pointer arithmetic: The node structure is visible as two consecutive u32 stores at offsets +0x33000 and +0x33004 (4 bytes apart).
  • Recursive traversal: The sumList function compiles into func_3, which uses call_indirect(r0) to call back into itself – the recursive sumList(next) call.
  • Two heap allocations: func_1 (the main logic) performs two heap_alloc calls, matching heap.alloc(256) and heap.alloc(32) from the source.
  • Heap boilerplate: Much of the output is the sbrk-based heap allocator pattern. The design principle of this decompiler favors showing high-level intent, but the allocator code is not yet collapsed for this example.

Game of Life (AssemblyScript)

A Conway’s Game of Life implementation on a 16x16 toroidal grid. It seeds glider, blinker, and toad patterns, then steps the simulation.

Source

File: examples/sources/as-life.ts

const WIDTH: i32 = 16;
const HEIGHT: i32 = 16;
const CELL_COUNT: i32 = WIDTH * HEIGHT;

let BUF_A: u32 = 0;
let BUF_B: u32 = 0;
let OUTPUT_BASE: u32 = 0;

@inline
function idx(x: i32, y: i32): u32 {
  return (y * WIDTH + x) as u32;
}

@inline
function get(base: u32, x: i32, y: i32): u32 {
  return load<u8>(base + idx(x, y)) as u32;
}

@inline
function set(base: u32, x: i32, y: i32, v: u32): void {
  store<u8>(base + idx(x, y), v as u8);
}

function step_once(src: u32, dst: u32): void {
  for (let y = 0; y < HEIGHT; ++y) {
    for (let x = 0; x < WIDTH; ++x) {
      // count 8 neighbors with toroidal wrapping
      // apply B3/S23 rule
    }
  }
}

export function main(args_ptr: i32, args_len: i32): i64 {
  const base = heap.alloc(CELL_COUNT * 2 + 8 + CELL_COUNT) as u32;
  // ...seed, step, encode result...
}

(Source abbreviated for readability; see examples/sources/as-life.ts for the full listing.)
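Since the body of step_once is elided above, here is a minimal Python sketch of what a B3/S23 step with toroidal wrapping looks like. It is not the code from as-life.ts: it uses a flat list rather than byte loads and stores, and the blinker position is a hypothetical example.

```python
WIDTH, HEIGHT = 16, 16


def idx(x, y):
    return y * WIDTH + x


def step_once(src):
    """Return the next generation of a flat WIDTH*HEIGHT list of 0/1 cells."""
    dst = [0] * (WIDTH * HEIGHT)
    for y in range(HEIGHT):
        for x in range(WIDTH):
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue
                    # toroidal wrapping: coordinates wrap around the edges
                    neighbors += src[idx((x + dx) % WIDTH, (y + dy) % HEIGHT)]
            alive = src[idx(x, y)]
            # B3/S23: birth on 3 neighbors, survival on 2 or 3
            dst[idx(x, y)] = 1 if neighbors == 3 or (alive and neighbors == 2) else 0
    return dst


# Horizontal blinker at a hypothetical position
grid = [0] * (WIDTH * HEIGHT)
for x in (4, 5, 6):
    grid[idx(x, 5)] = 1
```

One step turns the horizontal blinker into a vertical one, and a second step restores the original grid.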

Compiled Metadata

Field       Value
File        examples/compiled/as-life.pvm
Size        2298 bytes
Format      SPI
Functions   2

Decompiled Output

./target/release/pvm-decompiler examples/compiled/as-life.pvm

The output is 215 lines. Key fragments from func_1:

    while (ptr_0_296 <s 256) {
        // @01d7
        u8[var_96 + 0x33000] = 0
        ptr_0_296 = ptr_0_296 + 1
    }

This is the clear() function inlined – it zeroes out 256 bytes (the 16x16 grid).

    u8[r2 + 0x33000] = 1
    u8[r2 + 0x33000] = 1
    // ... (14 stores total)

These are the seed_world() calls inlined – each set(base, x, y, 1) becomes a direct byte store. The glider, blinker, and toad patterns total 14 alive cells.

    while (ptr_0_688 <s 256) {
        u8[HEAP_PTR + 8 + ptr_0_688 << 32 >>u 32 + 0x33000] =
            u8[ptr_0_688 + ptr_0_608 << 32 >>u 32 + 0x33000]
        ptr_0_688 = ptr_0_688 + 1
    }

This is encode_result() – copying the cell buffer into the output area (skipping the 8-byte width/height header).

What to notice:

  • Aggressive inlining: All helper functions (idx, get, set, clear, seed_world, encode_result) are @inline or inlined by the compiler, producing a single large func_1.
  • Constant-folded seeds: The seed pattern stores are fully unrolled – no loops, just 14 direct u8 stores.
  • Loop structure: The while (... <s 256) loops correspond to iterating over CELL_COUNT (16 * 16 = 256) cells.
  • Double buffering: The simulation swaps between BUF_A and BUF_B using RESULT_LEN and RESULT_PTR as buffer base pointers.

Host Call Log (WAT)

A minimal WAT program that demonstrates PVM host calls. It invokes ecalli 100 (the log host call) to print “Hello from PVM!” and returns 42.

Source

File: examples/sources/host-call-log.wat

(module
  (import "env" "pvm_ptr" (func $pvm_ptr (param i64) (result i64)))
  (import "env" "host_call_5" (func $host_call_5 (param i64 i64 i64 i64 i64 i64) (result i64)))
  (memory (export "memory") 1)
  ;; "test-log" at offset 0 (8 bytes)
  (data (i32.const 0) "test-log")
  ;; "Hello from PVM!" at offset 8 (15 bytes)
  (data (i32.const 8) "Hello from PVM!")
  (func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
    ;; ecalli 100 = log host call
    ;; r7 = level (3 = INFO)
    ;; r8 = target_ptr (PVM address of "test-log")
    ;; r9 = target_len (8)
    ;; r10 = msg_ptr (PVM address of "Hello from PVM!")
    ;; r11 = msg_len (15)
    (drop (call $host_call_5
      (i64.const 100)
      (i64.const 3)
      (call $pvm_ptr (i64.const 0))
      (i64.const 8)
      (call $pvm_ptr (i64.const 8))
      (i64.const 15)))
    ;; Return result: store 42 at offset 24, return (ptr=24, len=4)
    (i32.store (i32.const 24) (i32.const 42))
    (i64.const 17179869208)))

The program uses two imported helpers:

  • pvm_ptr – translates a Wasm linear-memory offset to a PVM address
  • host_call_5 – dispatches to a numbered host function with 5 data arguments (here, ecalli 100 for logging). The _5 suffix indicates the number of data registers (r7-r11) passed to the host call.
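The final constant returned by main, 17179869208, is described in the source comment as (ptr=24, len=4). That matches a packing with the length in the upper 32 bits and the pointer in the lower 32 bits. The Rust sketch below checks the arithmetic. The function names are hypothetical, and the layout is inferred from the constant rather than taken from a specification.

```rust
/// Pack a result pointer and length into one 64-bit value:
/// length in the upper 32 bits, pointer in the lower 32 bits (inferred layout).
fn pack_result(ptr: u32, len: u32) -> u64 {
    ((len as u64) << 32) | ptr as u64
}

/// Inverse of `pack_result`, returning (ptr, len).
fn unpack_result(packed: u64) -> (u32, u32) {
    (packed as u32, (packed >> 32) as u32)
}

fn main() {
    // "test-log" occupies bytes 0..8 and "Hello from PVM!" bytes 8..23,
    // so offset 24 is the first 4-byte-aligned slot after the strings.
    let packed = pack_result(24, 4);
    println!("{}", packed); // 17179869208
    println!("{:?}", unpack_result(17179869208)); // (24, 4)
}
```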

Compiled Metadata

| Field | Value |
|---|---|
| File | examples/compiled/host-call-log.pvm |
| Size | 12486 bytes |
| Format | SPI |
| Functions | 1 |

Decompiled Output

./target/release/pvm-decompiler examples/compiled/host-call-log.pvm

Output:

fn main(r0: u64, r1: u64, r3: u64, r5: u64, r6: u64, r7: u64, r12: u64) {
    let var_28
    // @0006
    log()
    u32[var_28 + 0x33000] = 42
    halt()
}

What to notice:

  • Host call recognition: The ecalli 100 instruction is recognized and rendered as log(). The decompiler collapses the multi-register setup (level, target pointer/length, message pointer/length) into a single named call.
  • Compact output: Although the binary is ~12 KB, main decompiles to just 7 lines of pseudo-code. The size comes from the pvm_ptr address-translation helper and memory setup code compiled in from the imports, not from the application logic itself.
  • Result encoding: u32[var_28 + 0x33000] = 42 stores the return value, followed by halt().
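The multi-register setup that log() hides can be pictured as follows. This is a hypothetical Rust model, not the decompiler's internal representation: the struct, its field names, and the example addresses are made up for illustration, while the register assignment (r7 through r11) follows the comments in the WAT source.

```rust
/// Hypothetical model of the arguments to the log host call (ecalli 100).
struct LogCall {
    level: u64,      // r7
    target_ptr: u64, // r8
    target_len: u64, // r9
    msg_ptr: u64,    // r10
    msg_len: u64,    // r11
}

/// Host call number used for logging in the example.
const ECALLI_LOG: u32 = 100;

impl LogCall {
    /// Values as they would sit in r7..=r11 just before the ecalli.
    fn registers(&self) -> [u64; 5] {
        [self.level, self.target_ptr, self.target_len, self.msg_ptr, self.msg_len]
    }
}

fn main() {
    // Placeholder addresses: the real ones are whatever pvm_ptr returns.
    let call = LogCall {
        level: 3, // INFO
        target_ptr: 0x1000,
        target_len: 8, // "test-log"
        msg_ptr: 0x1008,
        msg_len: 15, // "Hello from PVM!"
    };
    for (i, value) in call.registers().iter().enumerate() {
        println!("ecalli {}: r{} = {:#x}", ECALLI_LOG, 7 + i, value);
    }
}
```

Five register writes plus one ecalli instruction collapse into the single log() line in the pseudo-code.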

Fibonacci (as-lan)

A Fibonacci implementation compiled through the as-lan AssemblyScript framework. Unlike the hand-written WAT fibonacci, this version comes from a full framework with logging, string formatting, and runtime support – producing a much larger binary.

Source

The source is a full as-lan project. The core fibonacci logic (abbreviated):

function fibonacci(n: i32): i32 {
  if (n <= 1) return n;
  let a = 0, b = 1;
  for (let i = 2; i <= n; i++) {
    const tmp = a + b;
    a = b;
    b = tmp;
  }
  return b;
}

See examples/sources/aslan-fib.jam.wat for the full compiled WAT (the original TypeScript source is in the as-lan project).

Compiled Metadata

| Field | Value |
|---|---|
| File | examples/compiled/aslan-fib.pvm |
| Size | 39296 bytes (~38 KB) |
| Format | SPI |
| Functions | 18 |

Decompiled Output

./target/release/pvm-decompiler examples/compiled/aslan-fib.pvm

The output is 654 lines across 18 functions. The main function:

fn main(r1: u64, r7: u64, r8: u64, r9: u64, r10: u64, r11: u64, r12: u64) {
    let var_0: u64

    // @0000
    // @000a
    var_0 = 2; jump 18414

    if (0xFEFD0000 >=u r1 - 16 - 256) {
        if (var_2481 == 0) {
            if (var_2487 == 0) {
                // ... nested initialization checks ...
                        u32[0x30000] = 5356
                        r9 = 4
                        r10 = 5
                        r0 = 190; jump -11326
            }
        }
    }
    // ...
}

What to notice:

  • Framework overhead: 18 functions and ~38 KB of binary for a single Fibonacci routine – the as-lan framework includes runtime support for string handling, logging, memory management, and the ecalli dispatch table.
  • Nested guard checks: The main function’s deeply nested if (var_XXXX == 0) checks are the framework’s initialization sequence, checking whether various runtime components need setup.
  • Scale comparison: Compare with the hand-written WAT fibonacci (335 bytes, 1 function, 7 lines of output) to see how framework overhead affects binary size and decompilation complexity.
  • Indirect calls: The call_indirect and jump patterns show the framework’s dispatch mechanism routing to different initialization and computation paths.

Backend Comparison

The --decompile flag runs the full LLVM pipeline: PVM bytecode is lifted to LLVM IR, then decompiled to C code. Several backends are available for the final decompilation step.

Available Backends

| Backend | Flag | Status |
|---|---|---|
| builtin | --backend=builtin | Works locally, no extra dependencies |
| retdec | --backend=retdec | Requires RetDec installation |
| rellic | --backend=rellic | Requires Rellic installation |
| rellic-docker | --backend=rellic-docker | Requires Docker with Rellic image |
| llvm-cbe | --backend=llvm-cbe | Requires LLVM C Backend Emitter |

For most users, builtin is the easiest option since it needs no external tools.

Example: builtin Backend

Using simple-add.pvm, a minimal hand-crafted PVM binary (21 bytes, 6 instructions):

./target/release/pvm-decompiler --decompile --backend=builtin examples/compiled/simple-add.pvm

Output:

int64_t main(void) {
    int64_t r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12;
    r0 = r1 = r2 = r3 = r4 = r5 = r6 = r7 = r8 = r9 = r10 = r11 = r12 = 0;
    goto bb_0000;
bb_0000:
    r0 = 42;
    r1 = 100;
    r2 = %t5;
    goto bb_000f;
bb_000f:
    return %t6;
}

The C output preserves the basic block structure from the LLVM IR. Names like %t5 and %t6 are LLVM temporaries that the backend has not yet resolved into concrete expressions, so the result is not valid C until they are filled in. This is expected for the builtin backend, which prioritizes correctness over readability.

Example: br-table Through builtin

A larger example showing how the builtin backend handles branching:

./target/release/pvm-decompiler --decompile --backend=builtin examples/compiled/br-table.pvm

The output is a C function with labeled basic blocks (bb_0000, bb_000a, etc.), goto statements for control flow, and if/else for conditional branches. The switch table from the source becomes a chain of conditional jumps.
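To illustrate how a switch table maps onto a chain of conditional jumps, the Rust sketch below puts a table-style dispatch next to the equivalent chain of compare-and-branch steps. Both functions and their case values are invented for illustration and are not taken from br-table.pvm. The point is only that the two forms compute the same result.

```rust
/// Table-style dispatch, as a br_table or switch would express it.
fn dispatch_switch(selector: u32) -> u32 {
    match selector {
        0 => 100,
        1 => 200,
        2 => 300,
        _ => 999, // default target
    }
}

/// The same logic as a chain of compare-and-branch steps,
/// similar in shape to goto-based output with one test per case.
fn dispatch_chain(selector: u32) -> u32 {
    if selector == 0 {
        return 100;
    }
    if selector == 1 {
        return 200;
    }
    if selector == 2 {
        return 300;
    }
    999
}

fn main() {
    for s in 0..5 {
        assert_eq!(dispatch_switch(s), dispatch_chain(s));
    }
    println!("both forms agree");
}
```

When reading the C output, grouping consecutive comparisons against the same register is usually enough to spot the original switch.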

When to Use Which Output

| Goal | Recommended mode |
|---|---|
| Quick understanding of program logic | Default pseudo-code (no flags) |
| Better variable names for review | --refine |
| Integration with C toolchains | --decompile --backend=builtin |
| Deep analysis of the binary | --verbose or --debug |
| Generating LLVM IR for custom pipelines | --llvm |

The default pseudo-code mode is usually the most readable. Use --decompile when you need actual C code, for example to compile and test the decompiled output; with the builtin backend, expect to resolve leftover temporaries such as %t5 by hand first.
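The recommendations above can also be read as a lookup from goal to flags. The sketch below encodes that lookup in Rust. The Goal enum and the flags_for function are hypothetical names, and only the flags themselves come from this documentation.

```rust
/// Goals from the recommendations above (hypothetical enum).
#[derive(Debug, Clone, Copy)]
enum Goal {
    QuickUnderstanding,
    BetterNames,
    CToolchain,
    DeepAnalysis,
    LlvmIr,
}

/// Flags recommended for each goal; an empty slice means the default mode.
fn flags_for(goal: Goal) -> &'static [&'static str] {
    match goal {
        Goal::QuickUnderstanding => &[],
        Goal::BetterNames => &["--refine"],
        Goal::CToolchain => &["--decompile", "--backend=builtin"],
        Goal::DeepAnalysis => &["--verbose"], // or --debug for raw instructions
        Goal::LlvmIr => &["--llvm"],
    }
}

fn main() {
    let goals = [
        Goal::QuickUnderstanding,
        Goal::BetterNames,
        Goal::CToolchain,
        Goal::DeepAnalysis,
        Goal::LlvmIr,
    ];
    for goal in goals.iter() {
        let flags = flags_for(*goal).join(" ");
        println!("{:?}: pvm-decompiler {} program.pvm", goal, flags);
    }
}
```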

CLI Reference

Usage

pvm-decompiler [OPTIONS] <file.pvm>

The tool takes one PVM binary file as input and writes the result to stdout. Progress and diagnostic messages go to stderr.

Options

| Flag | Description |
|---|---|
| (no flags) | Emit structured pseudo-code (default mode) |
| -v, --verbose | Show CFG, dataflow, and structural analysis alongside pseudo-code |
| --debug | Show raw decoded instructions and all diagnostics |
| --llvm | Emit LLVM IR instead of pseudo-code |
| --decompile | Full LLVM pipeline: lift to IR, then decompile to C code |
| --refine | Pass output through an LLM to improve variable names (requires OPENROUTER_API_KEY) |
| --backend=X | Choose decompilation backend (used with --decompile) |
| -V, --version | Show version |
| -h, --help | Show help |

Backends

Used with --decompile to select the C code generation backend:

| Value | Description |
|---|---|
| builtin | Built-in backend, no dependencies needed |
| retdec | Uses RetDec decompiler |
| rellic | Uses Rellic (Trail of Bits) |
| rellic-docker | Uses Rellic via Docker container |
| llvm-cbe | Uses LLVM C Backend Emitter |

Environment Variables

| Variable | Description |
|---|---|
| OPENROUTER_API_KEY | API key for LLM refinement (--refine flag) |

Examples

# Basic decompilation
pvm-decompiler program.pvm

# See raw instructions
pvm-decompiler --debug program.pvm

# Verbose analysis + pseudo-code
pvm-decompiler --verbose program.pvm

# LLVM IR output
pvm-decompiler --llvm program.pvm

# C code via builtin backend
pvm-decompiler --decompile --backend=builtin program.pvm

# Pseudo-code with LLM-improved names
pvm-decompiler --refine program.pvm

# C code with LLM-improved names
pvm-decompiler --decompile --refine program.pvm

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid input, decode failure, etc.) |
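When scripting the tool, the split between stdout (result) and stderr (diagnostics) and the two exit codes are what matter. The following Rust sketch wraps an invocation using std::process::Command. The helper names decompile and interpret are hypothetical, and the binary path assumes a release build as described in Quick Start.

```rust
use std::process::Command;

/// Run the decompiler and return its stdout on success.
/// Assumption: `binary` points at the executable built by `cargo build --release`.
fn decompile(binary: &str, flags: &[&str], input: &str) -> Result<String, String> {
    let output = Command::new(binary)
        .args(flags)
        .arg(input)
        .output()
        .map_err(|e| format!("failed to start {}: {}", binary, e))?;

    interpret(
        output.status.code(),
        String::from_utf8_lossy(&output.stdout).into_owned(),
        String::from_utf8_lossy(&output.stderr).into_owned(),
    )
}

/// Map the documented exit codes onto a Result: 0 is success, anything else an error.
fn interpret(code: Option<i32>, stdout: String, stderr: String) -> Result<String, String> {
    match code {
        Some(0) => Ok(stdout),
        Some(n) => Err(format!("exit code {}: {}", n, stderr.trim())),
        // On Unix, no exit code means the process was killed by a signal.
        None => Err("terminated by a signal".to_string()),
    }
}

fn main() {
    match decompile("./target/release/pvm-decompiler", &["--llvm"], "program.pvm") {
        Ok(text) => print!("{}", text),
        Err(msg) => eprintln!("{}", msg),
    }
}
```

Keeping interpret separate from the process call makes the exit-code handling easy to check without a PVM binary at hand.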