Architecture Overview
Lambda Smalltalk combines multiple optimization techniques to achieve performance competitive with LuaJIT and Node.js V8.
Cranelift JIT
Hot methods are automatically compiled to native code using Cranelift, the same backend used by Wasmtime and available as an alternative codegen backend for rustc.
- Hot method detection: Methods called 10+ times trigger JIT compilation
- Cross-platform: x86-64, ARM64, RISC-V
- Speculative execution: Overflow checks with fast-path optimization
Fast Path (no overflow):
sadd_overflow a, b -> (result, overflow_flag)
brif overflow_flag -> deopt
tag result as SmallInteger
continue
Deoptimization (overflow):
save all registers to frame
return DEOPT sentinel with resume IP
Branch prediction makes overflow checks nearly free while preserving Smalltalk's unlimited precision integer semantics.
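As a minimal Rust-side sketch (not the VM's actual code), the same split can be written with checked_add, assuming the 32-bit SmallInteger payload described below: the common case stays on the fast path, and overflow hands the full-precision result to the slow path.

// Minimal sketch, not the VM's code: fast-path/slow-path split for SmallInteger addition.
fn add_smallints(a: i32, b: i32) -> Result<i32, i64> {
    match a.checked_add(b) {
        Some(sum) => Ok(sum),             // fast path: result still fits a SmallInteger
        None => Err(a as i64 + b as i64), // slow path: promote to LargeInteger
    }
}

fn main() {
    assert_eq!(add_smallints(1, 2), Ok(3));
    assert_eq!(add_smallints(i32::MAX, 1), Err(2_147_483_648)); // would trigger deopt
}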
True Deoptimization
When JIT-compiled code encounters overflow or non-SmallInteger arguments, it performs true deoptimization (LuaJIT-style):
- Save register state to the current frame
- Record resume IP — the exact bytecode instruction where overflow occurred
- Return DEOPT sentinel to the VM
- Interpreter resumes at the saved IP with full LargeInteger support
This is different from naive "restart the method" approaches. Lambda Smalltalk resumes execution at the exact point where JIT failed, preserving all intermediate results.
Example: x := (2^31 - 1) + 1
JIT executes: LOADI r0, 2147483647
ADDI r0, r0, 1 ← overflow detected here
[DEOPT: save r0, resume_ip=1]
Interpreter resumes at ip=1:
ADDI r0, r0, 1 ← re-executes with LargeInt
RETURN r0 → LargeInteger(2147483648)
Deoptimization is rare in practice (most code uses SmallInts), but when it happens, it's precise and efficient.
NaN Boxing
All values fit in 8 bytes using NaN boxing:
| Type | Representation |
|---|---|
| Integer | Lower 32 bits (SmallInt) or heap object (LargeInt) |
| Float | Standard IEEE 754 double |
| Object | NaN with object ID in payload |
| Symbol | NaN with symbol ID in payload |
| Bool/Nil | Special NaN patterns |
No type tags, no pointer chasing for primitives. Arithmetic operates directly on machine integers.
Inline Caching + PIC
Method dispatch uses a three-level caching system:
- Monomorphic Fast Path: Single class ID check + direct jump (added Jan 2026)
- Inline Cache (256 entries): Direct-mapped cache for call sites
- Polymorphic Inline Cache (4 entries per site): Handles polymorphic dispatch
SEND execution:
1. Check if receiver class == cached class (monomorphic fast path)
→ Direct jump to method (1 comparison, 1 branch)
2. Check inline cache (hit = direct jump)
3. Check PIC (hit = 4-way lookup)
4. Full method lookup (miss)
5. Update cache
The monomorphic fast path optimizes the most common case: same class at the same call site. Benchmarks show 15-20% improvement for method-heavy code (e.g., collection iteration).
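A minimal sketch of the monomorphic fast path, with hypothetical field names (the real cache layout is not shown here): one class-ID comparison decides whether the call site can jump straight to the cached method, and a miss falls back to the slower lookup and refills the cache.

// Sketch only: illustrative CallSite fields, not the VM's cache structures.
struct CallSite {
    cached_class: u32,    // class seen last time at this site
    cached_method: usize, // method index for that class
}

fn dispatch(site: &mut CallSite, receiver_class: u32, slow_lookup: impl Fn(u32) -> usize) -> usize {
    if receiver_class == site.cached_class {
        return site.cached_method; // 1 comparison, 1 branch
    }
    let m = slow_lookup(receiver_class); // miss: full lookup
    site.cached_class = receiver_class;  // update the cache for next time
    site.cached_method = m;
    m
}

fn main() {
    let mut site = CallSite { cached_class: 0, cached_method: 0 };
    let lookup = |class: u32| (class as usize) * 10; // stand-in for full method lookup
    assert_eq!(dispatch(&mut site, 3, lookup), 30);  // miss, fills cache
    assert_eq!(dispatch(&mut site, 3, lookup), 30);  // hit: monomorphic fast path
}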
Register-Based VM
32-bit fixed-length instructions with 256 registers per frame:
Format A: op(8) + rd(8) + ra(8) + rb(8) [3 registers]
Format B: op(8) + rd(8) + ra(8) + imm8 [2 registers + immediate]
Format C: op(8) + rd(8) + imm16 [1 register + immediate]
- No stack manipulation overhead
- Register allocation with free list reuse
- Copy propagation and dead store elimination
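As an illustration of how cheap decoding a fixed-width word is, here is a sketch that unpacks Format A and Format C; the exact bit placement (opcode in the high byte) is an assumption, not the documented encoding.

// Sketch only: field order within the 32-bit word is assumed, not specified above.
fn decode_a(inst: u32) -> (u8, u8, u8, u8) {
    ((inst >> 24) as u8, (inst >> 16) as u8, (inst >> 8) as u8, inst as u8) // op, rd, ra, rb
}

fn decode_c(inst: u32) -> (u8, u8, u16) {
    ((inst >> 24) as u8, (inst >> 16) as u8, inst as u16) // op, rd, imm16
}

fn main() {
    let add = (0x01u32 << 24) | (2 << 16) | (3 << 8) | 4; // op=0x01, rd=2, ra=3, rb=4
    assert_eq!(decode_a(add), (0x01, 2, 3, 4));
    assert_eq!(decode_c(add), (0x01, 2, 0x0304));
}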
Unified Dispatch
All operations go through message sends, even primitives:
Sqlite open: 'db.sqlite'
The compiler emits a SEND. The VM looks up open: on the Sqlite class, finds it's a primitive, and calls the Rust implementation directly - no frame allocation needed.
Performance
No benchmarks here. Try it yourself and feel the speed.
Performance Optimizations
Beyond the core architecture, Lambda Smalltalk employs several targeted optimizations for hot paths:
Regex Compilation Cache
Regular expressions are compiled once and cached:
pub compiled_regex_cache: HashMap<String, Regex>,
When you write:
'hello world' match: /\w+/
The first match compiles the regex and stores it. Subsequent matches reuse the compiled pattern — zero overhead for repeated patterns.
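A sketch of that compile-once pattern, built around the cache field shown above and the regex crate's entry-or-compile idiom:

// Minimal sketch of the cache lookup; error handling is simplified to expect().
use regex::Regex;
use std::collections::HashMap;

fn cached_match(cache: &mut HashMap<String, Regex>, pattern: &str, text: &str) -> bool {
    let re = cache
        .entry(pattern.to_string())
        .or_insert_with(|| Regex::new(pattern).expect("invalid regex"));
    re.is_match(text)
}

fn main() {
    let mut cache = HashMap::new();
    assert!(cached_match(&mut cache, r"\w+", "hello world")); // compiles and caches
    assert!(cached_match(&mut cache, r"\w+", "again"));       // reuses the compiled regex
}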
ParseTree-Specific Object Type
Grammar matching produces parse trees. These were originally Dict objects, but profiling showed significant overhead in tree construction and traversal.
Lambda Smalltalk now has a dedicated RegHeapObject::ParseTree:
RegHeapObject::ParseTree {
    rule: String,
    text: String,
    children: Vec<NanValue>,
    captures: HashMap<String, NanValue>,
}
Direct field access (no hash lookups) + optimized primitive methods = 3-5x faster grammar matching.
Precompiled Grammars
Grammars are compiled to an internal representation at Grammar from: time:
grammar := Grammar from: '
expr: @left NUM OP @right NUM
NUM: /[0-9]+/
OP: /[+\-*/]/
'.
The parsing happens once. Subsequent Grammar replace:in:with: or findAll:in: operations use the precompiled structure — no re-parsing overhead.
Bounded Loop Optimization
The compiler detects common patterns like 1 to: N do: [:i | ...] and generates specialized bytecode:
- Direct integer loops (no boxing/unboxing)
- Bounds known at compile time
- Loop variable stays in register
This makes iteration as fast as C-style for loops.
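Illustratively, the specialized loop reduces to what a hand-written Rust loop does: an integer counter kept in a register, compared against a known bound, with no boxing in between.

// Illustration only: what `1 to: n do: [:i | total := total + i]` boils down to
// once the loop counter stays in a register as a raw integer.
fn sum_to(n: i64) -> i64 {
    let mut total = 0;
    let mut i = 1;      // loop variable lives in a register
    while i <= n {      // bound is a plain integer comparison
        total += i;
        i += 1;
    }
    total
}

fn main() {
    assert_eq!(sum_to(10), 55);
}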
Rust Meets Dynamic Language
"Can you implement a garbage collector in Rust?" Yes. Here's how.
The Ownership Problem
Rust's ownership model seems incompatible with garbage collection. Every value must have exactly one owner, but a GC needs to track arbitrary object graphs with cycles and shared references.
Lambda Smalltalk solves this with a clean separation: Rust owns the memory, Smalltalk owns the references.
NaN Boxing: The Great Unifier
Every Smalltalk value fits in 8 bytes. Integers and booleans live directly in registers as immediate values. Objects are represented as 32-bit IDs pointing into a heap vector.
pub struct NanValue(u64);
// Integer: QNAN | SIGN_BIT | 32-bit value (no allocation)
// Object: QNAN | TAG | 32-bit heap ID (just a number)
When GC asks "is this an object?", it's a single bit check. No traversing pointers. No type tags to decode. One bitwise AND.
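A sketch of that check, with illustrative mask and tag constants (the VM's actual bit patterns are not documented here):

// Sketch only: QNAN and TAG_OBJECT values are illustrative, not the VM's constants.
const QNAN: u64 = 0x7ff8_0000_0000_0000;
const TAG_OBJECT: u64 = 0x0001_0000_0000_0000;

fn is_object(bits: u64) -> bool {
    (bits & (QNAN | TAG_OBJECT)) == (QNAN | TAG_OBJECT)
}

fn object_id(bits: u64) -> u32 {
    bits as u32 // low 32 bits carry the heap ID
}

fn main() {
    let v = QNAN | TAG_OBJECT | 42;
    assert!(is_object(v));
    assert_eq!(object_id(v), 42);
    assert!(!is_object(3.14f64.to_bits())); // ordinary doubles pass through untouched
}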
The Heap: A Simple Vector
pub heap: Vec<RegHeapObject>
That's it. Rust owns the vector. Smalltalk code only sees indices. When you write Array new, you get back a number — the index into this vector.
This separation is elegant: Rust's borrow checker is happy (it owns the Vec), and Smalltalk's GC is happy (it just shuffles numbers around).
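A sketch of the indices-not-pointers idea, with the names simplified from the structs shown elsewhere: allocation pushes onto the Vec and hands back the index.

// Simplified sketch: allocation returns the index that Smalltalk code treats as a reference.
enum HeapObject { Array(Vec<u64>) }

struct Heap { objects: Vec<HeapObject> }

impl Heap {
    fn alloc(&mut self, obj: HeapObject) -> u32 {
        let id = self.objects.len() as u32; // the "pointer" Smalltalk code sees
        self.objects.push(obj);
        id
    }
}

fn main() {
    let mut heap = Heap { objects: Vec::new() };
    let id = heap.alloc(HeapObject::Array(Vec::new()));
    assert_eq!(id, 0); // `Array new` evaluates to this number
}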
Cheney's Algorithm: Copy the Living
GC uses Cheney-style copying collection:
- Walk all registers and frames, collecting root object IDs
- For each reachable object, copy it to a new heap
- Build a forwarding table: old ID → new ID
- Replace all references with new IDs
- Drop the old heap
Dead objects? Never touched. The old heap simply disappears. No mark phase, no sweep phase. Just copy and forget.
// GC in action
let mut new_heap = Vec::new();
let mut forwarding = HashMap::new();

// Copy reachable objects
for old_id in worklist {
    let new_id = new_heap.len();
    new_heap.push(self.heap[old_id].clone());
    forwarding.insert(old_id, new_id);
}

// References in registers, frames, and the copied objects are then
// rewritten from old IDs to new IDs via `forwarding` (step 4 above).

// Swap heaps (old heap is dropped here)
self.heap = new_heap;
Default threshold: 16 million objects (~128MB of heap metadata). Why so high? Because GC is copying — every live object gets cloned. With a 50% survival rate, that's 64MB of allocation during GC. Running GC too often would thrash memory bandwidth.
The sweet spot: let the heap grow large, then do one big collection. Most scripts never trigger GC at all.
External Resources: Rc<RefCell>
Files. Network sockets. Database connections. These can't be copied like regular objects.
RegHeapObject::FileStream {
    reader: Option<Rc<RefCell<BufReader<File>>>>,
}
Rc (reference counting) handles the sharing. Multiple Smalltalk objects can reference the same file handle. When the last reference is GC'd, Rust's Drop trait closes the file automatically.
No explicit close needed. No resource leaks. Rust's RAII meets Smalltalk's GC.
Unsafe: Surgical Precision
Yes, there's unsafe. In exactly three places:
- Register access in the hot loop — bounds checking on every instruction is too slow
- Instruction fetch — get_unchecked() saves nanoseconds that matter
- JIT calls — passing raw pointers to generated machine code
Every unsafe block operates on pre-validated indices. The bytecode verifier runs at load time. If verification passes, the indices are guaranteed valid.
// This is safe because:
// 1. ip is bounds-checked by the verifier
// 2. code vector never changes during execution
let inst = unsafe { *code.get_unchecked(ip) };
Rust purists may frown. The benchmark results don't.
Class System
Virtual Metaclasses
Traditional Smalltalk has explicit metaclasses forming a parallel hierarchy. Lambda Smalltalk takes a different approach: virtual metaclasses generated at load time.
pub struct RegClass {
    pub name: u32,                          // Interned symbol ID
    pub superclass: Option<u32>,            // Parent class ID
    pub methods: HashMap<u32, usize>,       // Instance methods
    pub class_methods: HashMap<u32, usize>, // Class-side methods
    pub ivar_count: u16,                    // Including inherited
    pub metaclass_id: Option<u32>,          // Virtual metaclass
}
When the VM loads a class:
- Create the class with its instance methods
- Generate a metaclass named "ClassName class"
- Migrate class methods to the metaclass as instance methods
- Link metaclass inheritance to mirror the class hierarchy
Object <── Object class
↑ ↑
Collection <── Collection class
↑ ↑
Array <── Array class
This means Array class is a real class whose instances are class objects. When you call Array new, the VM sends new to the Array class object, which dispatches through the metaclass.
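A sketch of that dispatch with deliberately simplified types (a string stands in for the compiled method, and the metaclass inheritance walk is omitted): sending new to the class object looks the selector up in the metaclass's instance methods.

// Simplified sketch; not the VM's actual class table.
use std::collections::HashMap;

struct Class {
    methods: HashMap<&'static str, &'static str>,
    metaclass: Option<usize>,
}

fn send_to_class(classes: &[Class], class_id: usize, selector: &str) -> Option<&'static str> {
    // The receiver of `Array new` is the class object itself,
    // so dispatch starts in its virtual metaclass.
    let meta = classes[class_id].metaclass?;
    classes[meta].methods.get(selector).copied()
}

fn main() {
    let array_meta = Class {
        methods: HashMap::from([("new", "Array class>>new")]),
        metaclass: None,
    };
    let array = Class { methods: HashMap::new(), metaclass: Some(0) };
    let classes = vec![array_meta, array];
    assert_eq!(send_to_class(&classes, 1, "new"), Some("Array class>>new"));
}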
Instance Variable Layout
Instance variables use indexed storage with inherited variables first:
Person (ivars: name, age)
↑
Employee (ivars: salary)
Employee instance layout:
[0] name (inherited from Person)
[1] age (inherited from Person)
[2] salary (own)
The compiler resolves ivar names to indices at compile time. GETIVAR and SETIVAR use 8-bit indices — fast, cache-friendly, no hash lookups.
Blocks and Closures
The Upvalue Cell Pattern
When a block captures an outer variable, Lambda Smalltalk creates a heap-allocated upvalue cell:
RegHeapObject::Upvalue(NanValue) // Mutable cell
| counter |
counter := 0.
[ counter := counter + 1. counter ]
Here's what happens:
- counter is a local variable (register)
- When the block is created, counter's value is wrapped in an Upvalue cell
- The block stores a reference to this cell
- Both the outer scope and the block share the same cell
Multiple blocks can capture the same variable:
| x getter setter |
x := 10.
getter := [ x ].
setter := [:v | x := v ].
getter value. "→ 10"
setter value: 20.
getter value. "→ 20"
Both blocks reference the same Upvalue cell. Mutations are visible everywhere.
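The same sharing can be pictured in plain Rust with an Rc<RefCell<...>> cell standing in for the Upvalue (the VM shares a heap ID rather than an Rc, but the effect is identical):

// Plain-Rust analogue of two blocks capturing one Upvalue cell.
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let x = Rc::new(RefCell::new(10)); // the captured variable's cell
    let getter = { let x = Rc::clone(&x); move || *x.borrow() };
    let setter = { let x = Rc::clone(&x); move |v| *x.borrow_mut() = v };
    assert_eq!(getter(), 10);
    setter(20);
    assert_eq!(getter(), 20); // both closures see the same mutation
}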
Block Structure
RegHeapObject::Block {
    method_idx: usize,       // Compiled bytecode
    upvalues: Vec<u32>,      // Heap IDs of captured cells
    home_frame_idx: usize,   // For non-local return
    home_frame_id: u64,      // Detect escaped blocks
    home_receiver: NanValue, // `self` inside block
}
The home_receiver deserves attention. Inside a block, self refers to the receiver of the enclosing method, not the block itself:
Object subclass: #Counter instanceVariableNames: 'value'.
Counter >> increment
[ value := value + 1 ] value. "← `value` is self's ivar"
Non-Local Return
Smalltalk blocks can return from the enclosing method:
findFirst: aBlock in: aCollection
aCollection do: [:each |
(aBlock value: each) ifTrue: [ ^ each ] "← Returns from findFirst:in:"
].
^ nil
The BLOCKRET opcode handles this by unwinding to home_frame_id. If that frame has already returned (the block escaped), it's an error:
makeBlock
^ [ ^ 42 ] "Block escapes"
makeBlock value "Error: non-local return from dead frame"
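One way to picture the check (illustrative names only; the real VM compares against home_frame_id during BLOCKRET): scan the live frames for the block's home, and if it is gone, the return is an error.

// Illustrative sketch of the escaped-block check.
struct Frame { id: u64 }

fn find_home(frames: &[Frame], home_frame_id: u64) -> Option<usize> {
    frames.iter().rposition(|f| f.id == home_frame_id)
}

fn main() {
    let stack = vec![Frame { id: 1 }, Frame { id: 7 }];
    assert_eq!(find_home(&stack, 7), Some(1)); // home frame alive: unwind to it
    assert_eq!(find_home(&stack, 3), None);    // home frame gone: "dead frame" error
}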
Symbol Table
Every method name, class name, and symbolic value is interned exactly once:
pub struct SymbolTable {
    lookup: HashMap<String, SymbolId>, // String → ID
    strings: Vec<String>,              // ID → String
}
The first time you use :foo, the symbol table:
- Checks if "foo" exists (O(1) hash lookup)
- If not, assigns the next ID and stores the string
- Returns the 32-bit SymbolId
From then on, comparing :foo = :foo is a single integer comparison.
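A compilable sketch of interning built directly from the fields above (using std's default hasher here; the FxHash-style hasher is covered next):

// Sketch of intern(): first use stores the string, later uses return the same ID.
use std::collections::HashMap;

type SymbolId = u32;

#[derive(Default)]
struct SymbolTable {
    lookup: HashMap<String, SymbolId>,
    strings: Vec<String>,
}

impl SymbolTable {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.lookup.get(s) {
            return id;
        }
        let id = self.strings.len() as SymbolId;
        self.strings.push(s.to_string());
        self.lookup.insert(s.to_string(), id);
        id
    }
}

fn main() {
    let mut table = SymbolTable::default();
    let a = table.intern("foo");
    let b = table.intern("foo");
    assert_eq!(a, b); // symbol equality is now an integer comparison
}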
Fast Hashing
Symbol interning uses FxHash-style hashing for speed:
fn hash(&mut self, bytes: &[u8]) {
    for &byte in bytes {
        self.0 = (self.0.rotate_left(5) ^ (byte as u64))
            .wrapping_mul(0x517cc1b727220a95);
    }
}
This matters because method dispatch happens constantly. Every SEND needs to look up the method by selector. With interned symbols, that lookup is:
Hash(class_id, selector_id) → cache entry → method index
No string comparison in the hot path.
Method Lookup and doesNotUnderstand:
The Lookup Algorithm
When you send a message:
array add: 42
The VM walks the inheritance chain:
fn lookup_method(&self, class_id: u32, selector: u32) -> Option<usize> {
    let mut current = Some(class_id);
    while let Some(cid) = current {
        if let Some(&idx) = self.classes[cid as usize].methods.get(&selector) {
            return Some(idx); // Found it
        }
        current = self.classes[cid as usize].superclass; // Try parent
    }
    None // Not found anywhere
}
Class methods use a similar algorithm but search the metaclass hierarchy.
doesNotUnderstand: — The Ultimate Fallback
When lookup fails, the VM doesn't panic. It sends doesNotUnderstand: to the same receiver:
fn try_does_not_understand(&mut self, class_id: u32, selector: u32, ...) {
    // Create a Message object with selector and arguments
    let message = self.create_message_object(selector, args, receiver);
    // Look up doesNotUnderstand: on the same class
    let dnu_method = self.lookup_method(class_id, self.dnu_selector_id)?;
    // Call it with the Message
    Some((dnu_method, message))
}
This enables powerful patterns:
Object >> doesNotUnderstand: aMessage
"Forward to a delegate, log, or raise an error"
Transcript show: 'Unknown: ', aMessage selector.
^ nil
The Message object contains everything:
- selector — the method name that wasn't found
- arguments — the values passed
- receiver — who received the message
Exception Handling
Handler Stack Architecture
Exception handlers form a stack separate from the call stack:
pub struct ExceptionHandler {
    frame_idx: usize,        // Which frame installed this
    handler_block: NanValue, // The rescue block
    exception_class: u32,    // Filter (0 = catch all)
    resume_ip: usize,        // Where to continue after
}
When you write:
[ self riskyOperation ]
on: Error
do: [:exc | self handleError: exc ]
The compiler generates:
PUSHHANDLER (install handler)
... risky code ...
POPHANDLER (normal exit, remove handler)
JUMP past-handler
handler-code:
... handle error ...
Stack Unwinding
When signal is called:
- Search handlers from top of stack
- Check exception class — does it match the filter?
- Unwind frames — pop call frames back to the handler's frame
- Execute handler block with the exception object
- Resume at resume_ip
Error signal: 'Something went wrong'
// Simplified unwinding
while let Some(handler) = self.exception_handlers.pop() {
    if handler.matches(exception_class) {
        self.unwind_to(handler.frame_idx);
        return self.call_block(handler.handler_block, exception);
    }
    // Not this one, try the next outer handler
}
// No handler found — crash with stack trace
ensure: — Always Runs
The ensure: pattern guarantees cleanup:
file := File open: 'data.txt'.
[ self process: file ]
ensure: [ file close ]
Even if process: signals an exception, the file gets closed. The VM maintains a separate ensure_handlers stack:
pub struct EnsureHandler {
    frame_idx: usize,
    ensure_block: NanValue,
}
During unwinding, ensure blocks execute in LIFO order before the exception continues propagating.
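A sketch of that LIFO discipline with simplified types (a string stands in for the ensure block, and the frame-index comparison is only illustrative of "installed at or above the unwind target"):

// Simplified sketch, not the VM's unwinder.
struct EnsureHandler { frame_idx: usize, ensure_block: &'static str }

fn run_ensures(stack: &mut Vec<EnsureHandler>, target_frame: usize, ran: &mut Vec<&'static str>) {
    while stack.last().map_or(false, |h| h.frame_idx >= target_frame) {
        let h = stack.pop().unwrap();
        ran.push(h.ensure_block); // stand-in for calling the ensure block
    }
}

fn main() {
    let mut handlers = vec![
        EnsureHandler { frame_idx: 1, ensure_block: "outer" },
        EnsureHandler { frame_idx: 3, ensure_block: "inner ([file close])" },
    ];
    let mut ran = Vec::new();
    run_ensures(&mut handlers, 2, &mut ran); // unwinding back to frame 2
    assert_eq!(ran, vec!["inner ([file close])"]); // LIFO: innermost ensure runs first
}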
The Parser
Hand-Written, Not Generated
Lambda Smalltalk uses a hand-written recursive descent parser — no PEG, no parser generator. This gives precise control over Smalltalk's unique grammar.
The Precedence Rule
Smalltalk has exactly three precedence levels:
unary > binary > keyword
That's it. No operator precedence table. No parentheses needed for clarity.
2 + 3 * 4 "→ 20, not 14"
array at: 1 + 2 "→ array at: 3"
The parser implements this with three mutually recursive functions:
fn parse_keyword(&mut self) -> Option<Expr> {
    let mut expr = self.parse_binary()?; // Binary first
    // Then collect keyword parts
}

fn parse_binary(&mut self) -> Option<Expr> {
    let mut expr = self.parse_unary()?; // Unary first
    // Then collect binary operators (left to right)
}

fn parse_unary(&mut self) -> Option<Expr> {
    let mut expr = self.parse_primary()?; // Literals, variables
    // Then collect unary messages
}
Cascade: Multiple Messages, One Receiver
array
add: 1;
add: 2;
add: 3
The semicolon means "send another message to the same receiver." The parser desugars this to:
Cascade {
    receiver: array,
    messages: [
        (add:, [1]),
        (add:, [2]),
        (add:, [3]),
    ]
}
Each message returns its result, but cascade returns the original receiver.
Lambda Smalltalk Extensions
A few deviations from classic Smalltalk:
Mid-method temporaries:
x := 10.
| y | "← Allowed here"
y := 20.
Standard Smalltalk requires all temporaries at the method start. Lambda Smalltalk allows them anywhere, initializing them to nil.
Implicit variable declaration:
x := 10. "← No | x | needed"
Like Python or Ruby, first assignment creates the variable. Explicit declarations are optional.
Implementation Stats
- ~80 opcodes (JIT-optimized core operations)
- ~200 primitive operations
- 43 built-in classes
- Single-pass bytecode compiler with 4 optimization passes