Architecture Overview
Lambda Smalltalk combines multiple optimization techniques to achieve performance competitive with LuaJIT and Node.js V8.
Cranelift JIT
Hot methods are automatically compiled to native code using Cranelift, the same backend used by Wasmtime and available as an alternative codegen backend for rustc.
- Hot method detection: Methods called 10+ times trigger JIT compilation
- Cross-platform: x86-64, ARM64, RISC-V
- Speculative execution: Overflow checks with fast-path optimization
Fast Path (no overflow):
sadd_overflow a, b -> (result, overflow_flag)
brif overflow_flag -> deopt
tag result as SmallInteger
continue
Deoptimization (overflow):
save all registers to frame
return DEOPT sentinel with resume IP
Branch prediction makes overflow checks nearly free while preserving Smalltalk's unlimited precision integer semantics.
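As a minimal Rust-side sketch (not the VM's actual code), the same split can be written with checked_add, assuming the 32-bit SmallInteger payload described below: the common case stays on the fast path, and overflow hands the full-precision result to the slow path.

// Minimal sketch, not the VM's code: fast-path/slow-path split for SmallInteger addition.
fn add_smallints(a: i32, b: i32) -> Result<i32, i64> {
    match a.checked_add(b) {
        Some(sum) => Ok(sum),             // fast path: result still fits a SmallInteger
        None => Err(a as i64 + b as i64), // slow path: promote to LargeInteger
    }
}

fn main() {
    assert_eq!(add_smallints(1, 2), Ok(3));
    assert_eq!(add_smallints(i32::MAX, 1), Err(2_147_483_648)); // would trigger deopt
}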
True Deoptimization
When JIT-compiled code encounters overflow or non-SmallInteger arguments, it performs true deoptimization (LuaJIT-style):
- Save register state to the current frame
- Record resume IP — the exact bytecode instruction where overflow occurred
- Return DEOPT sentinel to the VM
- Interpreter resumes at the saved IP with full LargeInteger support
This is different from naive "restart the method" approaches. Lambda Smalltalk resumes execution at the exact point where JIT failed, preserving all intermediate results.
Example: x := (2^31 - 1) + 1
JIT executes: LOADI r0, 2147483647
ADDI r0, r0, 1 ← overflow detected here
[DEOPT: save r0, resume_ip=1]
Interpreter resumes at ip=1:
ADDI r0, r0, 1 ← re-executes with LargeInt
RETURN r0 → LargeInteger(2147483648)
Deoptimization is rare in practice (most code uses SmallInts), but when it happens, it's precise and efficient.
NaN Boxing
All values fit in 8 bytes using NaN boxing:
| Type | Representation |
|---|---|
| Integer | Lower 32 bits (SmallInt) or heap object (LargeInt) |
| Float | Standard IEEE 754 double |
| Object | NaN with object ID in payload |
| Symbol | NaN with symbol ID in payload |
| Bool/Nil | Special NaN patterns |
No type tags, no pointer chasing for primitives. Arithmetic operates directly on machine integers.
Inline Caching + PIC
Method dispatch uses a three-level caching system:
- Monomorphic Fast Path: Single class ID check + direct jump (added Jan 2026)
- Inline Cache (256 entries): Direct-mapped cache for call sites
- Polymorphic Inline Cache (4 entries per site): Handles polymorphic dispatch
SEND execution:
1. Check if receiver class == cached class (monomorphic fast path)
→ Direct jump to method (1 comparison, 1 branch)
2. Check inline cache (hit = direct jump)
3. Check PIC (hit = 4-way lookup)
4. Full method lookup (miss)
5. Update cache
The monomorphic fast path optimizes the most common case: same class at the same call site. Benchmarks show 15-20% improvement for method-heavy code (e.g., collection iteration).
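A minimal sketch of the monomorphic fast path, with hypothetical field names (the real cache layout is not shown here): one class-ID comparison decides whether the call site can jump straight to the cached method, and a miss falls back to the slower lookup and refills the cache.

// Sketch only: illustrative CallSite fields, not the VM's cache structures.
struct CallSite {
    cached_class: u32,    // class seen last time at this site
    cached_method: usize, // method index for that class
}

fn dispatch(site: &mut CallSite, receiver_class: u32, slow_lookup: impl Fn(u32) -> usize) -> usize {
    if receiver_class == site.cached_class {
        return site.cached_method; // 1 comparison, 1 branch
    }
    let m = slow_lookup(receiver_class); // miss: full lookup
    site.cached_class = receiver_class;  // update the cache for next time
    site.cached_method = m;
    m
}

fn main() {
    let mut site = CallSite { cached_class: 0, cached_method: 0 };
    let lookup = |class: u32| (class as usize) * 10; // stand-in for full method lookup
    assert_eq!(dispatch(&mut site, 3, lookup), 30);  // miss, fills cache
    assert_eq!(dispatch(&mut site, 3, lookup), 30);  // hit: monomorphic fast path
}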
Register-Based VM
32-bit fixed-length instructions with 256 registers per frame:
Format A: op(8) + rd(8) + ra(8) + rb(8) [3 registers]
Format B: op(8) + rd(8) + ra(8) + imm8 [2 registers + immediate]
Format C: op(8) + rd(8) + imm16 [1 register + immediate]
- No stack manipulation overhead
- Register allocation with free list reuse
- Copy propagation and dead store elimination
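As an illustration of how cheap decoding a fixed-width word is, here is a sketch that unpacks Format A and Format C; the exact bit placement (opcode in the high byte) is an assumption, not the documented encoding.

// Sketch only: field order within the 32-bit word is assumed, not specified above.
fn decode_a(inst: u32) -> (u8, u8, u8, u8) {
    ((inst >> 24) as u8, (inst >> 16) as u8, (inst >> 8) as u8, inst as u8) // op, rd, ra, rb
}

fn decode_c(inst: u32) -> (u8, u8, u16) {
    ((inst >> 24) as u8, (inst >> 16) as u8, inst as u16) // op, rd, imm16
}

fn main() {
    let add = (0x01u32 << 24) | (2 << 16) | (3 << 8) | 4; // op=0x01, rd=2, ra=3, rb=4
    assert_eq!(decode_a(add), (0x01, 2, 3, 4));
    assert_eq!(decode_c(add), (0x01, 2, 0x0304));
}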
Unified Dispatch
All operations go through message sends, even primitives:
Sqlite open: 'db.sqlite'
The compiler emits a SEND. The VM looks up open: on the Sqlite class, finds it's a primitive, and calls the Rust implementation directly - no frame allocation needed.
Performance
No benchmarks here. Try it yourself and feel the speed.
Performance Optimizations
Beyond the core architecture, Lambda Smalltalk employs several targeted optimizations for hot paths:
Regex Compilation Cache
Regular expressions are compiled once and cached:
pub compiled_regex_cache: HashMap<String, Regex>,
When you write:
'hello world' match: /\w+/
The first match compiles the regex and stores it. Subsequent matches reuse the compiled pattern — zero overhead for repeated patterns.
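A sketch of that compile-once pattern, built around the cache field shown above and the regex crate's entry-or-compile idiom:

// Minimal sketch of the cache lookup; error handling is simplified to expect().
use regex::Regex;
use std::collections::HashMap;

fn cached_match(cache: &mut HashMap<String, Regex>, pattern: &str, text: &str) -> bool {
    let re = cache
        .entry(pattern.to_string())
        .or_insert_with(|| Regex::new(pattern).expect("invalid regex"));
    re.is_match(text)
}

fn main() {
    let mut cache = HashMap::new();
    assert!(cached_match(&mut cache, r"\w+", "hello world")); // compiles and caches
    assert!(cached_match(&mut cache, r"\w+", "again"));       // reuses the compiled regex
}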
ParseTree-Specific Object Type
Grammar matching produces parse trees. These were originally Dict objects, but profiling showed significant overhead in tree construction and traversal.
Lambda Smalltalk now has a dedicated RegHeapObject::ParseTree:
RegHeapObject::ParseTree {
    rule: String,
    text: String,
    children: Vec<NanValue>,
    captures: HashMap<String, NanValue>,
}
Direct field access (no hash lookups) + optimized primitive methods = 3-5x faster grammar matching.
Precompiled Grammars
Grammars are compiled to an internal representation at Grammar from: time:
grammar := Grammar from: '
expr: @left NUM OP @right NUM
NUM: /[0-9]+/
OP: /[+\-*/]/
'.
The parsing happens once. Subsequent Grammar replace:in:with: or findAll:in: operations use the precompiled structure — no re-parsing overhead.
Bounded Loop Optimization
The compiler detects common patterns like 1 to: N do: [:i | ...] and generates specialized bytecode:
- Direct integer loops (no boxing/unboxing)
- Bounds known at compile time
- Loop variable stays in register
This makes iteration as fast as C-style for loops.
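Illustratively, the specialized loop reduces to what a hand-written Rust loop does: an integer counter kept in a register, compared against a known bound, with no boxing in between.

// Illustration only: what `1 to: n do: [:i | total := total + i]` boils down to
// once the loop counter stays in a register as a raw integer.
fn sum_to(n: i64) -> i64 {
    let mut total = 0;
    let mut i = 1;      // loop variable lives in a register
    while i <= n {      // bound is a plain integer comparison
        total += i;
        i += 1;
    }
    total
}

fn main() {
    assert_eq!(sum_to(10), 55);
}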
Rust Meets Dynamic Language
"Can you implement a garbage collector in Rust?" Yes. Here's how.
The Ownership Problem
Rust's ownership model seems incompatible with garbage collection. Every value must have exactly one owner, but a GC needs to track arbitrary object graphs with cycles and shared references.
Lambda Smalltalk solves this with a clean separation: Rust owns the memory, Smalltalk owns the references.
NaN Boxing: The Great Unifier
Every Smalltalk value fits in 8 bytes. Integers and booleans live directly in registers as immediate values. Objects are represented as 32-bit IDs pointing into a heap vector.
pub struct NanValue(u64);
// Integer: QNAN | SIGN_BIT | 32-bit value (no allocation)
// Object: QNAN | TAG | 32-bit heap ID (just a number)
When GC asks "is this an object?", it's a single bit check. No traversing pointers. No type tags to decode. One bitwise AND.
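A sketch of that check, with illustrative mask and tag constants (the VM's actual bit patterns are not documented here):

// Sketch only: QNAN and TAG_OBJECT values are illustrative, not the VM's constants.
const QNAN: u64 = 0x7ff8_0000_0000_0000;
const TAG_OBJECT: u64 = 0x0001_0000_0000_0000;

fn is_object(bits: u64) -> bool {
    (bits & (QNAN | TAG_OBJECT)) == (QNAN | TAG_OBJECT)
}

fn object_id(bits: u64) -> u32 {
    bits as u32 // low 32 bits carry the heap ID
}

fn main() {
    let v = QNAN | TAG_OBJECT | 42;
    assert!(is_object(v));
    assert_eq!(object_id(v), 42);
    assert!(!is_object(3.14f64.to_bits())); // ordinary doubles pass through untouched
}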
The Heap: A Simple Vector
pub heap: Vec<RegHeapObject>
That's it. Rust owns the vector. Smalltalk code only sees indices. When you write Array new, you get back a number — the index into this vector.
This separation is elegant: Rust's borrow checker is happy (it owns the Vec), and Smalltalk's GC is happy (it just shuffles numbers around).
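A sketch of the indices-not-pointers idea, with the names simplified from the structs shown elsewhere: allocation pushes onto the Vec and hands back the index.

// Simplified sketch: allocation returns the index that Smalltalk code treats as a reference.
enum HeapObject { Array(Vec<u64>) }

struct Heap { objects: Vec<HeapObject> }

impl Heap {
    fn alloc(&mut self, obj: HeapObject) -> u32 {
        let id = self.objects.len() as u32; // the "pointer" Smalltalk code sees
        self.objects.push(obj);
        id
    }
}

fn main() {
    let mut heap = Heap { objects: Vec::new() };
    let id = heap.alloc(HeapObject::Array(Vec::new()));
    assert_eq!(id, 0); // `Array new` evaluates to this number
}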
Cheney's Algorithm: Copy the Living
GC uses Cheney-style copying collection:
- Walk all registers and frames, collecting root object IDs
- For each reachable object, copy it to a new heap
- Build a forwarding table: old ID → new ID
- Replace all references with new IDs
- Drop the old heap
Dead objects? Never touched. The old heap simply disappears. No mark phase, no sweep phase. Just copy and forget.
// GC in action
let mut new_heap = Vec::new();
let mut forwarding = HashMap::new();

// Copy reachable objects
for old_id in worklist {
    let new_id = new_heap.len();
    new_heap.push(self.heap[old_id].clone());
    forwarding.insert(old_id, new_id);
}

// References in registers, frames, and the copied objects are then
// rewritten from old IDs to new IDs via `forwarding` (step 4 above).

// Swap heaps (old heap is dropped here)
self.heap = new_heap;
Default threshold: 16 million objects (~128MB of heap metadata). Why so high? Because GC is copying — every live object gets cloned. With a 50% survival rate, that's 64MB of allocation during GC. Running GC too often would thrash memory bandwidth.
The sweet spot: let the heap grow large, then do one big collection. Most scripts never trigger GC at all.
External Resources: Rc<RefCell>
Files. Network sockets. Database connections. These can't be copied like regular objects.
RegHeapObject::FileStream {
    reader: Option<Rc<RefCell<BufReader<File>>>>,
}
Rc (reference counting) handles the sharing. Multiple Smalltalk objects can reference the same file handle. When the last reference is GC'd, Rust's Drop trait closes the file automatically.
No explicit close needed. No resource leaks. Rust's RAII meets Smalltalk's GC.
Unsafe: Surgical Precision
Yes, there's unsafe. In exactly three places:
- Register access in the hot loop — bounds checking on every instruction is too slow
- Instruction fetch — get_unchecked() saves nanoseconds that matter
- JIT calls — passing raw pointers to generated machine code
Every unsafe block operates on pre-validated indices. The bytecode verifier runs at load time. If verification passes, the indices are guaranteed valid.
// This is safe because:
// 1. ip is bounds-checked by the verifier
// 2. code vector never changes during execution
let inst = unsafe { *code.get_unchecked(ip) };
Rust purists may frown. The benchmark results don't.
Class System
Virtual Metaclasses
Traditional Smalltalk has explicit metaclasses forming a parallel hierarchy. Lambda Smalltalk takes a different approach: virtual metaclasses generated at load time.
pub struct RegClass {
    pub name: u32,                          // Interned symbol ID
    pub superclass: Option<u32>,            // Parent class ID
    pub methods: HashMap<u32, usize>,       // Instance methods
    pub class_methods: HashMap<u32, usize>, // Class-side methods
    pub ivar_count: u16,                    // Including inherited
    pub metaclass_id: Option<u32>,          // Virtual metaclass
}
When the VM loads a class:
- Create the class with its instance methods
- Generate a metaclass named "ClassName class"
- Migrate class methods to the metaclass as instance methods
- Link metaclass inheritance to mirror the class hierarchy
Object <── Object class
↑ ↑
Collection <── Collection class
↑ ↑
Array <── Array class
This means Array class is a real class whose instances are class objects. When you call Array new, the VM sends new to the Array class object, which dispatches through the metaclass.
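A sketch of that dispatch with deliberately simplified types (a string stands in for the compiled method, and the metaclass inheritance walk is omitted): sending new to the class object looks the selector up in the metaclass's instance methods.

// Simplified sketch; not the VM's actual class table.
use std::collections::HashMap;

struct Class {
    methods: HashMap<&'static str, &'static str>,
    metaclass: Option<usize>,
}

fn send_to_class(classes: &[Class], class_id: usize, selector: &str) -> Option<&'static str> {
    // The receiver of `Array new` is the class object itself,
    // so dispatch starts in its virtual metaclass.
    let meta = classes[class_id].metaclass?;
    classes[meta].methods.get(selector).copied()
}

fn main() {
    let array_meta = Class {
        methods: HashMap::from([("new", "Array class>>new")]),
        metaclass: None,
    };
    let array = Class { methods: HashMap::new(), metaclass: Some(0) };
    let classes = vec![array_meta, array];
    assert_eq!(send_to_class(&classes, 1, "new"), Some("Array class>>new"));
}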
Instance Variable Layout
Instance variables use indexed storage with inherited variables first:
Person (ivars: name, age)
↑
Employee (ivars: salary)
Employee instance layout:
[0] name (inherited from Person)
[1] age (inherited from Person)
[2] salary (own)
The compiler resolves ivar names to indices at compile time. GETIVAR and SETIVAR use 8-bit indices — fast, cache-friendly, no hash lookups.
Blocks and Closures
The Upvalue Cell Pattern
When a block captures an outer variable, Lambda Smalltalk creates a heap-allocated upvalue cell:
RegHeapObject::Upvalue(NanValue) // Mutable cell
| counter |
counter := 0.
[ counter := counter + 1. counter ]
Here's what happens:
- counter is a local variable (register)
- When the block is created, counter's value is wrapped in an Upvalue cell
- The block stores a reference to this cell
- Both the outer scope and the block share the same cell
Multiple blocks can capture the same variable:
| x getter setter |
x := 10.
getter := [ x ].
setter := [:v | x := v ].
getter value. "→ 10"
setter value: 20.
getter value. "→ 20"
Both blocks reference the same Upvalue cell. Mutations are visible everywhere.
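The same sharing can be pictured in plain Rust with an Rc<RefCell<...>> cell standing in for the Upvalue (the VM shares a heap ID rather than an Rc, but the effect is identical):

// Plain-Rust analogue of two blocks capturing one Upvalue cell.
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let x = Rc::new(RefCell::new(10)); // the captured variable's cell
    let getter = { let x = Rc::clone(&x); move || *x.borrow() };
    let setter = { let x = Rc::clone(&x); move |v| *x.borrow_mut() = v };
    assert_eq!(getter(), 10);
    setter(20);
    assert_eq!(getter(), 20); // both closures see the same mutation
}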
Block Structure
RegHeapObject::Block {
    method_idx: usize,       // Compiled bytecode
    upvalues: Vec<u32>,      // Heap IDs of captured cells
    home_frame_idx: usize,   // For non-local return
    home_frame_id: u64,      // Detect escaped blocks
    home_receiver: NanValue, // `self` inside block
}
The home_receiver deserves attention. Inside a block, self refers to the receiver of the enclosing method, not the block itself:
Object subclass: #Counter instanceVariableNames: 'value'.
Counter >> increment
[ value := value + 1 ] value. "← `value` is self's ivar"
Non-Local Return
Smalltalk blocks can return from the enclosing method:
findFirst: aBlock in: aCollection
aCollection do: [:each |
(aBlock value: each) ifTrue: [ ^ each ] "← Returns from findFirst:in:"
].
^ nil
The BLOCKRET opcode handles this by unwinding to home_frame_id. If that frame has already returned (the block escaped), it's an error:
makeBlock
^ [ ^ 42 ] "Block escapes"
makeBlock value "Error: non-local return from dead frame"
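One way to picture the check (illustrative names only; the real VM compares against home_frame_id during BLOCKRET): scan the live frames for the block's home, and if it is gone, the return is an error.

// Illustrative sketch of the escaped-block check.
struct Frame { id: u64 }

fn find_home(frames: &[Frame], home_frame_id: u64) -> Option<usize> {
    frames.iter().rposition(|f| f.id == home_frame_id)
}

fn main() {
    let stack = vec![Frame { id: 1 }, Frame { id: 7 }];
    assert_eq!(find_home(&stack, 7), Some(1)); // home frame alive: unwind to it
    assert_eq!(find_home(&stack, 3), None);    // home frame gone: "dead frame" error
}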
Symbol Table
Every method name, class name, and symbolic value is interned exactly once:
pub struct SymbolTable {
    lookup: HashMap<String, SymbolId>, // String → ID
    strings: Vec<String>,              // ID → String
}
The first time you use :foo, the symbol table:
- Checks if "foo" exists (O(1) hash lookup)
- If not, assigns the next ID and stores the string
- Returns the 32-bit SymbolId
From then on, comparing :foo = :foo is a single integer comparison.
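A compilable sketch of interning built directly from the fields above (using std's default hasher here; the FxHash-style hasher is covered next):

// Sketch of intern(): first use stores the string, later uses return the same ID.
use std::collections::HashMap;

type SymbolId = u32;

#[derive(Default)]
struct SymbolTable {
    lookup: HashMap<String, SymbolId>,
    strings: Vec<String>,
}

impl SymbolTable {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.lookup.get(s) {
            return id;
        }
        let id = self.strings.len() as SymbolId;
        self.strings.push(s.to_string());
        self.lookup.insert(s.to_string(), id);
        id
    }
}

fn main() {
    let mut table = SymbolTable::default();
    let a = table.intern("foo");
    let b = table.intern("foo");
    assert_eq!(a, b); // symbol equality is now an integer comparison
}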
Fast Hashing
Symbol interning uses FxHash-style hashing for speed:
fn hash(&mut self, bytes: &[u8]) {
    for &byte in bytes {
        self.0 = (self.0.rotate_left(5) ^ (byte as u64))
            .wrapping_mul(0x517cc1b727220a95);
    }
}
This matters because method dispatch happens constantly. Every SEND needs to look up the method by selector. With interned symbols, that lookup is:
Hash(class_id, selector_id) → cache entry → method index
No string comparison in the hot path.
Method Lookup and doesNotUnderstand:
The Lookup Algorithm
When you send a message:
array add: 42
The VM walks the inheritance chain:
fn lookup_method(&self, class_id: u32, selector: u32) -> Option<usize> {
    let mut current = Some(class_id);
    while let Some(cid) = current {
        if let Some(&idx) = self.classes[cid as usize].methods.get(&selector) {
            return Some(idx); // Found it
        }
        current = self.classes[cid as usize].superclass; // Try parent
    }
    None // Not found anywhere
}
Class methods use a similar algorithm but search the metaclass hierarchy.
doesNotUnderstand: — The Ultimate Fallback
When lookup fails, the VM doesn't panic. It sends doesNotUnderstand: to the same receiver:
fn try_does_not_understand(&mut self, class_id: u32, selector: u32, ...) {
    // Create a Message object with selector and arguments
    let message = self.create_message_object(selector, args, receiver);
    // Look up doesNotUnderstand: on the same class
    let dnu_method = self.lookup_method(class_id, self.dnu_selector_id)?;
    // Call it with the Message
    Some((dnu_method, message))
}
This enables powerful patterns:
Object >> doesNotUnderstand: aMessage
"Forward to a delegate, log, or raise an error"
Transcript show: 'Unknown: ', aMessage selector.
^ nil
The Message object contains everything:
- selector — the method name that wasn't found
- arguments — the values passed
- receiver — who received the message
Exception Handling
Handler Stack Architecture
Exception handlers form a stack separate from the call stack:
pub struct ExceptionHandler {
    frame_idx: usize,        // Which frame installed this
    handler_block: NanValue, // The rescue block
    exception_class: u32,    // Filter (0 = catch all)
    resume_ip: usize,        // Where to continue after
}
When you write:
[ self riskyOperation ]
on: Error
do: [:exc | self handleError: exc ]
The compiler generates:
PUSHHANDLER (install handler)
... risky code ...
POPHANDLER (normal exit, remove handler)
JUMP past-handler
handler-code:
... handle error ...
Stack Unwinding
When signal is called:
- Search handlers from top of stack
- Check exception class — does it match the filter?
- Unwind frames — pop call frames back to the handler's frame
- Execute handler block with the exception object
- Resume at resume_ip
Error signal: 'Something went wrong'
// Simplified unwinding
while let Some(handler) = self.exception_handlers.pop() {
    if handler.matches(exception_class) {
        self.unwind_to(handler.frame_idx);
        return self.call_block(handler.handler_block, exception);
    }
    // Not this one, try the next outer handler
}
// No handler found — crash with stack trace
ensure: — Always Runs
The ensure: pattern guarantees cleanup:
file := File open: 'data.txt'.
[ self process: file ]
ensure: [ file close ]
Even if process: signals an exception, the file gets closed. The VM maintains a separate ensure_handlers stack:
pub struct EnsureHandler {
    frame_idx: usize,
    ensure_block: NanValue,
}
During unwinding, ensure blocks execute in LIFO order before the exception continues propagating.
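A sketch of that LIFO discipline with simplified types (a string stands in for the ensure block, and the frame-index comparison is only illustrative of "installed at or above the unwind target"):

// Simplified sketch, not the VM's unwinder.
struct EnsureHandler { frame_idx: usize, ensure_block: &'static str }

fn run_ensures(stack: &mut Vec<EnsureHandler>, target_frame: usize, ran: &mut Vec<&'static str>) {
    while stack.last().map_or(false, |h| h.frame_idx >= target_frame) {
        let h = stack.pop().unwrap();
        ran.push(h.ensure_block); // stand-in for calling the ensure block
    }
}

fn main() {
    let mut handlers = vec![
        EnsureHandler { frame_idx: 1, ensure_block: "outer" },
        EnsureHandler { frame_idx: 3, ensure_block: "inner ([file close])" },
    ];
    let mut ran = Vec::new();
    run_ensures(&mut handlers, 2, &mut ran); // unwinding back to frame 2
    assert_eq!(ran, vec!["inner ([file close])"]); // LIFO: innermost ensure runs first
}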
The Parser
Hand-Written, Not Generated
Lambda Smalltalk uses a hand-written recursive descent parser — no PEG, no parser generator. This gives precise control over Smalltalk's unique grammar.
The Precedence Rule
Smalltalk has exactly three precedence levels:
unary > binary > keyword
That's it. No operator precedence table. No parentheses needed for clarity.
2 + 3 * 4 "→ 20, not 14"
array at: 1 + 2 "→ array at: 3"
The parser implements this with three mutually recursive functions:
fn parse_keyword(&mut self) -> Option<Expr> {
    let mut expr = self.parse_binary()?; // Binary first
    // Then collect keyword parts
}

fn parse_binary(&mut self) -> Option<Expr> {
    let mut expr = self.parse_unary()?; // Unary first
    // Then collect binary operators (left to right)
}

fn parse_unary(&mut self) -> Option<Expr> {
    let mut expr = self.parse_primary()?; // Literals, variables
    // Then collect unary messages
}
Cascade: Multiple Messages, One Receiver
array
add: 1;
add: 2;
add: 3
The semicolon means "send another message to the same receiver." The parser desugars this to:
Cascade {
    receiver: array,
    messages: [
        (add:, [1]),
        (add:, [2]),
        (add:, [3]),
    ]
}
Each message returns its result, but cascade returns the original receiver.
Lambda Smalltalk Extensions
A few deviations from classic Smalltalk:
Mid-method temporaries:
x := 10.
| y | "← Allowed here"
y := 20.
Standard Smalltalk requires all temporaries at the method start. Lambda Smalltalk allows them anywhere, initializing them to nil.
Implicit variable declaration:
x := 10. "← No | x | needed"
Like Python or Ruby, first assignment creates the variable. Explicit declarations are optional.
Implementation Stats
- ~80 opcodes (JIT-optimized core operations)
- ~200 primitive operations
- 43 built-in classes
- Single-pass bytecode compiler with 4 optimization passes