VM Internals
Why Lambda C Is Safe for Embedded Systems
This page provides a deep technical analysis of Lambda C's VM architecture, explaining why it is uniquely suited for safety-critical embedded systems.
Non-Recursive Dispatch Loop
The Core Safety Guarantee
Critical Understanding: Lambda C's VM does not use C-level function recursion for script function calls. This is the fundamental difference from naive interpreters.
How Script Calls Work
When a script function calls another function (including recursive calls), the VM:
- Pushes frame information onto the VM stack - not the C stack
- Updates the program counter to jump to the target function
- Continues the dispatch loop - never calls another C function
This means:
- A script executing fib(1000) will consume VM stack memory, not C stack
- The C stack depth remains constant regardless of script recursion depth
- No risk of HardFault or stack overflow on the host MCU
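The call mechanism described above can be sketched as a single C function that keeps script frames in its own arrays. This is a minimal illustration, not Lambda C's actual implementation: the opcode names, the Insn and Frame layouts, the capacity limits, and the fib_code program are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; not Lambda C's real instruction set. */
enum { OP_PUSH, OP_LOADARG, OP_ADD, OP_SUB, OP_JLT, OP_CALL, OP_RET, OP_HALT };

typedef struct { uint8_t op; int32_t arg; } Insn;
typedef struct { int ret_pc; int32_t saved_arg; } Frame;

#define MAX_VALUES 128   /* VM value stack capacity (assumed) */
#define MAX_FRAMES 64    /* VM call-frame capacity (assumed)  */

/* Runs `code` with argument n. Returns 0 on success, -1 on VM stack overflow.
 * Script calls never re-enter vm_run, so the C stack depth stays constant. */
int vm_run(const Insn *code, int32_t n, int32_t *result)
{
    int32_t values[MAX_VALUES];
    Frame   frames[MAX_FRAMES];
    int     sp = 0, fp = 0, pc = 0;
    int32_t arg = 0;                 /* argument of the current script function */

    values[sp++] = n;
    for (;;) {
        Insn in = code[pc++];
        switch (in.op) {
        case OP_PUSH:
        case OP_LOADARG:
            if (sp >= MAX_VALUES) return -1;
            values[sp++] = (in.op == OP_PUSH) ? in.arg : arg;
            break;
        case OP_ADD: sp--; values[sp - 1] += values[sp]; break;
        case OP_SUB: sp--; values[sp - 1] -= values[sp]; break;
        case OP_JLT:                 /* pop b, then a; jump if a < b */
            sp -= 2;
            if (values[sp] < values[sp + 1]) pc = in.arg;
            break;
        case OP_CALL:                /* save a frame on the VM side, then jump */
            if (fp >= MAX_FRAMES) return -1;
            frames[fp].ret_pc = pc;
            frames[fp].saved_arg = arg;
            fp++;
            arg = values[--sp];
            pc = in.arg;
            break;
        case OP_RET:                 /* return value is already on the value stack */
            fp--;
            pc = frames[fp].ret_pc;
            arg = frames[fp].saved_arg;
            break;
        case OP_HALT:
            *result = values[sp - 1];
            return 0;
        default:
            return -1;
        }
    }
}

/* Recursive fib(n) expressed in the hypothetical bytecode. */
const Insn fib_code[] = {
    {OP_CALL, 2}, {OP_HALT, 0},                               /* 0-1: entry       */
    {OP_LOADARG, 0}, {OP_PUSH, 2}, {OP_JLT, 15},              /* 2-4: if n < 2    */
    {OP_LOADARG, 0}, {OP_PUSH, 1}, {OP_SUB, 0}, {OP_CALL, 2}, /* 5-8: fib(n-1)    */
    {OP_LOADARG, 0}, {OP_PUSH, 2}, {OP_SUB, 0}, {OP_CALL, 2}, /* 9-12: fib(n-2)   */
    {OP_ADD, 0}, {OP_RET, 0},                                 /* 13-14: sum       */
    {OP_LOADARG, 0}, {OP_RET, 0},                             /* 15-16: base case */
};
```

Calling vm_run(fib_code, 10, &r) recurses ten script levels deep while the C side stays inside one function. A script that recurses past MAX_FRAMES makes vm_run return -1 instead of overrunning the host stack.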
Contrast with Traditional Interpreters
Naive Interpreter (Unsafe):
- Script function call → C function call
- Deep recursion → C stack overflow
- Unpredictable memory consumption
Lambda C (Safe):
- Script function call → VM stack manipulation + jump
- Deep recursion → VM stack consumption only
- Deterministic memory consumption within STACKSIZE
Commercial Significance
This architecture enables:
- RTOS compatibility - Run VM in tasks with minimal C stack allocation (e.g., 1-2KB)
- Predictable worst-case memory - Memory consumption = vm_stack_size + vm_garea_size + vm_heap_size
- No hidden stack consumption - Unlike interpreters that recursively call C functions
Load-Time Linking: O(1) FFI Dispatch
The Problem with Runtime Symbol Resolution
Dynamic scripting languages such as Lua and Python typically resolve a global function name by lookup at call time, which can involve:
- String hash computation
- Hash table lookup
- Cache misses on long symbol names
This creates variable, unpredictable FFI call overhead.
Lambda C's Solution
FFI linking happens once at bytecode load time:
- Registration phase: Host registers functions with string names
- Link phase: Bytecode loader converts all function name references to integer IDs
- Execution phase: FFI calls use integer ID → direct table lookup
Result: constant-time, O(1) FFI dispatch (around 100ns; the exact figure depends on the target CPU and clock) regardless of:
- Function name length
- Number of registered functions
- Call frequency
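The three phases above can be sketched in C as follows. The names host_register, link_resolve, and ffi_call, the HostFn signature, and the table size are hypothetical and chosen only for this example; they are not Lambda C's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*HostFn)(int);          /* assumed host function signature */

typedef struct { const char *name; HostFn fn; } HostEntry;

#define MAX_HOST_FNS 16
static HostEntry host_table[MAX_HOST_FNS];
static int host_count = 0;

/* Registration phase: bind a name to a C function. Returns the ID, or -1 if full. */
int host_register(const char *name, HostFn fn)
{
    if (host_count >= MAX_HOST_FNS) return -1;
    host_table[host_count].name = name;
    host_table[host_count].fn = fn;
    return host_count++;
}

/* Link phase: runs once per imported symbol at load time.
 * This is the only place where strings are compared. */
int link_resolve(const char *name)
{
    for (int i = 0; i < host_count; i++)
        if (strcmp(host_table[i].name, name) == 0) return i;
    return -1;                       /* unresolved: reject the bytecode at load time */
}

/* Execution phase: one bounds check and one indexed call. */
int ffi_call(int id, int arg)
{
    assert(id >= 0 && id < host_count);
    return host_table[id].fn(arg);
}

int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }

/* Usage: register, resolve once, then call by integer ID. */
int link_demo(void)
{
    host_register("twice", twice);
    host_register("negate", negate);
    int id = link_resolve("twice");  /* done by the loader, not per call */
    return ffi_call(id, 21);
}
```

The linear search in link_resolve is acceptable in this sketch because it runs only at load time; the per-call path never touches a string.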
Why This Matters
In control loops executing at 100Hz-10kHz:
- Predictable timing - No variable overhead from string lookups
- Real-time capable - Consistent FFI call latency
- Cache-friendly - Integer table lookup vs. string hashing
Type Stack: Why Runtime Type Tracking?
The Compact Bytecode Trade-off
Lambda C's 4-byte instruction format achieves extreme compactness by not encoding type information in instructions. An ADD instruction doesn't specify if it's adding int or double.
The Type Stack Solution
The VM maintains a parallel type stack that tracks the runtime type of each value on the VM stack:
- int types enable fast integer-only paths
- double types trigger floating-point operations
- pointer types enforce sandbox restrictions
Key Insight: This is a space-time trade-off optimized for embedded:
- Space saved: 4-byte instructions instead of 8-byte instructions with type tags
- Time cost: Type stack lookup (minimal compared to instruction fetch)
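A parallel type stack can be illustrated with a small C sketch. The tag names, the Value union, and the promotion rule are assumptions for the example; the source does not specify how Lambda C lays out its type stack.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { T_INT, T_DOUBLE } TypeTag;      /* hypothetical tags */
typedef union { int32_t i; double d; } Value;

#define STACK_MAX 64
static Value   vstack[STACK_MAX];              /* values                          */
static uint8_t tstack[STACK_MAX];              /* parallel type tags, one per slot */
static int     sp = 0;

void push_int(int32_t v)
{
    assert(sp < STACK_MAX);
    vstack[sp].i = v;
    tstack[sp++] = T_INT;
}

void push_double(double v)
{
    assert(sp < STACK_MAX);
    vstack[sp].d = v;
    tstack[sp++] = T_DOUBLE;
}

/* Untyped ADD: the opcode carries no type, so the tags select the path. */
void op_add(void)
{
    assert(sp >= 2);
    sp--;
    Value *a = &vstack[sp - 1], *b = &vstack[sp];
    if (tstack[sp - 1] == T_INT && tstack[sp] == T_INT) {
        a->i += b->i;                          /* fast integer path */
    } else {
        double x = (tstack[sp - 1] == T_INT) ? (double)a->i : a->d;
        double y = (tstack[sp] == T_INT) ? (double)b->i : b->d;
        a->d = x + y;
        tstack[sp - 1] = T_DOUBLE;             /* mixed operands promote to double */
    }
}
```

The tag array costs one byte per stack slot in this sketch, which is the space paid at run time in exchange for instructions that carry no type bits.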
Performance Optimization: Fast Integer Mode
When the VM detects integer-only operations, it can:
- Skip type tag checks
- Use CPU integer ALU directly
- Avoid unnecessary floating-point conversions
This is why fib(25) runs fast despite being an interpreter - the hot loop is all integers.
Static Type Mode (-Os)
Compile with the -Os option to resolve all types at compile time and skip runtime type tracking, achieving roughly a 2x speedup on integer-heavy code:
lcvmc -Os -oc output.lcbc source.c
How It Works:
- The compiler analyzes all types at compile time
- Type stack operations are completely skipped
- Integer arithmetic executes at maximum speed
Benchmark (fib(25)):
| Mode | Execution Time |
|---|---|
| Normal mode | 27ms |
| Static type mode | 13ms |
Limitations:
- Optimized for integer operations (normal mode recommended for floating-point)
- printf/sprintf supported; configure LCVM_PRINTF_BUFSIZE for embedded (default: 1024 bytes)
Arena Allocator: No free() by Design
Why No free()?
Traditional malloc/free causes:
- Heap fragmentation - Memory holes after repeated alloc/free cycles
- Non-deterministic allocation time - Free-list traversal overhead
- Unpredictable failure - Allocation may fail even with sufficient total free memory
Arena Allocator Characteristics
Lambda C's allocator:
- Allocates sequentially from a contiguous buffer
- Never frees individual allocations - only bulk reset
- O(1) allocation time - Bump pointer, no free-list search
- Zero fragmentation - No memory holes
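The characteristics listed above are those of a bump allocator, which can be sketched in a few lines of C. The Arena type and the arena_* functions are hypothetical names for this illustration and are not Lambda C's API; the 8-byte alignment is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;   /* start of the contiguous buffer     */
    size_t   size;   /* total capacity in bytes            */
    size_t   used;   /* bump pointer, as an offset in bytes */
} Arena;

/* The caller supplies the buffer; it is assumed to be 8-byte aligned. */
void arena_init(Arena *a, void *buf, size_t size)
{
    assert(((uintptr_t)buf & 7u) == 0);
    a->base = buf;
    a->size = size;
    a->used = 0;
}

/* O(1) allocation: round the size up, check the bound, advance the pointer.
 * Returns NULL when the arena is exhausted. */
void *arena_alloc(Arena *a, size_t n)
{
    size_t aligned = (n + 7u) & ~(size_t)7u;
    if (aligned < n || aligned > a->size - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Bulk reset: every allocation is released at once, also in O(1). */
void arena_reset(Arena *a) { a->used = 0; }
```

Because there is no per-allocation bookkeeping, there is nothing to fragment: the only state is a single offset, and exhaustion is reported by a NULL return rather than by a search that might or might not succeed.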
Embedded Use Case Fit
This design is perfect for periodic control loops:
Control loop iteration:
1. Reset heap → Start with fresh memory
2. Allocate temporary buffers → O(1) fast
3. Process sensor data → No fragmentation risk
4. Output control signals → Predictable memory
5. Next iteration → Heap reset, repeat
Trade-off: Must call lcvm_heap_reset() periodically. This is acceptable because:
- Embedded control loops naturally have iteration boundaries
- Reset cost is negligible compared to loop period (e.g., 10ms)
- Eliminates all fragmentation and leak risks
Heap Watermark API
For scenarios where full heap reset is too coarse, Lambda C provides partial release capability:
size_t mark = lcvm_heap_mark(); // Record current position
char *buf = lcvm_malloc(64*1024); // Temporary allocation
// ... processing ...
lcvm_heap_release(mark); // Batch release to mark position
Characteristics:
- O(1) release time - Simply reset the pointer
- No fragmentation - Sequential release
- Nestable - Multiple marks can be set and released in LIFO order
Use Cases:
- Temporary buffer allocation during sensor data processing
- Per-frame memory in game loops
- Transaction-based processing (allocate → process → release)
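How mark and release can work on top of a bump pointer is sketched below. This is an assumption about the mechanism, written with hypothetical heap_* names and a fixed 1024-byte buffer; it is not the lcvm_* implementation, and it ignores alignment for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 1024
static uint8_t heap[HEAP_SIZE];
static size_t  heap_top = 0;       /* bump pointer, as an offset into heap */

/* Sequential allocation; returns NULL when the heap is exhausted. */
void *heap_alloc(size_t n)
{
    if (n > HEAP_SIZE - heap_top) return NULL;
    void *p = &heap[heap_top];
    heap_top += n;
    return p;
}

/* A mark is simply the current bump-pointer position. */
size_t heap_mark(void) { return heap_top; }

/* Release everything allocated since `mark` by moving the pointer back.
 * Marks must be released in LIFO order, newest first. */
void heap_release(size_t mark)
{
    assert(mark <= heap_top);
    heap_top = mark;
}
```

Nesting follows directly from the representation: an inner mark is a larger offset than the outer one, so releasing the inner mark first and the outer mark second restores the pointer step by step while allocations made before the outer mark stay valid.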
4-Byte Instructions: Cache Efficiency
Why Fixed-Length Matters
Variable-length instruction sets (e.g., x86 machine code, MicroPython bytecode):
- Unpredictable instruction fetch time
- Poor cache utilization
- Complex branch prediction
Fixed 4-byte instructions:
- Single memory fetch per instruction - Always 4 bytes
- Cache-line friendly - 8 instructions per 32-byte cache line (e.g., on cores with a cache such as ARM Cortex-M7)
- Predictable loop timing - No variable-length decode overhead
Compact Yet Complete
4 bytes provide:
- 8-bit opcode (256 instruction types)
- 24 bits for operands (3x 8-bit or combined addressing modes)
This is sufficient for:
- All arithmetic operations
- Memory access patterns
- Control flow (jumps, calls, returns)
- FFI dispatch
Design principle: Instruction density > raw decode speed for embedded systems where ROM/RAM are scarce but CPU cycles are adequate.
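Decoding a fixed 4-byte instruction with an 8-bit opcode and 24 operand bits reduces to shifts and masks. The sketch below assumes the opcode occupies the most significant byte; the source does not state the actual bit layout, and the insn_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an 8-bit opcode and a 24-bit operand field into one 32-bit word.
 * Assumed layout: [31:24] opcode, [23:0] operands. */
static inline uint32_t insn_encode(uint8_t op, uint32_t operand24)
{
    assert(operand24 <= 0xFFFFFFu);
    return ((uint32_t)op << 24) | operand24;
}

static inline uint8_t  insn_opcode(uint32_t w)    { return (uint8_t)(w >> 24); }
static inline uint32_t insn_operand24(uint32_t w) { return w & 0xFFFFFFu; }

/* View the operand field as three 8-bit operands. */
static inline uint8_t insn_a(uint32_t w) { return (uint8_t)(w >> 16); }
static inline uint8_t insn_b(uint32_t w) { return (uint8_t)(w >> 8); }
static inline uint8_t insn_c(uint32_t w) { return (uint8_t)w; }

/* View the operand field as one signed 24-bit value, e.g. a relative jump. */
static inline int32_t insn_simm24(uint32_t w)
{
    uint32_t v = w & 0xFFFFFFu;
    return (int32_t)(v ^ 0x800000u) - 0x800000;   /* sign-extend bit 23 */
}
```

Every instruction is fetched with one aligned 32-bit load and decoded without inspecting earlier bytes, which is what keeps the fetch cost identical for all opcodes.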
Deterministic Memory Consumption
The Safety-Critical Requirement
In commercial embedded systems, you must prove worst-case memory consumption. Lambda C enables this:
Total Memory = vm_stack_size + vm_garea_size + vm_heap_size
Proof of Bounds
- VM Stack: Maximum depth determined by STACKSIZE - stack overflow detection prevents exceeding this
- Global Area: Fixed at load time based on bytecode global variable declarations
- Heap: Arena allocator bounded by heap_size - allocation fails cleanly when exhausted
No hidden allocations:
- No GC heap growth
- No implicit conversions allocating temporary buffers
- No runtime compilation generating code
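As a worked example of the bound, assume a 4 KiB VM stack, a 1 KiB global area, and an 8 KiB heap: the total is 4 + 1 + 8 = 13 KiB, or 13312 bytes. The sketch below shows one way a host could check such a budget at build time by carving all three regions out of one static buffer. The sizes, the RAM_BUDGET figure, and the vm_regions helper are assumptions for illustration, not Lambda C configuration names.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Assumed example sizes; real values come from the VM configuration
 * and from the global declarations in the loaded bytecode. */
#define VM_STACK_SIZE (4u * 1024u)
#define VM_GAREA_SIZE (1u * 1024u)
#define VM_HEAP_SIZE  (8u * 1024u)
#define VM_TOTAL_SIZE (VM_STACK_SIZE + VM_GAREA_SIZE + VM_HEAP_SIZE)

#define RAM_BUDGET    (16u * 1024u)   /* hypothetical RAM reserved for the VM */

/* The build fails if the worst case does not fit the budget. */
static_assert(VM_TOTAL_SIZE <= RAM_BUDGET, "VM memory exceeds the RAM budget");

/* One static buffer holds everything the VM may ever use. */
static uint8_t vm_memory[VM_TOTAL_SIZE];

typedef struct { uint8_t *stack, *garea, *heap; } VmRegions;

/* Split the buffer into the three fixed regions. */
VmRegions vm_regions(void)
{
    VmRegions r;
    r.stack = vm_memory;
    r.garea = r.stack + VM_STACK_SIZE;
    r.heap  = r.garea + VM_GAREA_SIZE;
    return r;
}
```

With this arrangement the linker map shows the VM's entire footprint as a single symbol, which makes the worst-case figure easy to audit.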
Certification Advantage
This determinism is valuable for:
- Medical devices - Regulatory review (e.g., FDA submissions) benefits from a documented worst-case memory analysis
- Automotive - ISO 26262 ASIL safety levels
- Industrial control - IEC 61508 SIL requirements
Comparison: Why Lambda C Differs from Lua/Python
Architectural Philosophy
| Aspect | Lambda C | Lua | MicroPython |
|---|---|---|---|
| C stack usage | Constant | Grows with call depth | Grows with call depth |
| FFI dispatch | O(1) table lookup | O(1) hash table (after cache) | String comparison |
| Memory allocation | Arena (no free) | GC with mark-sweep | GC with mark-sweep |
| Instruction format | 4-byte fixed | 4-byte fixed (32-bit) | Variable (1-3 bytes) |
| Type tracking | Type stack | Per-value tags | PyObject headers |
Design for Embedded
Lambda C makes trade-offs optimized for determinism and safety:
- Predictable over flexible
- Bounded over dynamic
- Simple over expressive
This is why it's suitable for safety-critical embedded systems where Lua/Python are not.
Summary: Technical Evaluation
S-Grade: Architecture
Non-recursive VM loop: Complete separation of script call stack from C stack. Safe for RTOS tasks with minimal stack allocation.
Load-time linking: O(1) FFI dispatch eliminates runtime symbol resolution overhead. Enables real-time control applications.
S-Grade: Memory Efficiency
4-byte instruction format: Achieves extreme ROM efficiency while maintaining cache-friendly fixed-length design.
Arena allocator: Zero fragmentation, O(1) allocation, deterministic memory consumption. Ideal for periodic control loops.
S-Grade: Commercial Suitability
Deterministic memory bounds: Total memory consumption is bounded by the configured stack, global area, and heap sizes, all known before execution starts. Enables safety certification.
AOT compilation: Parse complexity isolated to development environment. Runtime is simple, bounded, and verifiable.
Next Steps
- Architecture Overview - High-level system design
- Best Practices - Production deployment patterns
- Documentation - API reference and usage guide