VM Internals

Why Lambda C Is Safe for Embedded Systems

This page analyzes Lambda C's VM architecture and explains the design decisions that make it well suited to safety-critical embedded systems.

Non-Recursive Dispatch Loop

The Core Safety Guarantee

Critical Understanding: Lambda C's VM does not use C-level function recursion for script function calls. This is the fundamental difference from naive interpreters.

How Script Calls Work

When a script function calls another function (including recursive calls), the VM:

Pushes frame information onto the VM stack - not the C stack
Updates the program counter to jump to the target function
Continues the dispatch loop - never calls another C function

This means:

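A script that recurses 1000 levels deep consumes VM stack memory, not C stack
The C stack depth remains constant regardless of script recursion depth
Script recursion cannot overflow the host MCU's C stack or cause a stack-related HardFault; exhausting the VM stack is caught by the VM's own overflow detection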
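
The call sequence above can be sketched in a few lines of C. This is a minimal illustration, not Lambda C's actual implementation: the opcodes, the Vm struct, FRAME_MAX, and vm_run are all hypothetical names chosen for the example. It shows a single dispatch loop in which a script-level call only pushes a return address onto a VM-owned array and changes the program counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; not Lambda C's real instruction set. */
enum { OP_CALL, OP_RET, OP_JZ, OP_DEC, OP_HALT };

typedef struct { uint8_t op; int32_t arg; } Insn;

#define FRAME_MAX 64   /* assumed VM frame capacity */

typedef struct {
    size_t  frames[FRAME_MAX]; /* return addresses live here, not on the C stack */
    size_t  sp;                /* frames currently in use */
    size_t  max_depth;         /* high-water mark, for inspection */
    int32_t acc;               /* single accumulator register */
} Vm;

/* Returns 0 on HALT, -1 on VM stack overflow/underflow or a bad opcode. */
int vm_run(Vm *vm, const Insn *code, size_t len)
{
    size_t pc = 0;
    for (;;) {                              /* one loop, no C recursion */
        assert(pc < len);
        Insn in = code[pc];
        switch (in.op) {
        case OP_CALL:
            if (vm->sp == FRAME_MAX) return -1;   /* clean overflow report */
            vm->frames[vm->sp++] = pc + 1;        /* push return address */
            if (vm->sp > vm->max_depth) vm->max_depth = vm->sp;
            pc = (size_t)in.arg;                  /* jump to the callee */
            break;
        case OP_RET:
            if (vm->sp == 0) return -1;
            pc = vm->frames[--vm->sp];            /* pop and resume the caller */
            break;
        case OP_JZ:
            pc = (vm->acc == 0) ? (size_t)in.arg : pc + 1;
            break;
        case OP_DEC:
            vm->acc--;
            pc++;
            break;
        case OP_HALT:
            return 0;
        default:
            return -1;
        }
    }
}

/* Script equivalent: void f(void) { if (acc == 0) return; acc--; f(); } */
static const Insn countdown[] = {
    { OP_CALL, 2 }, { OP_HALT, 0 },                           /* entry: f(); halt */
    { OP_JZ, 5 }, { OP_DEC, 0 }, { OP_CALL, 2 }, { OP_RET, 0 } /* body of f */
};
```

With acc set to 10 the program finishes after using 11 VM frames; with acc set to 1000 it returns -1 once the 64 frames are used up. In both cases vm_run occupies a single C stack frame.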

Contrast with Traditional Interpreters

Naive Interpreter (Unsafe):

Script function call → C function call
Deep recursion → C stack overflow
Unpredictable memory consumption

Lambda C (Safe):

Script function call → VM stack manipulation + jump
Deep recursion → VM stack consumption only
Memory consumption bounded by STACKSIZE

Commercial Significance

This architecture enables:

RTOS compatibility - Run VM in tasks with minimal C stack allocation (e.g., 1-2KB)
Predictable worst-case memory - Memory consumption = vm_stack_size + vm_garea_size + vm_heap_size
No hidden stack consumption - Unlike interpreters that recursively call C functions

Load-Time Linking: O(1) FFI Dispatch

The Problem with Runtime Symbol Resolution

Dynamic scripting languages such as Lua and Python typically look up a global function by name each time it is called. Depending on the implementation, each lookup can involve:

String hash computation
Hash table lookup
Cache misses on long symbol names

This creates variable, unpredictable FFI call overhead.

Lambda C's Solution

FFI linking happens once at bytecode load time:

Registration phase: Host registers functions with string names
Link phase: Bytecode loader converts all function name references to integer IDs
Execution phase: FFI calls use integer ID → direct table lookup

Result: O(1) constant-time FFI dispatch (on the order of 100ns; the exact figure depends on the target MCU and clock) regardless of:

Function name length
Number of registered functions
Call frequency
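
The three phases can be illustrated with a small table of function pointers. This is a sketch under stated assumptions, not Lambda C's real FFI API: FfiTable, ffi_register, ffi_link, ffi_call, and the int(int) host signature are hypothetical. String comparison happens only in the link phase; the execution phase is one bounds check and one indexed call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*HostFn)(int);     /* assumed host function signature */

#define MAX_HOST_FNS 8

typedef struct {
    const char *names[MAX_HOST_FNS];
    HostFn      fns[MAX_HOST_FNS];
    size_t      count;
} FfiTable;

/* Registration phase: bind a name to a C function. Returns its ID, or -1 if full. */
int ffi_register(FfiTable *t, const char *name, HostFn fn)
{
    if (t->count == MAX_HOST_FNS) return -1;
    t->names[t->count] = name;
    t->fns[t->count]   = fn;
    return (int)t->count++;
}

/* Link phase: resolve a name to an ID once, at load time.
 * A linear search is acceptable here because it never runs in the hot path. */
int ffi_link(const FfiTable *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0) return (int)i;
    return -1;                  /* unresolved symbol: reject the bytecode */
}

/* Execution phase: integer ID -> direct table lookup, no strings involved. */
int ffi_call(const FfiTable *t, int id, int arg)
{
    assert(id >= 0 && (size_t)id < t->count);
    return t->fns[id](arg);
}

/* Two stand-in host functions for the example. */
static int host_double(int x) { return 2 * x; }
static int host_negate(int x) { return -x; }
```

A loader built this way can refuse to start a script whose symbols do not resolve, so a missing host function shows up at load time instead of in the middle of a control loop.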

Why This Matters

In control loops executing at 100Hz-10kHz:

Predictable timing - No variable overhead from string lookups
Real-time capable - Consistent FFI call latency
Cache-friendly - Integer table lookup vs. string hashing

Type Stack: Why Runtime Type Tracking?

The Compact Bytecode Trade-off

Lambda C's 4-byte instruction format achieves extreme compactness by not encoding type information in instructions. An ADD instruction doesn't specify if it's adding int or double.

The Type Stack Solution

The VM maintains a parallel type stack that tracks the runtime type of each value on the VM stack:

int types enable fast integer-only paths
double types trigger floating-point operations
pointer types enforce sandbox restrictions

Key Insight: This is a space-time trade-off optimized for embedded:

Space saved: 4-byte instructions instead of 8-byte instructions with type tags
Time cost: Type stack lookup (minimal compared to instruction fetch)
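
To make the trade-off concrete, here is a minimal sketch of an untyped ADD that consults a parallel type stack. The names (TypedStack, push_int, push_double, op_add) and the two-type model are assumptions for illustration; Lambda C's actual type stack layout is not specified on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { T_INT, T_DOUBLE } TypeTag;
typedef union { int32_t i; double d; } Value;

#define STACK_MAX 16

typedef struct {
    Value   vals[STACK_MAX];
    uint8_t types[STACK_MAX];   /* parallel type stack: one tag per value slot */
    size_t  sp;
} TypedStack;

void push_int(TypedStack *s, int32_t v)
{
    assert(s->sp < STACK_MAX);
    s->vals[s->sp].i = v;
    s->types[s->sp++] = T_INT;
}

void push_double(TypedStack *s, double v)
{
    assert(s->sp < STACK_MAX);
    s->vals[s->sp].d = v;
    s->types[s->sp++] = T_DOUBLE;
}

/* The ADD opcode carries no type bits; the type stack selects the path. */
void op_add(TypedStack *s)
{
    assert(s->sp >= 2);
    s->sp--;                       /* pop the right operand */
    size_t b = s->sp;
    size_t a = s->sp - 1;          /* the left operand is overwritten with the result */
    if (s->types[a] == T_INT && s->types[b] == T_INT) {
        s->vals[a].i += s->vals[b].i;              /* integer fast path */
    } else {
        double x = (s->types[a] == T_INT) ? (double)s->vals[a].i : s->vals[a].d;
        double y = (s->types[b] == T_INT) ? (double)s->vals[b].i : s->vals[b].d;
        s->vals[a].d = x + y;                      /* promote to double */
        s->types[a] = T_DOUBLE;
    }
}
```

The per-slot tag costs one byte of RAM per stack entry in this sketch, while the instruction stream stays free of type bits.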

Performance Optimization: Fast Integer Mode

When the VM detects integer-only operations, it can:

Skip type tag checks
Use CPU integer ALU directly
Avoid unnecessary floating-point conversions

This is why fib(25) runs fast even though it is interpreted: the hot loop consists entirely of integer operations.

Static Type Mode (-Os)

Compile with the -Os option to resolve all types at compile time and skip runtime type tracking entirely, achieving roughly a 2x speedup on integer-heavy code:

lcvmc -Os -oc output.lcbc source.c

How It Works:

The compiler analyzes all types at compile time
Type stack operations are completely skipped
Integer arithmetic executes at maximum speed

Benchmark (fib(25)):

Mode	Execution Time
Normal mode	27ms
Static type mode	13ms

Limitations:

Optimized for integer operations (normal mode recommended for floating-point)
printf/sprintf are supported; on embedded targets, set LCVM_PRINTF_BUFSIZE to size the formatting buffer (default: 1024 bytes)

Arena Allocator: No `free()` by Design

Why No `free()`?

Traditional malloc/free causes:

Heap fragmentation - Memory holes after repeated alloc/free cycles
Non-deterministic allocation time - Free-list traversal overhead
Unpredictable failure - Allocation may fail even with sufficient total free memory

Arena Allocator Characteristics

Lambda C's allocator:

Allocates sequentially from a contiguous buffer
Never frees individual allocations - only bulk reset
O(1) allocation time - Bump pointer, no free-list search
Zero fragmentation - No memory holes
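
The characteristics above follow directly from how a bump allocator works. The sketch below is a generic arena, not Lambda C's source: Arena, arena_init, arena_alloc, and arena_reset are hypothetical names, and the 8-byte rounding assumes the backing buffer itself is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;   /* start of the contiguous buffer */
    size_t   size;   /* total capacity in bytes */
    size_t   used;   /* bump pointer, as an offset from base */
} Arena;

void arena_init(Arena *a, void *buf, size_t size)
{
    assert(buf != NULL);
    a->base = (uint8_t *)buf;
    a->size = size;
    a->used = 0;
}

/* O(1): round the request up, check the remaining space, advance the offset. */
void *arena_alloc(Arena *a, size_t n)
{
    size_t aligned = (n + 7u) & ~(size_t)7u;        /* round up to 8 bytes */
    if (aligned < n) return NULL;                   /* size overflow */
    if (aligned > a->size - a->used) return NULL;   /* exhausted: fail cleanly */
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Bulk reset: every allocation is released at once, leaving no holes. */
void arena_reset(Arena *a)
{
    a->used = 0;
}
```

Because there is no per-allocation header and no free list, the worst-case allocation time is the same as the best case.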

Embedded Use Case Fit

This design fits periodic control loops well:

Control loop iteration:
1. Reset heap → Start with fresh memory
2. Allocate temporary buffers → O(1) fast
3. Process sensor data → No fragmentation risk
4. Output control signals → Predictable memory
5. Next iteration → Heap reset, repeat

Trade-off: The host must call lcvm_heap_reset() periodically. This is acceptable because:

Embedded control loops naturally have iteration boundaries
Reset cost is negligible compared to loop period (e.g., 10ms)
Eliminates all fragmentation and leak risks

Heap Watermark API

For scenarios where full heap reset is too coarse, Lambda C provides partial release capability:

size_t mark = lcvm_heap_mark();     // Record current position
char *buf = lcvm_malloc(64*1024);   // Temporary allocation
// ... processing ...
lcvm_heap_release(mark);            // Batch release to mark position

Characteristics:

O(1) release time - Simply reset the pointer
No fragmentation - Sequential release
Nestable - Multiple marks can be set and released in LIFO order
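
The LIFO nesting rule can be shown with offsets alone. This sketch models only the bump pointer and is not the implementation behind lcvm_heap_mark() and lcvm_heap_release(); Heap, heap_mark, heap_bump, and heap_release are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Only the offsets are modeled here; the backing buffer is omitted. */
typedef struct { size_t size; size_t used; } Heap;

/* A mark is just the current bump offset. */
size_t heap_mark(const Heap *h)
{
    return h->used;
}

/* Reserve n bytes. Returns 0 on success, -1 when the heap is exhausted. */
int heap_bump(Heap *h, size_t n)
{
    if (n > h->size - h->used) return -1;
    h->used += n;
    return 0;
}

/* O(1) release: everything allocated after the mark is dropped at once.
 * Marks must be released newest-first; an older mark releases newer ones too. */
void heap_release(Heap *h, size_t mark)
{
    assert(mark <= h->used);
    h->used = mark;
}
```

Releasing an outer mark while an inner mark is still outstanding invalidates the inner one, which is why the order has to be last-in, first-out.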

Use Cases:

Temporary buffer allocation during sensor data processing
Per-frame memory in game loops
Transaction-based processing (allocate → process → release)

4-Byte Instructions: Cache Efficiency

Why Fixed-Length Matters

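Variable-length instruction sets (x86 machine code, MicroPython bytecode):

Decode cost varies from one instruction to the next
Instructions can straddle word and cache-line boundaries
Instruction boundaries can only be found by decoding sequentially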

Fixed 4-byte instructions:

Single memory fetch per instruction - Always 4 bytes
Cache-line friendly - 8 instructions per 32-byte cache line (e.g., ARM Cortex-M7; many smaller Cortex-M cores have no cache at all)
Predictable loop timing - No variable-length decode overhead

Compact Yet Complete

4 bytes provide:

8-bit opcode (256 instruction types)
24 bits for operands (3x 8-bit or combined addressing modes)

This is sufficient for:

All arithmetic operations
Memory access patterns
Control flow (jumps, calls, returns)
FFI dispatch
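
As a rough illustration of the 8 + 24 bit split, the helpers below pack and unpack a 32-bit instruction word. The field order (opcode in the top byte) and all function names are assumptions made for this example; the page does not specify Lambda C's actual bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: [31:24] opcode, [23:0] operand bits. */
static inline uint32_t insn_pack(uint8_t opcode, uint32_t operand24)
{
    assert(operand24 <= 0xFFFFFFu);           /* operand must fit in 24 bits */
    return ((uint32_t)opcode << 24) | operand24;
}

static inline uint8_t insn_opcode(uint32_t insn)
{
    return (uint8_t)(insn >> 24);
}

/* View 1: a single combined 24-bit operand (e.g., a jump target). */
static inline uint32_t insn_operand24(uint32_t insn)
{
    return insn & 0xFFFFFFu;
}

/* View 2: three independent 8-bit operands. */
static inline uint8_t insn_a(uint32_t insn) { return (uint8_t)(insn >> 16); }
static inline uint8_t insn_b(uint32_t insn) { return (uint8_t)(insn >> 8); }
static inline uint8_t insn_c(uint32_t insn) { return (uint8_t)insn; }
```

Decoding is a shift and a mask in either view, so the fetch-decode step has the same cost for every instruction.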

Design principle: Instruction density > raw decode speed for embedded systems where ROM/RAM are scarce but CPU cycles are adequate.

Deterministic Memory Consumption

The Safety-Critical Requirement

Safety-critical embedded projects typically have to demonstrate a bound on worst-case memory consumption. Lambda C enables this:

Total Memory = vm_stack_size + vm_garea_size + vm_heap_size

Proof of Bounds

VM Stack: Maximum depth determined by STACKSIZE - stack overflow detection prevents exceeding this
Global Area: Fixed at load time based on bytecode global variable declarations
Heap: Arena allocator bounded by heap_size - allocation fails cleanly when exhausted

No hidden allocations:

No GC heap growth
No implicit conversions allocating temporary buffers
No runtime compilation generating code
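
One way a host project might turn the formula into a build-time check is sketched below. The sizes, the RAM budget, and every identifier are hypothetical; the point is only that with three statically sized regions, the total is a constant the compiler can verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region sizes and budget; pick values for the real target. */
#define VM_STACK_SIZE 2048u
#define VM_GAREA_SIZE  512u
#define VM_HEAP_SIZE  4096u
#define VM_RAM_BUDGET 8192u

#define VM_TOTAL_SIZE (VM_STACK_SIZE + VM_GAREA_SIZE + VM_HEAP_SIZE)

/* All VM memory is statically allocated, so the linker map shows it too. */
static uint8_t vm_stack[VM_STACK_SIZE];
static uint8_t vm_garea[VM_GAREA_SIZE];
static uint8_t vm_heap[VM_HEAP_SIZE];

/* The build fails if the three regions ever exceed the budget (C11). */
static_assert(VM_TOTAL_SIZE <= VM_RAM_BUDGET, "VM regions exceed RAM budget");

/* The same total, computed from the actual buffers. */
size_t vm_total_bytes(void)
{
    return sizeof vm_stack + sizeof vm_garea + sizeof vm_heap;
}
```

With these example values the total is 6656 bytes, and any change that pushes it past the 8192-byte budget is rejected at compile time.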

Certification Advantage

This determinism is valuable for:

Medical devices - Regulatory submissions (e.g., FDA review, IEC 62304) benefit from worst-case memory analysis
Automotive - ISO 26262 ASIL safety levels
Industrial control - IEC 61508 SIL requirements

Comparison: Why Lambda C Differs from Lua/Python

Architectural Philosophy

Aspect	Lambda C	Lua	MicroPython
C stack usage	Constant	Grows with call depth	Grows with call depth
FFI dispatch	O(1) table lookup	O(1) hash table (after cache)	String comparison
Memory allocation	Arena (no free)	GC with mark-sweep	GC with mark-sweep
Instruction format	4-byte fixed	4-byte fixed	Variable (1-3 bytes)
Type tracking	Type stack	Per-value tags	PyObject headers

Design for Embedded

Lambda C makes trade-offs optimized for determinism and safety:

Predictable over flexible
Bounded over dynamic
Simple over expressive

This is why it suits safety-critical embedded systems where general-purpose interpreters such as Lua or MicroPython are harder to justify.

Summary: Technical Evaluation

S-Grade: Architecture

Non-recursive VM loop: Complete separation of script call stack from C stack. Safe for RTOS tasks with minimal stack allocation.

Load-time linking: O(1) FFI dispatch eliminates runtime symbol resolution overhead. Enables real-time control applications.

S-Grade: Memory Efficiency

4-byte instruction format: Achieves extreme ROM efficiency while maintaining cache-friendly fixed-length design.

Arena allocator: Zero fragmentation, O(1) allocation, deterministic memory consumption. Ideal for periodic control loops.

S-Grade: Commercial Suitability

Deterministic memory bounds: Total memory consumption is fixed by the configured stack, global area, and heap sizes, so it is known before deployment. Enables safety certification.

AOT compilation: Parse complexity isolated to development environment. Runtime is simple, bounded, and verifiable.

Next Steps

Architecture Overview - High-level system design
Best Practices - Production deployment patterns
Documentation - API reference and usage guide