Introduction

One language. Four altitudes.

Most languages pick a level and live there. Python and JavaScript sit high — fast to write, far from the metal. Go sits in the middle — concurrency in the language, a runtime underneath. Rust and C++ sit low — you own memory and layout, and you pay attention to both.

Hale is a single language you can write at any of those levels, and move between them without switching tools. The same file can read like a script at the top and like a systems program at the bottom. There is one primitive — the locus — and the only thing that changes as you descend is how much of it you choose to see.

Try it now: the playground runs real Hale, compiled to WebAssembly, right in your browser — no install.

This guide is built around that idea. It introduces Hale at four levels, each one self-contained:

The basics — variables, math, functions, control flow. Hale as a small, clean language. You can write real scripts knowing only this.
Everyday programs — files, JSON, HTTP, a bit of structure. Hale at the altitude you’d reach for Python or Node.
Concurrent services — long-running processes, a typed message bus, supervision. Hale where you’d reach for Go.
Systems control — memory, layout, lifetime, zero-copy I/O, C bindings. Hale where you’d reach for Rust or C++.

Each level expands on the one before it without contradicting it. The function you wrote in the basics still works in systems control — you’ve just learned to see more of what was always there.

A taste

Here’s a small service. Don’t worry about every keyword yet; notice that each phrase you’d say out loud has a place to live.

type Player    { id: String; name: String; }
type MatchInfo { match_id: String; players: [Player]; }

topic JoinQueue  { payload: Player; }
topic MatchReady { payload: MatchInfo; }

locus Matchmaker {
    params { target_size: Int = 4; }
    bus {
        subscribe JoinQueue as on_join;
        publish   MatchReady;
    }

    fn on_join(p: Player) {
        self.waiting.push(p);
        if self.waiting.len() >= self.target_size {
            MatchReady <- assemble_match(self.waiting, self.target_size);
        }
    }
}

“A matchmaker” → locus Matchmaker. “That receives players” → subscribe JoinQueue. “And announces matches” → publish MatchReady. “When enough are queued” → the if. The code keeps the shape of the sentence.

That’s the bet behind Hale: the gap between how you describe a system and what you type doesn’t have to be there. The design chapter explains why one shape works across the whole range — and across human, LLM, and machine.

How to read this

If you’re new to programming or to systems languages, start at The basics and go in order. If you already program, skim the basics for the parts that differ from what you know (the failure model and the money/time types are worth a look), then jump to the level that matches the program you want to write. Every level after the basics opens with a short “Coming from X?” box to orient you.

When you want the exact rules rather than the tour, the reference points into spec/ — the canonical contract the compiler enforces.

Head to Install to set up the toolchain, then Your first run to put a program on screen.

Install

Get the hale toolchain on your path.

There are two ways to get hale: download a prebuilt binary (quickest), or build from source (for contributors, or a platform without a prebuilt). Either way, read What you need to run programs — hale is a compiler that shells out to a C toolchain, so it has a couple of runtime requirements no matter how you install it.

Quickest: prebuilt binary

Grab the tarball for your platform from the releases page:

Platform	Asset
Linux x86_64 (glibc)	`hale-<version>-x86_64-unknown-linux-gnu.tar.gz`
macOS Apple Silicon	`hale-<version>-aarch64-apple-darwin.tar.gz`

tar -xzf hale-<version>-<triple>.tar.gz
# The archive contains `hale` AND `libhale_ts_shim.a` — keep them
# in the SAME directory: the compiler looks for the shim next to
# its own binary and can't link programs without it.
sudo cp hale libhale_ts_shim.a /usr/local/bin/   # or anywhere on PATH, together
hale --help

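The binary is self-contained with respect to LLVM: LLVM 18 is statically linked in, so you do not need to install LLVM to run the compiler. (Intel Macs: the Apple-Silicon build will not run there. Rosetta 2 translates x86_64 binaries to run on Apple Silicon, not arm64 binaries to run on Intel, and the table above lists no Intel macOS asset. Build from source is the route this page gives for a platform without a prebuilt.)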

What you need to run programs

Regardless of how you installed hale, compiling a program (hale run / hale build) recompiles and links the runtime on your machine, so you need a C toolchain present:

clang on your PATH (bare or clang-18) — used to assemble and link the emitted native code. lld is additionally needed only if you build with LOTUS_LTO=1 or target wasm32.
OpenSSL shared libraries (libssl / libcrypto) — the standard library’s TLS client links against them unconditionally.

Installing clang pulls in libLLVM as clang’s own dependency — that’s expected and harmless; hale itself doesn’t need it.

Build from source

Requirements:

Rust 1.95 or newer (the compiler is written in Rust).
LLVM 18 development libraries, with llvm-config-18 on your PATH (or LLVM_SYS_180_PREFIX pointing at the install). LLVM 17, 19, and 20 will not link — the backend is pinned to 18.
clang (+ lld for LTO / wasm), OpenSSL headers, and git.

Debian / Ubuntu (LLVM 18 is in stock apt on 24.04+):

sudo apt install llvm-18-dev libpolly-18-dev libzstd-dev \
                 clang-18 libclang-18-dev lld-18 zlib1g-dev \
                 libssl-dev pkg-config git

Fedora:

sudo dnf install llvm18-devel clang18 lld openssl-devel git

macOS (Homebrew):

brew install llvm@18 openssl git
export LLVM_SYS_180_PREFIX="$(brew --prefix llvm@18)"

Then:

git clone https://github.com/hale-lang/hale
cd hale
cargo build --release

The hale binary lands at target/release/hale (and libhale_ts_shim.a beside it). Put the binary on your path, or invoke it through Cargo as shown below.

Reproducible / release build

release/docker-compose.yml builds a self-contained Linux tarball in a pinned ubuntu:24.04 + LLVM 18 container, so you don’t have to match the toolchain locally:

docker compose -f release/docker-compose.yml run --rm build
# -> dist/hale-x86_64-unknown-linux-gnu.tar.gz

Platform support

Platform	Status
Linux x86_64 (glibc)	First-class — hosts the compiler and runs compiled programs, all features.
macOS (Apple Silicon)	Supported. Hosts the compiler and targets itself. Everything runs except `async_io` pools, which fail at compile time with a clear diagnostic (use a cooperative pool, or build on Linux). Intel Macs cannot run the arm64 build: Rosetta 2 translates x86_64 binaries for Apple Silicon, not the reverse.
Windows	No native support (the runtime is POSIX). Use WSL2 (Ubuntu) and follow the Linux instructions.
wasm32	`hale build --target wasm32` for the browser.

Verify

hale --help

Or through Cargo from a source checkout:

cargo run -p hale-cli --bin hale -- --help

To run the compiler’s own test suite (single-threaded avoids “text file busy” flakes from parallel test binaries racing on the same temp path):

cargo test --release --workspace -- --test-threads=1

The two ways to run a program

Both go through the same LLVM-native compiler — there’s no separate interpreter, so they never disagree:

hale run prog.hl — compiles and runs in one step (the binary is temporary). The fast inner loop while you write.
hale build prog.hl — compiles to a native binary on disk via LLVM. This is the artifact you ship.

hale run   prog.hl   # compile + run
hale build prog.hl   # compile to ./prog
./prog

Throughout this guide we write hale run / hale build as if hale were on your PATH. From a source checkout without it installed, prefix each command with cargo run -p hale-cli --bin hale --.

Next: Your first run.

Your first run

Put something on screen.

Create a file hello.hl:

fn main() {
    println("Hello from Hale.");
}

Run it:

hale run hello.hl

Hello from Hale.

hale run compiles your program and runs it in one step — it’s the same native code hale build produces, just executed immediately and not left on disk. When you want the artifact to keep and ship, build it:

hale build hello.hl
./hello

Same compiler, same output: run is the fast inner-loop shape, build is for the binary you deploy. There’s no separate interpreter, so anything that runs under build runs identically under run.

What’s here

fn main() is the entry point, the same as it is in C, Go, or Rust. A Hale program starts by calling it.
println(...) prints its arguments followed by a newline. It takes any number of arguments and concatenates them — there’s no format string:
```
fn main() {
    let name = "Hale";
    println("Hello from ", name, ".");
}
```
Statements end with ;. Newlines are just whitespace — they don’t end statements. Source is ASCII outside of string literals and comments.

Comments are C-style:

// a line comment
/* a block comment */

That’s the whole surface you need to start. The next chapter introduces variables and the value types — the vocabulary every Hale program is built from.

hale run and imports. A single file’s import "..." as ...; directives are resolved by hale run just as hale build resolves them. The one gap is the ad-hoc directory form (hale run ./dir), which bundles the directory’s files without cross-seed import resolution — use hale build ./dir for a multi-file project that imports libraries.

Build modes, diagnostics, and debugging

A few switches worth knowing from day one:

Faster iteration: hale build --dev (or HALE_DEV=1) uses a lighter optimization pipeline — noticeably quicker builds while you’re in an edit-run loop. Release builds default to -O3 tuned for your CPU.
Where did the build time go? HALE_TIME=1 hale build app.hl prints per-phase wall times.
Editor integration: hale check app.hl --json emits one JSON object per diagnostic (file, line, col, severity, message) on stdout. hale check itself runs in ~10 ms even on large programs, so a save-hook is all an editor needs.
Real debugging: binaries carry DWARF line tables by default — gdb ./app, break app.hl:42, backtraces with real file:line, and ASAN reports that point at the exact source line. Zero runtime cost; opt out with LOTUS_NO_DEBUGINFO=1.

Build a job queue

In about thirty minutes, you’ll build a small job queue and watch it descend the four altitudes — from a throwaway script to a service split across processes — changing almost nothing but main at the very end. The first three stages run in the browser at the playground (no install); to follow along locally, drop each program in a .hl file and hale run it.

We’ll keep the “work” trivial — squaring a number stands in for whatever a real job does — so the shape of the program stays in focus.

1. A job, and the work

Start with the data and the work, as a plain script. A type is pure data; a fn does something with it.

type Job { id: Int; work: Int; }

fn process(j: Job) -> Int {
    return j.work * j.work;
}

fn main() {
    let j: Job = Job { id: 1, work: 7 };
    println("job ", j.id, " -> ", process(j));
}

job 1 -> 49

This is Hale as a small, clean scripting language — no ceremony, no runtime to think about. One job, processed.

2. A queue that holds the jobs

A queue needs to hold jobs. In Hale a collection is a locus with a @form annotation — no Vec<T> to import or parameterize. @form(vec) synthesizes push, get, pop, len, and is_empty on the locus; get/pop are fallible (out of range), so you address them at the call site with or.

type Job { id: Int; work: Int; }

@form(vec)
locus Queue {
    capacity { heap jobs of Job; }
}

fn process(j: Job) -> Int { return j.work * j.work; }

fn main() {
    let q = Queue { };
    q.push(Job { id: 1, work: 7 });
    q.push(Job { id: 2, work: 3 });
    q.push(Job { id: 3, work: 9 });
    println("queued: ", q.len());

    for j in q.items {
        println("job ", j.id, " -> ", process(j));
    }
}

queued: 3
job 1 -> 49
job 2 -> 9
job 3 -> 81

This is the everyday altitude — loci as plain objects that hold state and expose behavior. Still a single program, run start to finish.
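
The fallible accessors mentioned above can be sketched like this. It is a minimal sketch, not taken from the reference: it assumes get takes a zero-based Int index and pop takes no arguments, and it uses a placeholder Job as the fallback value.

```hale
type Job { id: Int; work: Int; }

@form(vec)
locus Queue {
    capacity { heap jobs of Job; }
}

fn main() {
    let q = Queue { };
    q.push(Job { id: 1, work: 7 });

    // Assumption: get(index) is zero-based. Out of range takes the `or` branch.
    let first   = q.get(0) or Job { id: 0, work: 0 };
    let missing = q.get(5) or Job { id: 0, work: 0 };   // no such slot, so the fallback
    println("first: ", first.id, ", missing: ", missing.id);

    // Assumption: pop() takes no arguments. An empty queue takes the `or` branch.
    let popped = q.pop() or Job { id: 0, work: 0 };
    println("popped: ", popped.id, ", left: ", q.len());
}
```

The placeholder Job is only one policy; any of the other or motions from the failure model would address the same call.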

3. Make it a service: the typed bus

A real queue doesn’t drain itself in a loop — work arrives, and workers react. That’s the typed message bus. Declare the channels as topics, and wire loci to them: a Worker subscribes to Jobs, does the work, and publishes a Result; a Reporter subscribes to Results; a Submitter publishes jobs.

type Job    { id: Int; work: Int; }
type Result { id: Int; out: Int; }

topic Jobs    { payload: Job; }
topic Results { payload: Result; }

locus Worker {
    bus {
        subscribe Jobs as on_job;
        publish   Results;
    }
    fn on_job(j: Job) {
        let out: Int = j.work * j.work;
        Results <- Result { id: j.id, out: out };
    }
}

locus Reporter {
    bus { subscribe Results as on_result; }
    fn on_result(r: Result) { println("job ", r.id, " done -> ", r.out); }
}

locus Submitter {
    bus { publish Jobs; }
    birth() {
        Jobs <- Job { id: 1, work: 7 };
        Jobs <- Job { id: 2, work: 3 };
        Jobs <- Job { id: 3, work: 9 };
    }
}

fn main() {
    Worker { };
    Reporter { };
    Submitter { };
}

job 1 done -> 49
job 2 done -> 9
job 3 done -> 81

Run it: this exact program is live in the playground — no install.

Notice what you didn’t write. The Submitter never calls the Worker — it publishes to a topic, and whoever subscribes gets the message. There’s no mutex, no channel type to choose, no async/await colouring a single function. This is the concurrent-services altitude, and the cardinality is emergent: add a second Worker { }; in main and both receive jobs — the topic is many-to-many.
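
As a sketch of that last point, the only change is one more line in main; the types, topics, and loci from the program above are assumed unchanged. How the jobs are divided between the two workers is not spelled out here, so no output is shown.

```hale
// Same types, topics, and loci as the program above; only main differs.
fn main() {
    Worker { };
    Worker { };      // a second subscriber to Jobs
    Reporter { };
    Submitter { };
}
```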

So far the bus has been running in-process (the default transport — an in-memory queue). The loci don’t know or care. That’s the seam we pull on next.

4. Deploy it: change only `main`

The loci above never mention threads or transports. You wire those in main — placement { } says where loci run, and bindings { } says how each topic travels. None of the Worker / Reporter / Submitter code changes; you give them a new main per deployment.

To run the worker as its own process — listening for jobs over a Unix socket, on its own cooperative pool — that’s a main locus:

// worker.hl — the worker as its own binary. Import the shared Job/Result
// types, the Jobs/Results topics, and the Worker/Reporter loci from §3;
// only this `main` is new.
main locus WorkerNode {
    params {
        worker:   Worker   = Worker { };
        reporter: Reporter = Reporter { };
    }
    placement {
        worker: cooperative(pool = jobs);   // its own pool / OS thread
    }
    bindings {
        Jobs: unix("/run/jobs.sock", role: listen);
    }
}

The job source becomes a second binary whose main instantiates the Submitter and binds the same topic with role: connect (Jobs: unix("/run/jobs.sock", role: connect);). Same Jobs topic, same typed payload — now crossing a process boundary instead of an in-memory queue. Swap unix(...) for udp://host:port or a broker adapter and the loci still don’t change; only main does. (Add a codec(...) on the binding to put JSON or protobuf on the wire so a non-Hale peer can read it.)
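
The connecting side described above might look like the following sketch. The file name submitter.hl and the locus name SubmitterNode are hypothetical, and the sketch assumes a main locus can omit the placement block; only the bindings line is taken from the text.

```hale
// submitter.hl (hypothetical file name): the job source as its own binary.
// Import the shared Job type, the Jobs topic, and the Submitter locus from §3.
main locus SubmitterNode {          // SubmitterNode is an illustrative name
    params {
        submitter: Submitter = Submitter { };
    }
    bindings {
        Jobs: unix("/run/jobs.sock", role: connect);   // same socket, connecting side
    }
}
```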

For the full multi-binary picture — sharing the loci across files, picking transports, and supervising the workers — see Across binaries and Concurrency & placement.

What you built

The same Job / Worker / topic definitions carried you from a script to a distributed service. Each altitude added exactly what it needed and nothing more:

Altitude	What appeared
Script	`type`, `fn` — data and the work
Everyday	a `@form(vec)` locus that holds the jobs
Concurrent	`topic`s + the bus; workers react instead of being called
Systems	`main` chooses placement and transports — the loci untouched

That last row is the point: a Hale program is a design of loci and topics; where and how it runs is a binding you change in one place. From here, the concurrent services chapters go deeper on lifecycle, failure, and supervision — or open the playground and run the bus version in your browser.

Values & variables

The vocabulary every Hale program is built from.

A variable is introduced with let:

let greeting = "hello";
let count    = 3;
let ratio    = 0.5;
let ready    = true;

Hale infers the type from the value. You can write it explicitly when you want to be sure, or when there’s no value to infer from:

let count: Int = 3;

Immutable by default

A plain let binding can’t be reassigned. To make a variable you can change, add mut:

let total = 0;
total = total + 1;        // ERROR: total is immutable

let mut total = 0;
total = total + 1;        // fine

Immutable-by-default is a per-binding property, not a property of the type. There’s no separate “constant” concept for locals — let is the constant, let mut is the variable. (Top-level program constants use const NAME: T = ...; and are written SCREAMING_SNAKE_CASE.)

Shadowing — declaring a second let x in the same scope — is not allowed. Pick a new name. The language would rather you say what you mean than quietly reuse a name for a different value.
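
A small sketch of what that rule means in practice. The rejected line is left commented out so the rest runs, and the wording of the compiler’s message is not reproduced here.

```hale
fn main() {
    let total = 10;
    // let total = total + 1;   // rejected: a second `let total` in the same scope

    // Give the new value its own name instead:
    let with_fee = total + 1;
    println(with_fee);          // 11
}
```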

The primitive types

These are the scalar types built into the language:

Type	What it holds	Literal examples
`Int`	64-bit signed integer	`0`, `42`, `1_000_000`, `0xFF`, `0b1010`
`Float`	64-bit IEEE float	`3.14`, `1.0e-3`, `2.5`
`Bool`	true / false	`true`, `false`
`String`	UTF-8 text	`"hello"`, `"line\n"`
`Decimal`	exact fixed-point number	`1.50d`, `0.00d`
`Duration`	a span of time	`100ms`, `5s`, `1h30m`
`Time`	a wall-clock instant	`2026-05-08T12:00:00Z`
`Bytes`	a binary blob	`b"\x00\x01\xff"`

Decimal, Duration, and Time are first-class — not strings you parse, not integers you remember the units of. They get their own chapter (Math, money & time) because they’re a real ergonomic upgrade over what most languages give you.

Underscores in number literals are just for readability (1_000_000). Integers default to Int, decimals-with-a-point default to Float; the d suffix makes a Decimal.

Strings

Double-quoted, with the usual escapes (\n, \t, \", \\, \xNN). Three extra forms:

let raw   = r"C:\not\escaped";        // raw — backslashes literal
let multi = """
    spans
    lines
    """;                               // triple-quoted
let name  = "world";
let hi    = f"hello {name}";           // f-string interpolation

An f-string evaluates the expressions inside {...} and renders them into the text. Use {{ and }} for literal braces.
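
A short sketch of both rules, assuming an Int renders inside an f-string the same way to_string renders it:

```hale
fn main() {
    let n = 41;
    let msg = f"n + 1 = {n + 1}";        // the expression is evaluated, then rendered
    let lit = f"{{n}} stays literal";    // doubled braces produce literal { and }
    println(msg);                        // n + 1 = 42
    println(lit);                        // {n} stays literal
}
```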

Printing values

println and print take any number of arguments and concatenate them. to_string turns a value into text when you need it as a String:

fn main() {
    let n = 41;
    println("n + 1 = ", n + 1);        // n + 1 = 42
    let s = to_string(n + 1);          // "42"
    println(s);
}

println, print, to_string, and len are builtins — called as plain functions, not methods. You write len(s), not s.len(). (Methods with . come later, on loci and your own types.)

Next: Math, money & time.

Math, money & time

Arithmetic, and three types that save you from classic bugs.

Arithmetic

The operators are what you’d expect:

let a = 7 + 3;       // 10
let b = 7 - 3;       // 4
let c = 7 * 3;       // 21
let d = 7 / 3;       // 2   — integer division
let e = 7 % 3;       // 1   — remainder

Comparison and logic:

let bigger = a > b;          // Bool
let between = a > 0 && a < 100;
let either  = ready || forced;
let negated = !ready;

Bitwise operators (& | ^ << >> ~) are available on Int.

Comparisons don’t chain: a < b < c is a parse error — write a < b && b < c. This is deliberate; chained comparison is a common source of silent bugs.
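
As a sketch, the rejected form is left commented out next to the spelled-out version:

```hale
fn main() {
    let a = 1;
    let b = 5;
    let c = 9;
    // let ordered = a < b < c;         // parse error: comparisons don't chain
    let ordered = a < b && b < c;       // true
    println(ordered);
}
```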

Int and Float

Int is 64-bit signed; Float is a 64-bit IEEE double. Hale widens Int to Float automatically where it’s unambiguous — at a let with a Float annotation, when passing an Int to a Float parameter, and when one side of an arithmetic or comparison operator is a Float:

let x: Float = 3;        // 3.0 — widened
let y = 2.0 * 3;         // 6.0 — Int 3 promoted to Float

Going the other way loses information, so it’s explicit:

let n = Int(3.9);        // 3 — truncates toward zero

When you’d rather name the conversion — or need it mid-expression where the implicit widening doesn’t reach — std::math has both directions as functions:

let f = std::math::int_to_float(42);     // 42.0
let m = std::math::float_to_int(3.99);   // 3, truncated toward zero

They’re the same sitofp / fptosi conversions as the casts, just callable anywhere — so numeric code never has to launder a value through to_string + parse_float to change its type.

When you want a Float rounded to an Int rather than truncated — building an integer field out of a Float quantity, say — reach for round; trunc is the toward-zero sibling:

let a = std::math::round(3.7);          // 4   (Int)
let b = std::math::round(2.5);          // 3   — half away from zero
let c = std::math::round(0.0 - 2.5);    // -3
let d = std::math::trunc(3.7);          // 3   — toward zero, like float_to_int

Both return an Int directly. (floor / ceil below return a Float; wrap them in float_to_int if you need an Int.)

The standard library covers the rest: std::math::sqrt, exp, log, pow, floor, ceil, the trig functions, and so on.

Decimal — exact numbers

Float is wrong for money. 0.1 + 0.2 is not 0.3 in any IEEE-float language, and rounding error compounds. Hale gives you Decimal: a fixed-point type with exact arithmetic. Write the literal with a d suffix.

let price = 19.99d;
let qty   = 3;
let total = price * qty;        // 59.97d: exact, no drift

Use Decimal for prices, balances, quantities, anything where a penny of rounding error is a bug. Use Float for measurements, ratios, and math where approximation is fine. The two never mix implicitly — there is no silent Decimal/Float conversion, so you can’t accidentally launder exactness away.
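
A sketch of the no-mixing rule. The tax rate is a made-up value, and the rejected line is commented out so the rest compiles; the point is only that both operands must carry the d suffix.

```hale
fn main() {
    let price = 19.99d;
    // let taxed = price * 1.08;     // rejected: 1.08 is a Float, price is a Decimal
    let taxed = price * 1.08d;       // both operands are Decimal, so the result is too
    println("taxed: ", taxed);
}
```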

Duration — time spans with units

A duration is a length of time, written with a unit suffix:

let timeout  = 5s;
let frame    = 16ms;
let day      = 24h;
let compound = 1h30m;          // durations add up

No more “is this milliseconds or seconds?” — the unit is part of the literal. Durations do arithmetic and comparison:

let total = timeout + frame;
if elapsed > timeout { /* ... */ }

This is also what the runtime’s sleep takes:

std::time::sleep(100ms);

Time — wall-clock instants

A Time is a specific instant, written as an ISO-8601 literal in backticks:

let launch = `2026-05-08T12:00:00Z`;

For measuring elapsed time, reach for the monotonic clock — it never jumps backward when the wall clock is adjusted:

let start = std::time::monotonic();   // a Duration since boot
do_work();
let took = std::time::monotonic() - start;
println("took ", took);

std::time::now() gives wall-clock seconds since the Unix epoch when you genuinely need calendar time; monotonic() is the basis for anything timing-related.

Why these are in the language

Decimal, Duration, and Time aren’t library types you opt into — they’re primitives with their own literals. The reason is that the bugs they prevent (float drift in money, unit confusion in time) are so common and so costly that making them first-class is worth it. You get the safety without importing anything or remembering a convention.

Next: Functions.

Functions

Naming a piece of work so you can call it.

A function is declared with fn, a name, typed parameters, and an optional return type:

fn add(a: Int, b: Int) -> Int {
    return a + b;
}

fn greet(name: String) {
    println("hello, ", name);
}

add returns an Int. greet has no -> T, so it returns nothing (the unit type, written ()). Parameters are always typed; there’s no inference at the boundary, because the signature is the contract.

Call them the obvious way:

fn main() {
    let sum = add(2, 3);          // 5
    greet("world");
}

Returning a value

return expr; hands a value back. A function can also return its last expression without return if you leave off the trailing ; — the block’s final expression is its value:

fn double(n: Int) -> Int {
    n * 2          // no semicolon — this is the return value
}

Both styles are fine. Use whichever reads better; return is clearer for early exits.

Default parameter values

A parameter can carry a default, so the caller can leave off the trailing arguments:

fn pow(base: Int, exp: Int = 2) -> Int {
    let mut acc = 1;
    for _ in 0..exp { acc = acc * base; }
    return acc;
}

fn main() {
    println(pow(3));      // exp defaults to 2 → 9
    println(pow(2, 5));   // override → 32
}

Two rules keep the calling convention unambiguous:

Defaults form a trailing suffix. A required parameter can’t follow a defaulted one — otherwise it wouldn’t be clear which slot an omitted argument fills.
Defaults are evaluated at the call site, in the caller’s scope — not baked in when the function is defined. For a constant literal (the common case) that’s identical; for an expression that names a caller-visible binding, it sees that binding.

Locus methods support defaults too. One caveat: bus-handler methods and mode methods reject them — their argument shape is fixed by the runtime, so there’s no slot to fill at dispatch time.
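
The trailing-suffix rule can be sketched as follows. The connect function and its parameters are hypothetical names chosen for illustration.

```hale
// Defaults sit at the end of the parameter list.
fn connect(host: String, port: Int = 8080, retries: Int = 3) {
    println("connecting to ", host, ":", port, " (retries: ", retries, ")");
}

// fn bad(port: Int = 8080, host: String) { }   // rejected: required after defaulted

fn main() {
    connect("localhost");             // port = 8080, retries = 3
    connect("localhost", 9000);       // retries = 3
    connect("localhost", 9000, 5);    // nothing defaulted
}
```

Because omitted arguments are always the trailing ones, each call maps to exactly one set of slots.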

Functions are values

A function has a type — fn(Int, Int) -> Int — and you can pass one as an argument. This is how you hand behavior to another function:

fn apply_twice(f: fn(Int) -> Int, x: Int) -> Int {
    return f(f(x));
}

fn inc(n: Int) -> Int { return n + 1; }

fn main() {
    println(apply_twice(inc, 10));    // 12
}

One limit worth knowing now: a function value is just a pointer to a named function. Hale has no closures, meaning no inline form like |x| x + captured that captures surrounding variables. If a callback needs context, you pass the context in explicitly, or (at higher levels) you reach for a locus that holds the state. This keeps every function value a plain, inspectable thing.
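
Passing the context in explicitly might look like this sketch; add_offset and apply_with are illustrative names, not library functions.

```hale
// The callback takes its context as an ordinary first parameter.
fn add_offset(offset: Int, x: Int) -> Int { return x + offset; }

// The caller threads the context through alongside the function value.
fn apply_with(f: fn(Int, Int) -> Int, ctx: Int, x: Int) -> Int {
    return f(ctx, x);
}

fn main() {
    println(apply_with(add_offset, 5, 10));    // 15
}
```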

Free functions and where they live

A function declared at the top level of a file is a free function. Every top-level declaration in a directory is visible to every file in that directory — there’s no import between files in the same project, and no pub to mark something exported. You organize by concern, putting related declarations near each other, not by visibility.

// these two can call each other freely, in either file order
fn celsius_to_f(c: Float) -> Float { return c * 9.0 / 5.0 + 32.0; }
fn f_to_celsius(f: Float) -> Float { return (f - 32.0) * 5.0 / 9.0; }

Free functions are the right tool when an operation has no state of its own — a calculation, a conversion, a parser. When a group of them starts to feel like a coherent vocabulary, the Everyday programs level shows how to gather them onto a locus. For now: a free function per piece of work.

Next: Control flow.

Control flow

Choosing, repeating, and matching.

`if` / `else`

if score >= 90 {
    println("A");
} else if score >= 80 {
    println("B");
} else {
    println("C");
}

if is also an expression — it produces a value, so you can assign with it. In expression position it needs an else, and both arms must produce the same type:

let grade = if score >= 90 { "A" } else { "B" };

Because an if is an expression, it can be an arm of another if and the value flows out through both:

let band = if score >= 90 {
    if score >= 97 { "A+" } else { "A" }
} else {
    "B"
};

One small thing the compiler is strict about: an empty if body won’t parse. If you genuinely want a branch that does nothing, put a comment in it or restructure the condition:

if done {
    // nothing to do yet
}

`while` and `loop`

let mut i = 0;
while i < 5 {
    println(i);
    i = i + 1;
}

loop { ... } repeats forever until you break:

let mut n = 0;
loop {
    n = n + 1;
    if n >= 3 { break; }
}

break exits the nearest loop; continue skips to the next iteration.
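
A small sketch that uses both in one loop:

```hale
fn main() {
    let mut i = 0;
    loop {
        i = i + 1;
        if i < 3 { continue; }     // skip 1 and 2
        if i > 5 { break; }        // stop once past 5
        println(i);                // prints 3, 4, 5
    }
}
```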

`for`

for iterates over a range or a collection:

for i in 0..5 {
    println(i);            // 0 1 2 3 4
}

(0..5 is exclusive of the upper bound; 0..=5 includes it.) You’ll use for over real collections once you meet lists and maps in Everyday programs.

`match`

match compares a value against patterns and runs the first that fits:

fn describe(n: Int) -> String {
    return match n {
        0       -> "zero",
        1       -> "one",
        _       -> "many",
    };
}

_ is the wildcard — “anything else.” Matches must be exhaustive: the compiler rejects a match that doesn’t cover every possibility. For a Bool that means both true and false; for open-ended types it means a _ arm. This is a safety feature — you can’t forget a case and have it silently fall through.
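
For a Bool, exhaustiveness might look like the sketch below. It assumes true and false can be written directly as patterns; status is an illustrative name.

```hale
fn status(ready: Bool) -> String {
    return match ready {
        true  -> "go",
        false -> "wait",       // drop this arm and the match no longer covers Bool
    };
}

fn main() {
    println(status(true));     // go
}
```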

match shines on enums (a type that’s one of several named shapes), which you’ll meet in Records & data. The arms can bind the data carried by each variant.

Blocks have values

A { ... } block’s last expression — written without a trailing ; — is the block’s value. That’s why if/match can be used as expressions, and why a function can end in a bare expression instead of return. A block whose last item does end in ; has value ().

let label = {
    let base = compute();
    base + 1               // block evaluates to this
};

That’s the whole control-flow surface. Next we look at working with text: Strings & text.

Strings & text

Building and inspecting text.

Joining

The + operator concatenates strings, and println / f-strings join for you:

let first = "Ada";
let last  = "Lovelace";

let full  = first + " " + last;
let hi    = f"hello, {first}";
println("full name: ", full);

to_string(x) converts a number, bool, duration, etc. into its text form when you need a String specifically:

let n = 42;
let label = "n=" + to_string(n);

Length and inspection

len(s) is a builtin — the byte length of the string:

let s = "hello";
println(len(s));          // 5

Most text operations live in std::str, called as plain functions:

let i   = std::str::index_of("hello world", "world");   // 6
let sub = std::str::substring("hello world", 0, 5);     // "hello"
let up  = std::str::upper("hi");                          // "HI"
let t   = std::str::trim("  spaced  ");                   // "spaced"
let r   = std::str::replace("a-b-c", "-", "+");          // "a+b+c"

Hale has no per-character method syntax (s.charAt(i)); you slice with a range or use the std::str helpers. Slicing a string by byte range:

let s = "hello";
let h = s[0..1];          // "h"

Parsing numbers

Turning text into a number can fail — the text might not be a number. So the parse functions are fallible, and the next chapter (When a call can fail) is exactly about how you handle that. The shape, previewed:

let n = std::str::parse_int("42") or 0;     // 42, or 0 if it wasn't

There are also non-failing predicates to check first (std::str::can_parse_int) when you’d rather branch than recover.

Bytes

Text is String; raw binary is Bytes. They’re different types because they have different rules — a String is valid UTF-8, a Bytes is any sequence of octets, including embedded zeros.

let b = std::bytes::from_string("hello");   // String  -> Bytes
let s = std::str::from_bytes(b);            // Bytes   -> String
let byte0 = std::bytes::at(b, 0) or 0;       // a single byte (fallible)

You’ll work with Bytes directly when you read from a socket or a file and need to frame messages yourself — that’s a topic for wire formats and the systems tier. At this level, just know the two types are distinct and you convert explicitly between them.

Next: the failure model — When a call can fail.

When a call can fail

Hale’s value-level error model — and why you can’t ignore it.

Some calls can’t always succeed. Parsing "banana" as an integer, reading a file that isn’t there, connecting to a host that’s down. In Hale these calls have a type that says so, and the compiler requires you to deal with the failure right at the call site. There are no exceptions, no surprise control flow, and no silently-ignored error codes.

The `fallible` type

A function that can fail declares it with fallible(E), where E is the type of the error payload:

type ParseError { kind: String; input: String; }

fn parse_count(s: String) -> Int fallible(ParseError) {
    if !std::str::can_parse_int(s) {
        fail ParseError { kind: "not_int", input: s };
    }
    return std::str::parse_int(s) or 0;
}

fail <payload>; exits the function through the error path, carrying the payload. The function’s result is now “either an Int, or a ParseError” — and the caller can’t just use it as an Int:

let n = parse_count(input);     // ERROR: error not addressed

You have to address the error. You do that with an or clause.

The five `or` motions

let a = parse_count(s) or raise;              // propagate upward
let b = parse_count(s) or 0;                  // substitute a value
let c = parse_count(s) or handle(err);        // hand off to a helper
let d = parse_count(s) or fail OtherErr { };  // translate the error
some_unit_call()       or discard;            // ignore (unit result only)

or raise — pass the error up to your caller. Your function must itself be fallible(E) with a compatible error type, so the error has somewhere to go.
or <expression> — substitute a fallback value of the success type. Inside the expression, err is bound to the payload, so you can inspect it:
```
let port = std::str::parse_int(arg) or 8080;
```
or handler(err) — call a function that takes the error and returns the success type. Good when several call sites share one recovery policy.
or fail <payload> — fail with a new error of your own type, instead of forwarding the inner one. Use it so a library doesn’t leak a stdlib error type through its own surface.
or discard — throw the error away. Only allowed when the successful result is () (nothing to substitute). The compiler rejects or discard on a value-bearing call and suggests or <fallback> instead.
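
As a sketch of how two of these motions combine, the snippet below translates an inner error at a library boundary and then substitutes at the outermost call. It reuses parse_count from above; ConfigError and read_port are hypothetical names introduced only for this example:

```
// Hypothetical error type for a config-reading layer.
type ConfigError { reason: String; }

fn read_port(s: String) -> Int fallible(ConfigError) {
    // Translate: callers of read_port see ConfigError, never ParseError.
    return parse_count(s) or fail ConfigError { reason: "port must be an integer" };
}

fn main() {
    // Substitute: main is not fallible, so the error stops here.
    let port = read_port("80a") or 8080;
    println(port);
}
```

Each call site carries exactly one or clause, so the path a failure takes is readable straight off the page.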

A real example

Reading a file is fallible — the file might not exist:

fn load_greeting() -> String {
    return std::io::fs::read_file("welcome.txt") or "(no welcome)";
}

If the read fails, we substitute a default. If instead we wanted the failure to stop us, we’d make load_greeting fallible and or raise:

fn load_greeting() -> String fallible(...) {
    return std::io::fs::read_file("welcome.txt") or raise;
}

Chaining

or clauses chain left-to-right — each one disposes of one failure:

let id = parse_count(primary) or parse_count(fallback) or 0;

“Try the primary; if that fails, try the fallback; if that fails, use 0.”

Why it works this way

This is the only failure channel you need at the basics level, and it has a single rule: every fallible call is addressed at the immediate call site. That means when you read a function body, every place that can fail is visibly marked with or. No error propagates invisibly through three stack frames; no try wraps a whole block in ambiguity.

There’s a second failure channel for a different situation — a long-running component whose internal invariant breaks, where the right response is a supervisor’s policy rather than a return value. That’s the structural channel, and it belongs to the services tier (When things fail). For everything you’ll write at this level, fallible + or is the whole story.

Next, we put the pieces together: Your first program.

When the handler can fail too

A recovery handler is often itself a fallible operation — read a fallback file, query a secondary source. Since 2026-07-02 you can write that directly:

fn load(primary: String, backup: String) -> String fallible(IoError) {
    return std::io::fs::read_file(primary)
        or (std::io::fs::read_file(backup) or raise);
}

If the backup read succeeds, its value substitutes. If it also fails, or raise routes the error out through YOUR function’s error path — which is why load must itself be fallible with a compatible error type.

For your own fallible functions the inner or raise is implicit — db_read(k) or self.rebuild(k) propagates the handler’s failure automatically. Stdlib calls and @form methods used as handlers still need the explicit nested spelling above (the compiler will tell you, with the exact rewrite, if you forget).

Your first program

Everything from this level, in one small CLI.

Let’s build a complete little command-line tool using only what the basics covered: variables, math, functions, control flow, and the fallible model. It converts a Celsius temperature passed on the command line into Fahrenheit.

fn c_to_f(c: Float) -> Float {
    return c * 9.0 / 5.0 + 32.0;
}

fn main() {
    // arg(0) is the program name; arg(1) is the first real argument.
    let raw = std::env::arg_or(1, "20");

    let celsius = std::str::parse_float(raw) or {
        eprintln("not a number: ", raw);
        return;
    };

    let f = c_to_f(celsius);
    println(raw, "C = ", to_string(f), "F");
}

Run it:

hale run temp.hl 100

100C = 212F

With no argument it falls back to "20" and prints 20C = 68F — the tool self-demonstrates.

What each piece is doing

std::env::arg_or(1, "20") reads command-line argument 1, or "20" if there isn’t one. (std::env::args_count() and std::env::arg(i) are the lower-level pair.)
std::str::parse_float(raw) or { ... } addresses the fallible parse. Here the or arm prints to standard error and returns early — a fine motion when the success type is a value but you’d rather bail than substitute. (eprintln is println for stderr.)
c_to_f is a plain free function — a calculation with no state, exactly what free functions are for.
println(raw, "C = ", to_string(f), "F") concatenates its arguments. No format string.

This is a real program

You can hale build temp.hl and ship the resulting binary. It reads input, validates it, computes, and reports — and it’s honest about failure, because the parse had to be addressed. At this level Hale is a small, sharp scripting language.

You may have noticed there’s no locus here, no bus, none of the structural machinery from the introduction’s matchmaker. You don’t need it yet. A program that’s a handful of functions and a main is a perfectly good Hale program.

The next level is where structure starts to pay off — when your program grows state that lives over time, talks to the filesystem and the network, and wants to be organized into named parts. That’s where the locus earns its place.

Next: The locus, gently.

The locus, gently

Coming from Python / Node? A locus is the closest thing Hale has to a class or a module. It bundles state (fields) with behavior (methods) and you make instances of it. There’s no separate “module” and “class” — one construct plays both roles. This chapter only uses the object-like 80%; the lifecycle and messaging parts wait until you need them.

In the basics, a program was functions and a main. That’s fine until you have state that lives over time — a counter, a cache, a configuration, a connection — or until a pile of free functions wants a name to live under. That’s what a locus is for.

A locus with state

locus Counter {
    params {
        count: Int = 0;
    }
    fn bump() {
        self.count = self.count + 1;
    }
    fn value() -> Int {
        return self.count;
    }
}

params is the locus’s state — typed fields, each with a default. Inside any method, self.field reads and writes that state. Methods are fns, called with .:

fn main() {
    let c = Counter { };          // make one; count defaults to 0
    c.bump();
    c.bump();
    println(c.value());           // 2
}

You construct a locus with Name { ... }, overriding any field you like:

let c = Counter { count: 10 };

If you’ve used objects before, this is familiar: params are the instance variables, methods are the methods, Counter { } is the constructor. Hale collapses “constructor parameters” and “instance fields” into one params block — the same way Ruby’s @foo or Python’s self.foo are just attributes.

`type` vs `locus`

You met type for plain records earlier. The line between them:

type is pure data — a record you construct, pass around by value, and read. No methods, no state that changes itself, no lifecycle.
locus is data with behavior and identity — it has methods, it mutates its own state, and (at the next level) it can run over time and send messages.

type Point { x: Int; y: Int; }        // just data

locus Tally {                          // data + behavior
    params { total: Int = 0; }
    fn add(n: Int) { self.total = self.total + n; }
}

These aren’t rival categories — they’re points on a gradient. A type is a locus that hasn’t grown behavior yet. When a record starts accumulating methods, you promote it from type to locus. There is no third thing to reach for.

Two everyday shapes

Almost every locus you write at this level is one of two shapes.

The app locus — the outer wrapper for a whole program. Your main reads arguments and hands off to it:

locus App {
    params { name: String = "world"; }
    fn run() {
        println("hello, ", self.name);
    }
}

fn main() {
    let app = App { name: std::env::arg_or(1, "world") };
    app.run();
}

This replaces the bare-main-with-helpers shape from the basics: the app’s top-level state and entry point now have a home. (At the services level, run() becomes a special lifecycle method the runtime drives — but as an ordinary method it already works.)

The namespace locus — a home for a coherent vocabulary of helpers, with little or no state. Hale’s stand-in for a “module of functions” or a static class:

locus Temps {
    fn c_to_f(c: Float) -> Float { return c * 9.0 / 5.0 + 32.0; }
    fn f_to_c(f: Float) -> Float { return (f - 32.0) * 5.0 / 9.0; }
}

fn main() {
    let t = Temps { };
    println(t.c_to_f(100.0));     // 212
}

You instantiate it once and dispatch through it. When three or more related free functions show up, this is usually the tidier home for them.

A rule worth meeting early

Hale has one structural commitment that shapes everything above:

Every named piece of state belongs to exactly one locus.

No globals, no shared mutable buffer that nobody owns, no “floating” value passed around by side channel. If you’re not sure where some state should live, the productive question is “which locus owns this?” — and there’s almost always a clean answer. This is what lets Hale clean up memory and coordinate failure without a garbage collector; you’ll see the payoff at the systems level. For now it’s just good hygiene: put state where it belongs.

Next: the collections you’ll reach for constantly — Lists & maps.

Lists & maps

Coming from Python / Node? Hale has no built-in list / [] that grows, no dict / {}, no Vec<T> or Map<K,V>. Instead you declare a small locus and annotate it with a form — @form(vec) for a growable list, @form(hashmap) for a keyed map. You get the same operations (push, get, len, set, …); they’re just methods on a locus you named.

A growable list — `@form(vec)`

@form(vec)
locus Names {
    capacity { heap items of String; }
}

fn main() {
    let names = Names { };
    names.push("Ada");
    names.push("Grace");
    println(names.len());            // 2
    let first = names.get(0) or "";  // "Ada"
}

Three things are happening:

@form(vec) tells the compiler “this locus is a growable list.” It synthesizes the methods for you: push, get, set, pop, len, is_empty, and sorting.
capacity { heap items of String; } is where the list’s storage lives. Read it as “this list holds Strings.” The element type comes from here.
get and pop are fallible — an index might be out of bounds — so you address them with or, just like any fallible call:
```
let x = names.get(99) or "(missing)";
```

Iterate with for over the items:

for name in names.items {
    println(name);
}

(The indexed while i < names.len() + .get(i) walk also works, and is what you want when you need the index — but prefer .items as the default: it reads better and, on hashmaps especially, it’s dramatically faster. A hashmap walk via key_at(i)/entry_at(i) rescans from slot 0 on every call — O(cap×len) for the whole walk — while for e in m.entries visits each occupied slot once.)
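
For the case where you do need the index, the indexed walk might look like the sketch below. It assumes let mut and while behave as they do in the JSON chapter's array loop, and that a plain assignment updates the counter:

```
@form(vec)
locus Names {
    capacity { heap items of String; }
}

fn main() {
    let names = Names { };
    names.push("Ada");
    names.push("Grace");

    let mut i = 0;
    while i < names.len() {
        // get is fallible even when the index is in range,
        // so it still takes an or clause.
        let name = names.get(i) or "";
        println(to_string(i), ": ", name);
        i = i + 1;
    }
}
```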

The element type can be anything — a primitive, or one of your own type records:

type Player { id: String; score: Int; }

@form(vec)
locus Roster {
    capacity { heap players of Player; }
}

A keyed map — `@form(hashmap)`

A map keys entries by a field on the value itself — the key is one of the record’s fields, named with indexed_by:

type Account { user: String; balance: Int; }

@form(hashmap)
locus Accounts {
    capacity { pool entries of Account indexed_by user; }
}

fn main() {
    let accts = Accounts { };
    accts.set(Account { user: "ada",   balance: 100 });
    accts.set(Account { user: "grace", balance: 250 });

    let a = accts.get("ada") or Account { user: "", balance: 0 };
    println(a.balance);                       // 100
    println(accts.has("grace"));              // true
}

set(value) takes the whole record and reads the key out of its indexed_by field — there’s no separate key argument.
get(key) and remove(key) are fallible (the key might be absent); has(key) returns a plain Bool.
Keys are Int or String.

This “the key is a field of the value” shape matches how keyed stores almost always look in practice — you rarely have a key that isn’t already part of the thing you’re storing.
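
A short sketch of walking a map, assuming the for e in m.entries form mentioned in the list section applies to the entries cell declared here. The visiting order is not something this guide specifies, so don't rely on it:

```
type Account { user: String; balance: Int; }

@form(hashmap)
locus Accounts {
    capacity { pool entries of Account indexed_by user; }
}

fn main() {
    let accts = Accounts { };
    accts.set(Account { user: "ada",   balance: 100 });
    accts.set(Account { user: "grace", balance: 250 });

    // Visit each stored record once; the key is just a field on it.
    for a in accts.entries {
        println(a.user, ": ", to_string(a.balance));
    }

    // has returns a plain Bool, so no or clause is needed.
    if !accts.has("linus") {
        println("no account for linus");
    }
}
```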

A bounded queue — `@form(ring_buffer)`

When you want a fixed-capacity FIFO for recent-events buffers or sliding windows:

@form(ring_buffer, cap = 64)
locus Recent {
    capacity { pool events of String; }
}

push returns a Bool — false when the buffer is full — so you decide whether to drop or apply backpressure. pop is fallible on empty.
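
One possible drop-oldest policy, sketched with only push and pop as described above. Binding the Bool results to names is an assumption about what the compiler wants; the guide does not say whether an unused Bool may be ignored:

```
@form(ring_buffer, cap = 64)
locus Recent {
    capacity { pool events of String; }
}

fn main() {
    let recent = Recent { };

    let ok = recent.push("login");
    if !ok {
        // Buffer is full: make room by dropping the oldest entry,
        // then try once more. pop is fallible on empty, hence the or.
        let dropped = recent.pop() or "";
        let retried = recent.push("login");
    }
}
```

A backpressure policy would skip the pop and report the false result to the producer instead.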

A list inside a type — `bounded[T; N]`

The forms above are loci — whole entities with their own lifecycle. A type is pure data, so it can’t hold one. What it CAN hold (since 2026-07-02) is a bounded collection — a fixed-capacity list laid out inline in the value:

type Message {
    id:   String;
    tags: bounded[String; 32];
}

fn main() {
    let msg = Message { id: "msg1" };   // tags starts empty —
                                        // bounded fields can't be
                                        // spelled in a literal
    push(msg.tags, "urgent") or raise;
    push(msg.tags, "billing") or raise;

    for tag in msg.tags {
        println(tag);
    }
    println(count(msg.tags));           // 2
}

Six operations, all compiler intrinsics (types stay method-free, like len(s)):

push(f, x) — append; fallible with CapacityError { cap, count } when full. What to do at capacity is your policy, written in the or arm.
at(f, i) — read slot i; fallible IndexError out of range.
set(f, i, x) — overwrite a live slot; fallible IndexError.
count(f) — the live count (the capacity lives in the type).
clear(f) — reset to empty.
truncate(f, n) — shrink the count (never grows); with set, this is the drop-front idiom for FIFO windows.
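
The drop-front idiom mentioned for truncate can be sketched as a shift-left followed by a shrink. This assumes set and push yield () on success (so or discard is permitted) and that let mut and while work as in the JSON chapter; treat it as an illustration, not the canonical spelling:

```
type Message {
    id:   String;
    tags: bounded[String; 32];
}

fn main() {
    let msg = Message { id: "msg1" };
    push(msg.tags, "a") or discard;
    push(msg.tags, "b") or discard;
    push(msg.tags, "c") or discard;

    // Drop the oldest element: copy each slot one place toward
    // the front, then cut the now-duplicated last slot.
    let n = count(msg.tags);
    if n > 0 {
        let mut i = 1;
        while i < n {
            let x = at(msg.tags, i) or "";
            set(msg.tags, i - 1, x) or discard;
            i = i + 1;
        }
        truncate(msg.tags, n - 1);
    }

    for tag in msg.tags {
        println(tag);
    }
}
```

The shift costs one at and one set per remaining element, which is fine for the small fixed capacities bounded is meant for.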

Use bounded when the maximum is known and the list is a field of a value — per-message tags, route parameters, a chat window. The old workaround (a tab-separated string you re-parse on every read) is retired: pond’s router, LLM, and conversation libraries all migrated. Whole-struct copies carry the elements automatically, and scalar-element bounded values even cross the zero-copy bus as flat bytes.

Why a form instead of a generic type

A list isn’t just “a type parameterized by its element” — it’s a bundle of decisions: contiguous memory, dynamic length, who owns the storage, what happens to it when the owner goes away. A form makes those decisions at the declaration, and picks an implementation tuned for the element type. The upshot for you at this level is simple: @form(vec) is your list, @form(hashmap) is your map. The reasoning behind forms — and how to choose between them on performance grounds — is in Forms under the hood at the systems level.

One form per locus: a locus is a list or a map, not both. If you need both, that’s two loci — which is usually what the data wanted anyway.

Next: Records & data.

Records & data

Coming from Python / Node? Where you’d reach for a dict or an object literal to pass structured data around, Hale uses a named type — a fixed-shape record with typed fields. It’s closer to a TypeScript interface / a Python @dataclass than to a free-form dict: the shape is declared, and the compiler checks it.

Records — `type`

type Player {
    id:    String;
    name:  String;
    score: Int;
}

Construct with a struct literal, naming each field:

let p = Player { id: "p1", name: "Ada", score: 0 };
println(p.name);                  // field access with .

Records are pure data: you pass them by value, read their fields, and compare them. They carry no behavior and no lifecycle. Fields can have defaults, so callers can omit them:

type Config { host: String = "127.0.0.1"; port: Int = 8080; }

let c = Config { port: 9000 };    // host defaults

Records nest, and they’re what travels on the bus and in and out of functions. When a record starts wanting methods, that’s the signal to promote it to a locus.
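
A small sketch of nesting, with made-up type names. The inner record is built with its own struct literal, and field access chains with the dot:

```
// Address and Customer are hypothetical records for illustration.
type Address  { city: String; zip: String; }
type Customer { name: String; home: Address; }

fn main() {
    let c = Customer {
        name: "Ada",
        home: Address { city: "London", zip: "N1" },
    };
    println(c.home.city);
}
```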

Arrays

A fixed sequence of one type is an array. [T] is a slice (a view of some elements); [T; N] is a fixed-length array:

type Match { players: [Player]; }     // a slice of Players

let xs = [1, 2, 3];                    // an array literal
let zeros = [0; 8];                    // eight zeros

For a sequence that grows, you want a @form(vec) list from the previous chapter, not a bare array.

Tuples

A quick, unnamed grouping of a few values:

let pair = (1, "one");

Reach for a type once the grouping has meaning worth naming; tuples are for the throwaway case.

Enums — one of several shapes

An enum is a value that is exactly one of a set of named variants — a tagged union / sum type:

type Light = enum { Red, Yellow, Green };

fn next(l: Light) -> Light {
    return match l {
        Light::Red    -> Light::Green,
        Light::Green  -> Light::Yellow,
        Light::Yellow -> Light::Red,
    };
}

Construct a variant with EnumName::Variant, and use match to branch on it — exhaustively, so you can’t forget a case.

Variants can carry data:

type Event = enum {
    Tick(Int),
    Trade(Decimal, Int),
    Halt,
};

fn handle(e: Event) {
    match e {
        Event::Tick(0)            -> println("tick zero"),
        Event::Tick(n)            -> println("tick #", n),
        Event::Trade(price, size) -> println("trade ", size, " @ ", price),
        Event::Halt               -> println("halt"),
    }
}

The match arms bind the payload — Tick(n) pulls the integer out as n. You can also match a literal sub-pattern (Tick(0)) ahead of the general one. This is the idiomatic way to model “the message is one of these kinds, each with its own data” — and it pairs naturally with the typed bus at the next level.

Enums fill the role of Option<T> / Result<T, E> from other languages when you want a closed set of outcomes as data. For the “this call failed” case specifically, prefer the fallible channel — it’s the purpose-built tool and the compiler enforces handling.
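
As a sketch of the "closed set of outcomes as data" case, here is an Option-like enum. Lookup and describe are hypothetical names, and constructing a payload variant as Lookup::Found(42) is an assumption extrapolated from the match patterns above:

```
// A value that is either present (with an Int) or absent.
type Lookup = enum {
    Found(Int),
    Missing,
};

fn describe(r: Lookup) -> String {
    // match is exhaustive, so both variants must appear.
    return match r {
        Lookup::Found(n) -> to_string(n),
        Lookup::Missing  -> "(none)",
    };
}

fn main() {
    println(describe(Lookup::Found(42)));
    println(describe(Lookup::Missing));
}
```

Use this shape when absence is an ordinary result you want to store or pass along; when absence is a failure, the fallible channel remains the better fit.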

Next: reading and writing the world — Files.

Files

Coming from Python / Node? No try/except, no .catch(), no checking err != nil. Every filesystem call that can fail returns a fallible value, and the compiler makes you address it with or right where you call it. The failure is visible at the call site, always.

Reading and writing

fn main() {
    // Write a file (creating or truncating it).
    std::io::fs::write_file("greeting.txt", "hello\n") or raise;

    // Read it back. read_file returns the whole contents as a String.
    let body = std::io::fs::read_file("greeting.txt") or "(empty)";
    println(body);
}

For main to use or raise, main would need to be fallible; more often at the top level you substitute or report:

fn main() {
    let body = std::io::fs::read_file("config.toml") or {
        eprintln("no config; using defaults");
        return;
    };
    use_config(body);
}

The surface

All of these live under std::io::fs and all are fallible(IoError) except file_exists:

Call	Does
`read_file(path) -> String`	whole-file read
`read_bytes(path) -> Bytes`	whole-file read, binary
`write_file(path, contents)`	create / truncate
`write_file_append(path, contents)`	append
`file_size(path) -> Int`	size in bytes
`mkdir(path)`	create a directory
`rename(from, to)`	move / rename
`unlink(path)`	delete
`mktemp(prefix) -> String`	make a temp file
`list_dir(path) -> ...`	enumerate entries
`file_exists(path) -> Bool`	test (never fails)
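
A sketch that strings a few of these calls together. It assumes the block form of or (as used in Your first program) also works on a call with no result value, and that err is bound inside the block:

```
fn main() {
    let path = "notes.txt";

    // file_exists never fails, so it needs no or clause.
    if !std::io::fs::file_exists(path) {
        std::io::fs::write_file(path, "first line\n") or {
            eprintln("could not create ", path, ": ", err.kind);
            return;
        };
    }

    // file_size is fallible; substitute 0 if the call fails.
    let size = std::io::fs::file_size(path) or 0;
    println(path, " is ", to_string(size), " bytes");
}
```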

The error tells you what happened

When a call fails, the IoError payload carries a kind (String), the raw errno (Int), and the path (String). kind is a stable tag derived from the OS error — "not_found", "permission_denied", "already_exists", "is_dir", and so on. So you can branch on the kind of failure without parsing error strings:

fn handle_io(e: IoError) -> String {
    if e.kind == "not_found" {
        return "";                     // treat missing as empty
    }
    eprintln("io error on ", e.path, ": ", e.kind);
    return "";
}

fn load(path: String) -> String {
    return std::io::fs::read_file(path) or handle_io(err);
}

This is the or handler(err) motion from the basics, put to work: one recovery function shared across every read.

Idempotent setup

or discard is handy for “make sure this exists; don’t care if it already did” — it’s allowed because the result type is ():

std::io::fs::mkdir("cache") or discard;

Held-open files

read_file / write_file are whole-file, one-shot. When you want a file handle you read from incrementally — line by line, or seeking around — use std::io::file::File, a locus that holds the open descriptor for its lifetime:

let f = std::io::file::open("log.txt", "r") or raise;
let line = f.read_line() or "";
// ... f closes when it goes out of scope

That “closes when it goes out of scope” is the locus lifecycle quietly at work — f owns the descriptor and releases it when its binding’s scope ends. You’ll see that mechanism in full at the services level; here it just means you don’t write a manual close.

Next: structured data on disk and the wire — JSON.

JSON

Coming from Python / Node? There’s no JSON.parse that hands you a dynamic object you index freely. Hale’s std::json is field-oriented: you ask a JSON string for a named field and a type (find_string_field, find_int_field, …), and you build output with a streaming Builder. At v1 it’s tuned for flat objects and arrays — the common shapes for config and wire messages.

Reading

Pull individual fields out of a JSON string by name:

let doc = "{\"name\": \"Ada\", \"age\": 36, \"active\": true}";

let name   = std::json::find_string_field(doc, "name");    // "Ada"
let age    = std::json::find_int_field(doc, "age");        // 36
let active = std::json::find_bool_field(doc, "active");    // true

Missing fields come back as the type’s zero value ("", 0, false) rather than failing — so for “is this really present?” semantics, check with the raw accessor or validate upstream. find_field_raw returns the raw substring for a field, which is how you reach into a nested object:

let inner = std::json::find_field_raw(doc, "address");
let city  = std::json::find_string_field(inner, "city");
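
The doc string earlier in this section has no address field, so here is a self-contained sketch of the same two-step reach with a nested object actually present. The document contents are made up for illustration:

```
let doc = "{\"name\": \"Ada\", \"address\": {\"city\": \"London\", \"zip\": 12345}}";

// Step 1: take the nested object as a raw JSON substring.
let inner = std::json::find_field_raw(doc, "address");

// Step 2: treat that substring as its own flat document.
let city = std::json::find_string_field(inner, "city");
let zip  = std::json::find_int_field(inner, "zip");
println(city, " ", to_string(zip));
```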

Parsing into a type

Pulling fields one by one rescans the document per field. When you have a fixed shape, tag the fields with their JSON keys and the compiler generates a single-pass parser for you:

type Order {
    id: Int      `json:"id"`;
    price: Int   `json:"px"`;     // JSON key differs from the field name
    qty: Float   `json:"sz"`;
    active: Bool `json:"on"`;
    side: String `json:"side"`;
    currency: String = "USD";     // optional: default fills a missing key
}

let o = Order::from_json(body) or raise;
println(o.price);

Type::from_json(s) -> Type fallible(JsonError) walks the object once, dispatches each key to the matching field, and reads the value by the field’s declared type — no per-field rescan, and unmatched keys (and nested objects/arrays under them) are skipped. The json:"<key>" tag sets the JSON key; without it the field name is the key.

A missing field raises JsonError, naming the field — unless the field declares a default (currency: String = "USD"), in which case the default fills it. Because from_json is fallible, you must address it (or raise, or <fallback>, …) like any other fallible call.

A field whose type is another json:-tagged struct is parsed recursively — nest as deep as you like, and a missing field anywhere raises with that field’s name:

type Addr   { city: String `json:"city"`; zip: Int `json:"zip"`; }
type Person { name: String `json:"name"`; home: Addr `json:"home"`; }

let p = Person::from_json(body) or raise;
println(p.home.city);

The same tags drive the reverse direction — Type::to_json(value) serializes back to a JSON string (numbers and bools bare, strings escaped, nested structs recursed), so from_json / to_json round-trip:

let body = Order::to_json(o);          // -> {"id":7,"px":...}
let o2   = Order::from_json(body) or raise;

to_json is not fallible — serialization always succeeds.

The tag is general key:"value" metadata — json: is one consumer; other keys are free for future tools.

Fields must be scalars — Int / Float / Bool / String — or nested json:-tagged structs. Array fields are not supported, by design: Hale sequences are locus-owned (there is no heap-owning value list to put in a struct). To read a JSON array, walk it with the array cursor and push each element into a @form(vec) cell on a locus — from_json handles the flat/nested record shape, arrays stay an explicit, locus-owned step.

Arrays

Walk a JSON array with the iterator pair:

let arr = "[10, 20, 30]";
let mut it = std::json::array_first(arr);
while !it.done {
    let n = std::str::parse_int(it.element) or 0;
    println(n);
    it = std::json::array_next(it);
}

array_first returns an iterator with the first element and a done flag; array_next advances it.
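
Putting the cursor together with the locus-owned list from Lists & maps, reading a JSON array into a @form(vec) might look like this sketch. Scores is a hypothetical locus name; every call is one already shown in this guide:

```
@form(vec)
locus Scores {
    capacity { heap items of Int; }
}

fn main() {
    let scores = Scores { };

    let mut it = std::json::array_first("[10, 20, 30]");
    while !it.done {
        // Each element arrives as text; parse it, then store it.
        let n = std::str::parse_int(it.element) or 0;
        scores.push(n);
        it = std::json::array_next(it);
    }

    println(scores.len());
}
```

A malformed element becomes 0 here; swap the fallback for a different or motion if you would rather skip or stop.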

Writing

The Builder is a streaming assembler — it tracks open scopes and inserts separators for you, so you can’t produce malformed JSON by forgetting a comma:

let b = std::json::Builder { };
b.begin_object();
b.field("name", "Ada");
b.int_field("age", 36);
b.bool_field("active", true);
b.end_object();
let out = b.result();      // {"name":"Ada","age":36,"active":true}

Nest objects and arrays by pairing begin_* / end_*. String values are escaped per the JSON spec automatically; if you need to escape or unescape a string by hand, std::json::escape_string and unescape_string are there.

When the shape is deep

std::json at v1 is built for flat objects and top-level arrays — the great majority of config files and API messages. For deeply-nested documents you walk level by level with find_field_raw, treating each nested object as its own flat document. If you’re parsing a genuinely complex or performance-critical format, the wire-format techniques and the systems-tier performance chapter cover building your own parser over Bytes.

Next: serving and calling over the network — HTTP.

HTTP

Coming from Python / Node? This is your Flask / Express moment — but instead of decorators or a routes table, you write a handler locus: a locus with a handle(req) -> Response method. std::http::Server runs the accept loop and calls your handler per request. Routing is a match on the path inside handle. (A fuller router with path params lives in the pond library catalog.)

A server

locus Api {
    params { hits: Int = 0; }

    fn handle(req: std::http::Request) -> std::http::Response {
        if req.path == "/health" {
            return std::http::Response {
                status: 200, body: "ok\n", content_type: "text/plain"
            };
        }
        self.hits = self.hits + 1;
        return std::http::Response {
            status: 200,
            body: f"hello — hit #{self.hits}\n",
            content_type: "text/plain"
        };
    }
}

fn main() {
    let server = std::http::Server { port: 8080, handler: Api { } };
    // Server runs its accept loop until the process is stopped.
}

hale build it, run it, and curl localhost:8080/health. The handler’s params persist across requests — self.hits counts them — because the Api locus is alive for the whole run.

The pieces

std::http::Request carries method, path, version, body, and headers (looked up case-insensitively). You match / if on method and path to route.
std::http::Response needs at least status and body; content_type defaults to text/plain, and you can add custom headers.
std::http::Server takes a port and a handler, then owns the listen-accept-parse-dispatch loop. max_accepts: N bounds it to N requests (handy for tests); the default runs until stopped.
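
A sketch of routing on both method and path. It assumes req.method compares against a plain String such as "GET", and it leans on the content_type default described above. Router is a hypothetical locus name:

```
locus Router {
    fn handle(req: std::http::Request) -> std::http::Response {
        if req.method == "GET" {
            if req.path == "/health" {
                return std::http::Response { status: 200, body: "ok\n" };
            }
        }
        // Anything unmatched falls through to a 404.
        return std::http::Response { status: 404, body: "not found\n" };
    }
}

fn main() {
    // max_accepts bounds the loop to one request, which suits a test.
    let server = std::http::Server {
        port: 8080,
        handler: Router { },
        max_accepts: 1
    };
}
```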

A first taste of interfaces

How does Server know Api is a valid handler? Server’s handler field has the type std::http::Handler, which is an interface — a named set of required methods:

// (declared in the standard library)
interface Handler {
    fn handle(req: Request) -> Response;
}

Any locus that has a matching handle method satisfies Handler — automatically, with no implements clause to write. This is structural satisfaction: the shape is the contract. You declared Api with the right method, so it’s a Handler. (Go programmers will recognize this; it’s interfaces without the impl ceremony.)

Calling out

The standard library ships the server. For an HTTP client — making outbound requests, with connection pooling and TLS — reach for the http/client library in pond:

import "vendor/pond/http/client" as http;
// let resp = http::get("https://example.com") or raise;

That import line, the bindings that wire a server across processes, and the lifecycle that lets a server shut down cleanly on Ctrl-C are all next-level topics — but the handler you wrote above doesn’t change when you get there. The server code is already complete; the surrounding tier just gives it more ways to be deployed and supervised.

Next: the transports below HTTP — UDP & TLS.

UDP & TLS

HTTP covers the request/response server. Below it sit two more transports: UDP for connectionless datagrams, and TLS for an encrypted client connection. Both are thin wrappers over the platform sockets — each call returns or takes a file descriptor (an Int).

For ordinary TCP request/response, prefer std::http or the std::io::tcp Listener / Stream loci. This chapter is the raw-datagram and TLS-client surface.

UDP datagrams — `std::io::udp`

Bind a socket, then send and receive datagrams. bind and the I/O calls are fallible(IoError):

let fd = std::io::udp::bind("0.0.0.0", 9000) or raise;
std::io::udp::send(fd, "127.0.0.1", 9001, "ping") or raise;

To receive and learn who sent it, use recv_with_source and read the thread-local source cache immediately after:

let msg  = std::io::udp::recv_with_source(fd, 1500) or raise;  // Bytes
let host = std::io::udp::last_source_host();
let port = std::io::udp::last_source_port();
println(host, ":", to_string(port), " sent ",
        to_string(len(msg)), " bytes");
std::io::udp::close(fd);

Datagram boundaries are preserved — one send is one recv. Delivery is best-effort; layer acknowledgement or retry on top if you need it. Multicast is a join_group away (set_multicast_ttl / set_multicast_loop tune it), and set_recv_timeout(fd, 100ms) bounds a quiet recv.
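
The calls above combine into a one-shot echo, sketched here. It assumes send accepts a String payload (as in the first example), that the datagram is valid text so from_bytes is safe, and that send yields () so or discard is allowed:

```
fn main() {
    let fd = std::io::udp::bind("0.0.0.0", 9000) or {
        eprintln("bind failed: ", err.kind);
        return;
    };

    let msg = std::io::udp::recv_with_source(fd, 1500) or {
        eprintln("recv failed: ", err.kind);
        std::io::udp::close(fd);
        return;
    };

    // Read the source cache right away, before any other recv
    // on this thread can overwrite it.
    let host = std::io::udp::last_source_host();
    let port = std::io::udp::last_source_port();

    // Best-effort reply: delivery is not guaranteed either way.
    std::io::udp::send(fd, host, port, std::str::from_bytes(msg)) or discard;
    std::io::udp::close(fd);
}
```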

UDP as a bus transport. The raw socket above is not the typed bus. To carry bus messages over UDP, use the udp://host:port substrate transport instead (see the bus) — same dispatch contract as unix://.

TLS client — `std::io::tls`

connect does the TCP connection and the TLS 1.2+ handshake (SNI + system trust store) in one call, via the platform OpenSSL:

let h = std::io::tls::connect("example.com", 443) or raise;
std::io::tls::send_bytes(h, std::bytes::from_string(
    "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"));
let resp = std::io::tls::recv_bytes(h, 4096);   // Bytes
println(std::str::from_bytes(resp));
std::io::tls::close(h);

This is client-side only — there is no TLS server in the stdlib. set_recv_timeout(h, d) bounds a read; with one set, recv_into returns the -2 “timed out, retryable” sentinel so a long-lived client can run keep-alive work instead of hanging.

Tuning sockets — `std::io::sockopt`

The UDP set_option_int / set_option_bool / get_option_int calls take a level and name from std::io::sockopt’s named constants, so you never hardcode a platform number:

std::io::udp::set_option_bool(
    fd, std::io::sockopt::SOL_SOCKET(),
    std::io::sockopt::SO_REUSEADDR(), true) or raise;

For TCP, std::io::tcp::set_nodelay(fd, true) is the common one (disable Nagle for latency).

Next: hashing & encoding — Hashing, encoding & randomness.

Hashing, encoding & randomness

A grab-bag of the cryptographic and byte-wrangling helpers real programs reach for: digests, message authentication, base64, and random numbers. They live under std::crypto, std::text, and std::rand / std::os.

Hashes & checksums — `std::crypto`

Digests take Bytes and return Bytes:

let data = std::bytes::from_string("hello");
let key  = std::bytes::from_string("secret");

let digest = std::crypto::sha256(data);            // 32 bytes
let tag    = std::crypto::hmac_sha256(key, data);  // 32 bytes
let sum    = std::crypto::crc32(data);             // Int (IEEE 802.3 / zlib)

let digest512 = std::crypto::sha512(data);            // 64 bytes
let tag512    = std::crypto::hmac_sha512(key, data);  // 64 bytes

sha1 (20 bytes) is there too for legacy interop; reach for sha256 by default. The SHA-512 siblings, sha512 and hmac_sha512 (64-byte output), have the same non-fallible shape and exist for venues that sign with HMAC-SHA512 (e.g. Kraken, Gate.io). The hashes and crc32 are hand-rolled, with no OpenSSL dependency.

Raw hash bytes aren’t printable, so encode one to show or transport it:

println(std::text::base64::encode(digest));
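If you are coming from Python, the same digests are available from hashlib, hmac, and zlib, which makes a convenient cross-check for the output sizes quoted above. This is a Python sketch, not Hale; the helper tags_match is a made-up name, and the constant-time comparison it wraps is a general recommendation for checking MACs rather than something this guide prescribes.

```python
import hashlib
import hmac
import zlib

data = b"hello"
key = b"secret"

digest = hashlib.sha256(data).digest()                  # 32 bytes
tag = hmac.new(key, data, hashlib.sha256).digest()      # 32 bytes
checksum = zlib.crc32(data)                             # unsigned 32-bit int
digest512 = hashlib.sha512(data).digest()               # 64 bytes
tag512 = hmac.new(key, data, hashlib.sha512).digest()   # 64 bytes

def tags_match(expected, received):
    """Compare two MACs in constant time to avoid a timing side channel."""
    return hmac.compare_digest(expected, received)

# Raw digest bytes are not printable; hex is one common text form.
print(digest.hex())
```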

Base64 — `std::text::base64`

let enc = std::text::base64::encode(data);     // Bytes -> String
let dec = std::text::base64::decode(enc);      // String -> Bytes
let url = std::text::base64::url_encode(data);  // URL-safe, unpadded

url_encode is RFC 4648 §5 (the -_ alphabet, no = padding) — the form JWTs, OAuth, and webhook signatures use. decode accepts both alphabets.
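The difference between the two alphabets is easiest to see on bytes that hit the last two symbols. The sketch below uses Python's base64 module, not Hale; url_encode and decode_any are hypothetical helper names that mimic the behavior described above (unpadded URL-safe output, and a decoder that accepts either alphabet).

```python
import base64

def url_encode(data: bytes) -> str:
    """URL-safe alphabet (- and _), with the trailing = padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def decode_any(text: str) -> bytes:
    """Accept either alphabet, padded or not."""
    normalized = text.replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)   # restore stripped padding
    return base64.b64decode(normalized)

raw = bytes([0xFB, 0xFF])
standard = base64.b64encode(raw).decode("ascii")   # "+/8="
url_safe = url_encode(raw)                         # "-_8"

print(standard, url_safe)
```

Both forms decode back to the same two bytes, which is why a tolerant decoder is convenient at a system boundary.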

Signing — ECDSA P-256 (`ES256`)

For JWT / venue auth, std::crypto ships ECDSA over NIST P-256 with SHA-256 (the ES256 JWS algorithm), OpenSSL-backed:

// key: PEM EC private key (SEC1 or PKCS#8); message: Bytes
let sig = std::crypto::ecdsa_p256_sign(key, message) or raise;

// pubkey: PEM SPKI; sig: raw r‖s, 64 bytes (the JWS/COSE form)
let ok = std::crypto::ecdsa_p256_verify(pubkey, message, sig);

ecdsa_p256_sign has two faces: a bare call returns an empty Bytes on failure (check len(sig) == 0), and in an or context it is fallible(CryptoError), so or raise / or fail err / or handle(err) propagate a structured CryptoError { kind, detail } like any other error.

Random numbers — `std::rand` and `std::os`

std::rand is a fast, non-cryptographic PRNG — fine for jitter, sampling, shuffling, game logic:

let roll = std::rand::next_int(6) + 1;   // a die roll, [1, 6]

For anything security-sensitive (tokens, nonces, keys), use the CSPRNG instead:

let nonce = std::os::getrandom(16) or raise;   // 16 random Bytes

Next: reading configuration — CLI & config.

CLI & config

Coming from Python / Node? No argparse, no yargs, no dotenv. Reading arguments and environment is a few direct calls under std::env; layering argv over env over defaults is a small std::cli::Resolver. Rich flag parsing (--name=value, subcommands) is library territory, not built into the language.

Arguments and environment

fn main() {
    let n = std::env::args_count();        // includes the program name
    let first = std::env::arg(1);          // positional arg 1
    let port  = std::env::arg_or(2, "8080");  // with a default

    let home  = std::env::var("HOME");     // environment variable
    let debug = std::env::var_exists("DEBUG");
}

arg(0) is the program name; user arguments start at arg(1).
arg_or(i, default) is the everyday form — no bounds-checking dance.
var(name) reads an environment variable; var_exists(name) tests for one.

Layered configuration

A common need: a setting should come from a command-line argument if given, else an environment variable, else a built-in default. std::cli::Resolver expresses that precedence directly:

fn main() {
    let cfg = std::cli::Resolver { prefix: "MYAPP" };

    // argv positional "port", else $MYAPP_PORT, else "8080"
    let port = cfg.get("port", "8080");
    let host = cfg.get("host", "127.0.0.1");

    println("listening on ", host, ":", port);
}

The resolver checks the argument, then the prefixed environment variable (MYAPP_PORT), then the supplied default. Empty values fall through to the next layer rather than counting as “set.”
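The precedence rule is small enough to model in a few lines. The following is a Python sketch of the behavior described above, not the std::cli::Resolver implementation; the function name resolve is hypothetical, and the arguments are passed in as an already-parsed dict because how the resolver matches names against argv is not specified here.

```python
import os

def resolve(name, default, args, env=None, prefix="MYAPP"):
    """Return the first non-empty value: argument, then PREFIX_NAME, then default."""
    env = os.environ if env is None else env
    layers = (
        args.get(name, ""),                          # layer 1: command line
        env.get(f"{prefix}_{name.upper()}", ""),     # layer 2: environment
    )
    for candidate in layers:
        if candidate:          # empty strings fall through to the next layer
            return candidate
    return default             # layer 3: built-in default

fake_env = {"MYAPP_PORT": "9090", "MYAPP_HOST": ""}

port = resolve("port", "8080", args={}, env=fake_env)            # from env
host = resolve("host", "127.0.0.1", args={}, env=fake_env)       # empty env, so default
port_cli = resolve("port", "8080", args={"port": "7000"}, env=fake_env)  # argument wins
```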

Interactive terminal I/O

For a tool that draws to the terminal or reads keystrokes, a few std:: primitives cover the OS surface without an FFI dependency.

std::term::is_tty(fd) answers “is this a terminal?” — the usual guard for whether to emit color:

let color = std::term::is_tty(2);   // fd 2 = stderr

std::term::size() returns a TermSize { cols, rows } record (and {0, 0} when stdout isn’t a tty). std::term::RawMode is a guard locus that puts the terminal in raw mode (no line buffering, no echo) for its lifetime. It restores the terminal on scope exit, and also on a panic or unhandled error via an atexit backstop:

fn main() {
    let raw = std::term::RawMode { };       // birth: enter raw mode
    // ... read keys, draw frames ...
}                                           // dissolve: restore the terminal

For the bytes themselves, std::io::stdin::read_byte(timeout_ms) polls one byte (0..255, -1 on timeout, -2 on EOF), and std::io::stdout::write_bytes(s) does a raw, unbuffered write — it fflushes first so it stays ordered with any println output:

loop {
    let b = std::io::stdin::read_byte(100);   // 100ms poll
    if b == -1 { continue; }                    // timeout: redraw, tick, …
    if b == -2 { break; }                       // EOF
    std::io::stdout::write_bytes("got a key\r\n");
}

These are primitives, not a TUI — key decoding and styling live in a library on top of them.

Where this fits

This is the boundary between the outside world and your program. The idiomatic shape, building on the app locus: main resolves configuration, then constructs the app locus with it.

locus App {
    params { host: String = "127.0.0.1"; port: String = "8080"; }
    fn run() { println("listening on ", self.host, ":", self.port); }
}

fn main() {
    let cfg = std::cli::Resolver { prefix: "MYAPP" };
    let app = App {
        host: cfg.get("host", "127.0.0.1"),
        port: cfg.get("port", "8080"),
    };
    app.run();
}

Configuration enters once, at the edge, and flows inward as typed locus state — never read again from a global deep inside the program. That keeps every setting owned by exactly one locus, the rule from The locus, gently.

Next: seeing what your program is doing — Logging.

Logging

Coming from Python / Node? Instead of a global logger object you configure at import time, Hale logging is built on the message bus: a Logger publishes typed log events, and a sink subscribes to them and decides what to do (print, write a file, ship to a collector). It’s your first look at the bus — the mechanism the whole next tier is built on.

The minimal setup

Two pieces: something that emits log events, and something that consumes them.

fn main() {
    // The sink must exist before anything logs to it.
    let sink = std::log::StdoutSink { };

    let log = std::log::Logger { name: "app" };
    log.info("starting up");
    log.warn("disk almost full");
    log.error("connection refused");
}

StdoutSink subscribes to all log events and prints them; Logger emits them. The ordering matters — instantiate the sink first, because a subscriber has to exist before a publisher sends, or the early events have nowhere to go.

Levels

Loggers carry the usual levels: trace, debug, info, warn, error. Call the matching method:

log.debug(f"cache size = {n}");
log.error(f"request {id} failed: {reason}");

Per-component loggers, one sink

Each Logger has a name, which becomes the event’s topic (log.app, log.db, log.http). You can give every component its own named logger and still have a single sink see everything:

fn main() {
    let sink = std::log::StdoutSink { };

    let app_log = std::log::Logger { name: "app" };
    let db_log  = std::log::Logger { name: "db" };

    app_log.info("ready");
    db_log.warn("slow query");
}

A custom sink subscribes to a subtree — log.db.** to capture only database logs, log.** to capture all of them — without the loggers knowing who’s listening. Publisher and subscriber never reference each other; they only share the topic name.
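To make the two rules concrete (subtree subscriptions, and "the sink must exist first"), here is a toy publish/subscribe model in Python. It is not Hale and not the real bus; Bus and matches are made-up names, and the wildcard semantics are an assumption: a trailing .** is taken to match the named topic itself plus anything nested beneath it.

```python
def matches(pattern, topic):
    """True if topic falls under pattern; a trailing '.**' selects a subtree."""
    if pattern.endswith(".**"):
        root = pattern[:-3]
        return topic == root or topic.startswith(root + ".")
    return pattern == topic

class Bus:
    def __init__(self):
        self.subscribers = []            # (pattern, handler) pairs

    def subscribe(self, pattern, handler):
        self.subscribers.append((pattern, handler))

    def publish(self, topic, event):
        # Only subscribers that already exist see the event.
        for pattern, handler in self.subscribers:
            if matches(pattern, topic):
                handler(topic, event)

bus = Bus()
bus.publish("log.app", "too early")      # no sink yet: nowhere to go

everything, db_only = [], []
bus.subscribe("log.**", lambda t, e: everything.append((t, e)))
bus.subscribe("log.db.**", lambda t, e: db_only.append((t, e)))

bus.publish("log.app", "ready")
bus.publish("log.db", "slow query")
```

The publisher side never names a subscriber, and the "too early" event is simply lost, which is the reason the sink is instantiated first in the examples above.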

You just used the bus

That decoupling — emitters publish, sinks subscribe, neither holds a reference to the other — is the bus, Hale’s typed publish/subscribe channel. Logging is a small, friendly instance of it: Logger publishes a LogEvent on a topic, StdoutSink subscribes. The same mechanism carries any typed message between any two loci in your program.

At this level you’ve used the bus without declaring one. The services tier makes it first-class: you declare your own topics, subscribe and publish them in a locus’s bus { } block, and use them to wire concurrent components together. Everything you just saw — emit, subscribe to a subtree, no direct references — is exactly how it works at scale.

That’s the everyday tier. With loci, collections, files, JSON, HTTP, config, and logging, you can build real applications — CLIs, web services, data tools. The next tier is for programs that run over time and coordinate: long-lived services, a typed bus you design, concurrency, and supervision.

Next: The lifecycle.

The lifecycle

Coming from Go? A long-running locus is like a goroutine with structure: instead of go func(){...}() and a context you thread around for cancellation, a locus has named lifecycle methods the runtime drives — birth → run → drain → dissolve — and shutdown cascades through the tree automatically. You write the phases; the runtime sequences them.

Until now, loci have been object-like: state plus methods you call. A locus can also run over time. When it does, it moves through a fixed sequence of lifecycle states, and the runtime guarantees the ordering.

The four phases

locus Server {
    params { listen_fd: Int = -1; }

    birth()    { /* acquire: open sockets, files, buffers */ }
    run()      { /* steady-state work — the main loop */ }
    drain()    { /* stop taking new work; finish in-flight */ }
    dissolve() { /* release what birth acquired */ }
}

birth() runs once, at construction, after the locus’s state is initialized. Acquire resources here — open a socket, read a file, allocate a buffer. By the time it returns, the locus is live.
run() is the steady-state body — typically a loop that serves requests, drains a queue, or ticks on a timer. It runs until it returns on its own or the locus is asked to shut down.
drain() runs when shutdown begins: stop accepting new work, let in-flight work finish.
dissolve() runs last: release what birth acquired. The locus’s memory is freed wholesale right after.

There are also accept and release, for parent/child relationships (covered in Parents & children), and on_failure, for recovery (covered in When things fail).

You only write the phases you need; the compiler supplies no-op defaults for the rest. A locus with just birth and run is completely normal.

One rule: no return inside birth / run / dissolve bodies. These are driven by the runtime, not called by you, so “return a value” has no meaning. Factor any early-exit logic into a helper free function the body calls.

A simple service

locus Ticker {
    params { count: Int = 0; limit: Int = 5; }

    run() {
        while self.count < self.limit {
            println("tick ", self.count);
            std::time::sleep(500ms);
            self.count = self.count + 1;
        }
    }
}

fn main() {
    Ticker { limit: 3 };     // runs to completion, then tears down
}

When does a locus dissolve?

This is the one piece of bookkeeping worth internalizing, because it’s how Hale frees resources without a defer or a finally:

Statement position (Ticker { }; — no binding): the locus runs its whole lifecycle right there and tears down at the end of the statement. Fire-and-forget.
let-bound (let t = Ticker { };): it’s born and runs, but dissolve is deferred to the end of the enclosing function’s scope. The binding stays usable for method calls until then.
Long-lived (the locus subscribes to the bus, or its run() hasn’t returned): it stays alive until its scope exits, regardless of binding — it has to, to keep receiving messages.

So let keeps a locus alive for the scope; statement position is fire-and-forget. When several let-bound loci share a scope, they dissolve in reverse order of creation (the later one, which may depend on the earlier, goes first).
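Readers coming from Python can think of the reverse-order rule as the same last-in, first-out unwinding that contextlib.ExitStack performs. The sketch below is a Python analogy, not Hale; the Locus class is a toy stand-in whose enter and exit hooks play the roles of birth and dissolve.

```python
from contextlib import ExitStack

events = []

class Locus:
    """Toy stand-in: 'birth' on enter, 'dissolve' on exit."""
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        events.append(f"birth {self.name}")
        return self

    def __exit__(self, *exc):
        events.append(f"dissolve {self.name}")
        return False

def scope():
    with ExitStack() as stack:
        db = stack.enter_context(Locus("db"))
        cache = stack.enter_context(Locus("cache"))   # may depend on db
        events.append("body")
    # leaving the with-block unwinds in reverse order of creation

scope()
print(events)
```

Because cache was created after db, it is torn down first, so it never observes a db that has already been released.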

Replacing a locus held in a field

If a locus holds another locus in a field — say a server that keeps its current connection in self.conn — assigning a fresh one replaces a live thing, so it’s a lifecycle event, not a plain store:

self.conn = Connection { url: next };   // reconnect

Hale tears the old self.conn down first (drain → dissolve, so its socket and any children are released), then builds the new one into this locus’s arena and points the field at it. The old and new never overlap, and the new instance lives until the parent dissolves — no manual close, no leak. This is break-before-make: if you need make-before-break (hold the old connection open while the new one warms up), keep both in separate fields and swap explicitly.

To reconfigure the same instance instead of replacing it, mutate in place — self.conn.url = next; — which keeps the connection and triggers no teardown.

Shutdown cascades

drain() is always depth-first cascading. Calling it on a locus first drains all of its children (and theirs, recursively), waits for them, then drains itself, then dissolves. You never write a manual teardown walk.

This is what makes Ctrl-C trivial: SIGINT calls drain() on the program’s root, the whole tree winds down in dependency order, in-flight work finishes, resources release, the process exits cleanly. “Press Ctrl-C and it shuts down properly” is the default, not something you wire up.
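The cascade is a post-order walk of the locus tree. The following Python sketch models only the drain ordering described above, not the Hale runtime; Node is a hypothetical class, and the tree shape is invented for illustration.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def drain(self, log):
        for child in self.children:    # children (and theirs) drain first
            child.drain(log)
        log.append(self.name)          # only then does this node drain

root = Node("app", [
    Node("server", [Node("conn1"), Node("conn2")]),
    Node("metrics"),
])

order = []
root.drain(order)
print(order)
```

Every node appears after all of its descendants, so nothing is drained while something that depends on it is still running.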

The lifecycle is the skeleton of every long-running Hale program. Next, the thing those programs use to talk to each other: The bus.

The bus

Coming from Go? Topics are like channels, but typed by a declaration instead of by chan T, and many-to-many instead of point-to-point. You don’t pass a channel into a goroutine; a locus declares which topics it subscribes to and publishes, and the runtime wires the delivery. No channel plumbing threaded through constructors.

You met the bus implicitly in logging: emitters publish, sinks subscribe, neither references the other. Here you declare and use it directly.

Topics are typed declarations

A topic names a channel and the type that flows on it:

type Order { id: String; amount: Decimal; }

topic OrderPlaced  { payload: Order; }
topic OrderShipped { payload: Order; }

A topic is a top-level declaration, like type or locus. It’s referenced by name — never a magic string — so the payload type is checked at every publish and every handler, and renaming the topic moves every use with it.

A locus declares its bus interface in a bus { } block:

locus Warehouse {
    bus {
        subscribe OrderPlaced as on_order;     // inbound
        publish   OrderShipped;                 // outbound
    }

    fn on_order(o: Order) {
        // ... pick and pack ...
        OrderShipped <- o;                       // the send
    }
}

subscribe TOPIC as HANDLER; wires inbound messages to a handler method. The handler must exist with the matching signature — fn on_order(o: Order) — and the compiler checks it.
publish TOPIC; authorizes this locus to send on the topic. Without it, a send is a compile error.
TOPIC <- value; is the send. It’s a statement, not an expression — it produces no value, like Erlang’s Pid ! Msg.

Subscribing is declarative — there’s no subscribe() call at runtime. Registration happens when the locus is constructed, and unsubscribe happens automatically at dissolve.

One ordering rule

A subscriber must be born before a publisher sends, or the message has nowhere to land. In practice: instantiate your subscribers first in main. (This is the same rule you saw with the log sink.)

Why this doesn’t break the tower

In the parent/child model, flow is strictly vertical — a locus only talks up to its parent and down to its children. The bus seems to let unrelated loci talk sideways. It doesn’t, really: publishers and subscribers don’t see each other, they see the topic, which lives at the runtime root — structurally above everyone. Every send goes up to the bus; every delivery comes down to a subscriber. It’s vertical flow through a shared root, which is why two loci on opposite branches of a deep tree can coordinate with no shared pointer and no registry lookup.

This is the productive shape for events: many-to-many flow without back-channels. A topic can have any number of publishers and subscribers.

You won’t always pay for it

If a topic is only ever used inside a single locus type — the same locus both publishes and subscribes, with no external binding — the compiler can prove every send routes back to a handler on the same instance, and rewrites the send into a direct method call. The bus is elided entirely. So you can use topics freely for a locus’s own internal event flow without paying dispatch cost; if the topic later grows a second subscriber or a deployment binding, the real bus path comes back automatically, and your code doesn’t change.

As of v0.9.0 the static-dispatch devirtualization is broader than that intra-locus-type case: any quiet, flat-payload, same-thread handler on a closed-world local subject lowers to a direct synchronous call — even when the publisher and subscriber are distinct locus types.

Routing keys: one topic, sharded by a field

By default every subscriber to a topic sees every message. When you have many subscribers that each care about one slice of the traffic — one connection, one symbol, one tenant — fanning every message to all of them and filtering in each handler is wasteful. A routing key moves that filter into the bus: a subscriber declares which key it wants, and the runtime only delivers matching messages.

Name a payload field as the key on the topic, then filter on it at the subscribe site:

type Tick { symbol_id: Int; price: Decimal; }

topic Quote { payload: Tick; keyed_by symbol_id; }

locus Feed {
    params { symbol_id: Int = 0; }

    bus {
        subscribe Quote as on_quote where key == self.symbol_id;
    }

    fn on_quote(t: Tick) {
        // only ticks whose symbol_id matches this Feed arrive
    }
}

A publish carries its key in the payload, so the send is unchanged — Quote <- Tick { symbol_id: 7, price: 100.0d }; reaches only the Feed instances that subscribed with where key == 7.

keyed_by FIELD on the topic picks the routing field. It must be a field of the payload, and its type must be one the bus can hash to a fixed-width key: Int, Bool, Time, Duration, a no-payload enum, or Decimal. (Need a compound key like (symbol, venue)? Pack it into one Decimal field yourself.)
where key == EXPR on a subscribe filters that subscriber. EXPR can be a literal, a const, or self.<field> — the common case, one instance per shard.
The key is captured by value when the locus is constructed. Reassigning self.symbol_id later does not re-route the subscription; to change shards, dissolve the locus and instantiate a fresh one.

When nothing matches

A keyed publish whose key matches no subscriber is governed by the topic’s on_unmatched: policy:

topic Quote { payload: Tick; keyed_by symbol_id; on_unmatched: fallback; }

swallow (the default) — the message is dropped silently. Run with LOTUS_BUS_LOG_UNMATCHED=1 to log drops while debugging.
fail — the publish becomes fallible; every send site must dispose of it: Quote <- t or raise; panics on an unmatched key, Quote <- t or discard; swallows it. Use this when an unrouted message is a bug, not an expected case.
fallback — an unmatched message is delivered to a catch-all subscriber that opts in with where key == _. At least one such subscriber must exist program-wide, or the topic is rejected at compile time.
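Keyed delivery and the three on_unmatched policies can be modeled with a dictionary from key to handlers. This is a Python sketch of the semantics described above, not the bus implementation; KeyedTopic, UnmatchedKey, and the dropped counter are hypothetical names, and payloads are plain dicts for brevity.

```python
class UnmatchedKey(Exception):
    pass

class KeyedTopic:
    def __init__(self, key_field, on_unmatched="swallow"):
        self.key_field = key_field
        self.on_unmatched = on_unmatched
        self.by_key = {}        # key -> handlers
        self.fallbacks = []     # catch-all handlers
        self.dropped = 0

    def subscribe(self, key, handler):
        # The key is captured by value here; changing the caller's variable
        # later does not re-route the subscription.
        self.by_key.setdefault(key, []).append(handler)

    def subscribe_fallback(self, handler):
        self.fallbacks.append(handler)

    def publish(self, payload):
        key = payload[self.key_field]
        handlers = self.by_key.get(key)
        if handlers:
            for h in handlers:
                h(payload)
        elif self.on_unmatched == "fail":
            raise UnmatchedKey(key)
        elif self.on_unmatched == "fallback":
            for h in self.fallbacks:
                h(payload)
        else:
            self.dropped += 1   # swallow: dropped silently

seen, stray = [], []
quotes = KeyedTopic("symbol_id", on_unmatched="fallback")
quotes.subscribe(7, seen.append)
quotes.subscribe_fallback(stray.append)

quotes.publish({"symbol_id": 7, "price": "100.0"})   # matched
quotes.publish({"symbol_id": 9, "price": "55.5"})    # unmatched: to the catch-all
```

The publisher never filters; it only sends. The lookup by key replaces the per-handler filtering that an unkeyed topic would require.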

Next: where loci actually run — Concurrency & placement.

Concurrency & placement

Coming from Go? Concurrency isn’t go f() scattered through the code. Loci run concurrently by default; where each one runs — a shared cooperative pool (like a scheduler’s worker) or its own dedicated OS thread — is declared in one place, the placement { } block on main. It’s a deployment decision, not something baked into the locus. And there’s no async/await: the lifecycle and the bus already give you what coloring functions would.

Two ways a locus can run

Hale’s concurrency is deliberately bimodal — two choices, no third:

Cooperative — the locus shares an OS thread with other cooperative loci on the same pool. It yields between units of work (after a handler, on a bus dispatch, on time::sleep, on an explicit yield). Handler bodies run to completion without interruption, so within one cooperative locus there’s no data race to worry about. This is the default.
Pinned — the locus owns its own OS thread and doesn’t yield to neighbors. For latency-critical or CPU-bound work that shouldn’t share.

Long sleeps don’t freeze the pool

A cooperative pool runs one locus at a time, so a locus that sits in a long time::sleep could, in principle, starve every other locus sharing its pool — a 30-second keep-alive timer on the main pool would block bus handlers for 30 seconds. It doesn’t. std::time::sleep slices any sleep into short intervals (≤100ms) and drains the pool’s pending bus work between slices, so neighbors keep getting dispatched while one locus naps:

run() {
    while true {
        self.send_heartbeat();
        std::time::sleep(30s);   // sliced — co-resident handlers
                                 // still fire every ≤100ms
    }
}

The sleeping locus still wakes after the full duration; it just doesn’t hold the thread hostage in the meantime. You write sleep(30s) and the slicing is invisible — there’s nothing to opt into. (A pinned locus owns its thread, so its sleeps affect no one and aren’t sliced.)
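The slicing idea can be sketched in a few lines of Python. This is a model of the behavior described above, not the std::time::sleep implementation; sliced_sleep and drain_pending are hypothetical names, and the sleep function is injectable so the example runs without actually waiting.

```python
import time

def sliced_sleep(total_ms, drain_pending, slice_ms=100, sleep_fn=time.sleep):
    """Sleep total_ms in slices of at most slice_ms, draining work between slices."""
    slices = []
    remaining = total_ms
    while remaining > 0:
        step = min(slice_ms, remaining)
        sleep_fn(step / 1000)      # sleep_fn takes seconds
        remaining -= step
        slices.append(step)
        drain_pending()            # neighbors' pending handlers run here
    return slices

pending = ["msg-a", "msg-b"]
handled = []

def drain_pending():
    while pending:
        handled.append(pending.pop(0))

# Inject a no-op sleep so the example finishes instantly.
slices = sliced_sleep(250, drain_pending, sleep_fn=lambda s: None)
print(slices, handled)
```

The total slept time is unchanged; only the longest uninterrupted stretch is capped, which is what keeps co-resident handlers responsive.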

Placement lives on `main`

You declare placement once, against the top-level loci, in main:

main locus App {
    params {
        gateway: Gateway       = Gateway { };
        metrics: MetricsServer = MetricsServer { port: 9100 };
        ui:      Renderer      = Renderer { };
    }
    placement {
        gateway: pinned(core = 1);          // own thread, pinned to core 1
        metrics: cooperative(pool = io);    // shares the "io" pool
        ui:      cooperative(pool = render);
        // anything unlisted defaults to cooperative(pool = main)
    }
}

cooperative(pool = X) puts the locus on pool X’s thread. The runtime spawns one OS worker per pool name it sees.
pinned / pinned(core = N) gives the locus its own thread, optionally pinned to a CPU core.
Unmentioned top-level loci default to cooperative(pool = main) — the program’s main thread.

Placement keys on the field name, not the locus type, so two instances of the same locus type can live on different threads — the parallelism case (one gateway per core, say).

Why on main and not on the locus? Because where something runs is a property of the deployment, not the code. The same Gateway locus is pinned in production and cooperative in a test, with no edit to Gateway itself. Library authors say what a locus is; the binary author says where it runs.

Nested loci inherit their pool

Placement entries apply only to top-level main loci. A locus instantiated inside another locus’s body runs on its parent’s pool. To put a component on its own pool, hoist it to a top-level sibling in main and give it a placement entry. (This is the canonical fix for “my long-running child starved its parent” — make it a sibling, not a nested child.)

This inheritance is also how you co-locate work on a pinned thread. There’s no pinned(pool = X) for sharing a pinned thread — pinned owns its thread exclusively. So when a pinned locus needs helpers on its thread (counters, a metrics registry, a signal store — anything it calls directly), you nest them: make them params of the pinned locus, and they inherit its thread. Param defaults make this ergonomic — a default can itself instantiate the helper:

locus Gateway {              // placed pinned in main
    params {
        reg:   Registry = Registry { };
        ticks: metrics::Counter = metrics::counter(self.reg, "ticks");
    }
    // run() calls self.ticks.inc() etc. — all on the pinned thread
}

Hoisting them to siblings instead would put them on a different thread, and the gateway calling them directly would then be a cross-pool method call — which the compiler rejects (see below). Nesting is the supported pattern for “many loci, one pinned thread.”

The bus crosses threads for you

When a cooperative locus on one pool publishes to a subscriber on another pool — or to a pinned locus on its own thread — the runtime handles the hand-off: it copies the payload across the thread boundary and wakes the destination. The sender never blocks. From your code’s point of view, Topic <- value; is the same line whether the subscriber is on the same thread or a different one. The substrate adapts; the source doesn’t.
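A rough Python analogy for the hand-off is a thread-safe queue that receives a copy of each payload. This sketch is not how the Hale runtime is implemented; the unbounded queue (so that the sender never blocks) and the deep copy are assumptions chosen to mirror the two properties stated above.

```python
import copy
import queue
import threading

mailbox = queue.Queue()      # unbounded, so put() does not block the sender
received = []

def subscriber():
    while True:
        payload = mailbox.get()
        if payload is None:  # sentinel: stop the worker
            break
        received.append(payload)

def publish(payload):
    # Copy across the thread boundary so the two sides share no mutable state.
    mailbox.put(copy.deepcopy(payload))

worker = threading.Thread(target=subscriber)
worker.start()

order = {"id": "A1", "items": ["bolt"]}
publish(order)
order["items"].append("nut")   # the sender keeps mutating its own copy
mailbox.put(None)
worker.join()

print(received)
```

The subscriber sees the payload as it was at the moment of the send, which is why a cross-thread publish needs no lock in user code.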

High-concurrency I/O: `where async_io`

A single pinned thread handles one blocking connection at a time. To serve many concurrent connections on one thread without a thread-per-connection explosion, tag a cooperative pool with where async_io:

placement {
    workers: cooperative(pool = ws) where async_io;
}

The pool’s worker runs an event loop (epoll under the hood), and blocking I/O calls inside loci on that pool — recv, accept, send — park and resume instead of holding the thread. Your locus code stays synchronous-shaped: stream.recv(4096) is the same call either way; the substrate picks the parking lowering at the syscall boundary. This is how you get async-style throughput without async-style function coloring.

The compiler checks your placement

The compiler catches the following mistakes for you, because the placement, the bus wiring, and each locus’s shape are all known at compile time. The first two concern placement; the rest concern the bus:

A subscriber that blocks its own delivery is an error. A cooperative locus on a non-main pool receives bus cells fine as long as its pool thread is free to run the dispatch — an event-driven subscriber (handlers plus a sleep loop, or where async_io) works. But if such a subscriber’s run() makes a blocking call, it monopolizes the pool thread, the dispatch never runs, and its handlers never fire. That combination — non-main cooperative subscriber with a blocking run() — is the error; the compiler points you at pinned (own thread + mailbox) or keeping run() non-blocking. (Placement alone is fine; it’s the blocking call that kills delivery.)
A blocking call on a cooperative pool is a warning. Even when the locus isn’t a subscriber, a blocking run() (a blocking recv/accept, a subprocess run) on a pool that isn’t where async_io holds the pool’s thread and stalls everything else scheduled there. The compiler warns and suggests pinned (own thread) or where async_io (parks). For blocking I/O gateways, pinned is the prescribed shape. This warning follows the call graph: a run() that blocks indirectly — through a helper fn or a self.method it calls — is flagged too, naming the offending call. (The dead-receiver error above stays direct-call-only, so it never widens onto an indirect path.)
An orphan bus topic is a warning. In a complete program (one with a main locus), a topic or subject wired to only one end — published with nobody subscribed, or subscribed with nobody publishing — is flagged, as is a declared topic used by neither. It’s suppressed when the other end is plausibly external: a transport binding, a wildcard (log.**) covering the subject, a cross-seed (alias::Topic) reference, or the same locus being both ends. Library code (no main) isn’t checked — its peers live downstream.
A bus cycle is flagged. If a handler for one topic publishes another in a loop (a → b → a), the cell can re-trigger its own publish. A cycle across loci spins the cooperative queue — a warning. A cycle within one locus is worse: intra-locus publishes are direct synchronous calls, so the loop recurses on the thread until the stack overflows — an error. (Only an unconditional self-republish errors; one guarded by an if is a terminating state machine and is left alone.)
An unthrottled publish loop is a warning. A while true loop that publishes with no yield, time::sleep/tick, input-pacing recv, or break/return floods the bus — the producer has no backpressure, so cells pile up without bound. Pace the loop, drive it from an input, or yield to let the subscriber drain. (Bounded loops are never flagged; any flow-control point clears it.)
A subject payload type-mismatch is an error. If two sites publish/subscribe the same literal subject string with different of type payloads, a subscriber would decode the wrong type at runtime — rejected. (Declared topics are already unified by their declaration, so this only affects ad-hoc literal subjects.)

The compiler also enforces the single-threaded-method invariant: a locus’s methods may only be called on the thread that owns its pool, so a direct method call across pools (self.other.foo() where other is placed on a different pool) is a compile error, because it would run other’s method on the wrong thread.

One escape is deliberately not traced: a call made through a handler function pointer rather than a direct method reference — the canonical case being a std::http::Server handler that reads a locus living on another pool. The static call-graph walk can’t see through the pointer, so it’s allowed. That’s load-bearing (it’s how a /metrics endpoint on the io pool reads a registry nested on a pinned gateway), but it’s on you to keep that access safe — typically a read of stable, append-only state, not a mutation that would race the owning thread.

Next: how loci nest and own each other — Parents & children.

Parents & children

Coming from Go? This is structured concurrency — closer to an errgroup or a supervised tree than to bare goroutines. A parent locus accepts child loci; the children live inside the parent’s scope, the parent sees their progress through a typed contract, and when the parent shuts down its children shut down first. No detached goroutine outliving the thing that spawned it.

A parent accepts children

A locus declares it can parent a child type by implementing accept:

locus GameSession {
    params { players: [Player]; tick: Int = 0; }
}

locus Room {
    accept(g: GameSession) {
        // runs before g's region is allocated — the gatekeeper.
        // return normally to admit; route through on_failure to reject.
    }

    fn on_join(p: Player) {
        // instantiating a child inside a parent method attaches it
        GameSession { players: [p] };
    }
}

When GameSession { ... } is evaluated inside Room’s body, the runtime runs Room.accept(g) first, then allocates the child’s region inside the parent’s, then births and runs it. The parent’s self.children holds its accepted children (with self.children.count and self.children.is_empty for quick summaries).

Bubbling: the nearest accepting ancestor collects the child

accept isn’t limited to direct children. If you instantiate a child where the enclosing locus doesn’t accept its type, the child doesn’t become a detached throwaway — it bubbles up to the nearest ancestor that does accept it.

locus World {
    accept(s: Ship) { }          // a top-level registry of ships
}

locus Fleet {
    fn spawn() {
        Ship { hull: 100 };      // Fleet doesn't accept Ship...
    }                            // ...so this Ship bubbles to World
}

World collects every Ship spawned anywhere beneath it — through a Fleet that never mentions ships — with no manual registration. It’s the structural counterpart to the bus: the bus carries ephemeral messages; this carries ephemeral ownership — a live collection the ancestor holds and cleans up.

A few rules keep it predictable:

Nearest wins. If several ancestors accept the type, the innermost one gets the child. A direct parent that accepts it is the nearest of all — so nothing about ordinary parent/child attachment changes; bubbling only fills the gap where a child had no owner.
No owner is fine. A child whose type no ancestor accepts is just a transient local — bubbling is opt-in via accept, and the absence of an owner is never an error.
Still vertical. Bubbling travels up the tower to an ancestor; it never reaches sideways. The child’s region still lives inside its owner’s, so the whole “flow is vertical only” cleanup story holds — the owner is just possibly a grandparent, not always the direct parent.
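
The three rules amount to a walk up the parent chain. Here is a rough Python model of that walk, not Hale runtime code: the Node class, the find_owner function, and the extra Dock locus are all made up for illustration.

```python
class Node:
    """A stand-in for a locus: a name, a parent, and the child types it accepts."""
    def __init__(self, name, parent=None, accepts=()):
        self.name = name
        self.parent = parent
        self.accepts = set(accepts)


def find_owner(spawner, child_type):
    """Walk upward from the spawning locus; the first acceptor wins."""
    node = spawner
    while node is not None:
        if child_type in node.accepts:
            return node          # nearest wins
        node = node.parent       # still vertical: the walk only goes up
    return None                  # no owner is fine: a transient local


world = Node("World", accepts={"Ship"})
fleet = Node("Fleet", parent=world)
dock = Node("Dock", parent=fleet, accepts={"Ship"})   # hypothetical inner acceptor

print(find_owner(fleet, "Ship").name)   # World
print(find_owner(dock, "Ship").name)    # Dock
print(find_owner(fleet, "Crate"))       # None
```

The walk starts at the spawning locus itself, which is why a direct parent that accepts the type always wins.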

When the owner lives on a different thread (a main locus registry collecting entities that workers spawn on their own pools), the child is created on the owner’s thread, so the spawning side can’t hold onto it. In that case a cross-pool spawn is fire-and-forget: write it as a bare statement, not let s = Ship { ... }. The compiler will tell you if you try to keep the value.

The contract: what crosses the boundary

A child decides what its parent may see by declaring a contract:

locus GameSession {
    params { tick: Int = 0; state: SessionState; }
    contract {
        expose tick: Int;          // parent may read this
        expose state: SessionState;
        consume clock: Time;       // parent must provide this
    }
}

locus Room {
    contract { consume clock: Time; }
    accept(g: GameSession) {
        if g.tick > 1000 { /* ... */ }     // reading an exposed field
    }
}

expose is what the child lets the parent read; consume is what the child needs the parent to provide. Anything not in the contract is invisible across the boundary — the compiler rejects reads of un-exposed fields. You don’t write hiding logic; the structural boundary does it.

Flow is vertical only

The rule the whole tower rests on: a locus talks up to its parent and down to its children — never sideways. Two sibling sessions don’t reference each other; if they need to coordinate, they route through their shared parent (the Room is exactly the place that should know how sessions relate), or over the bus. No sibling pointer, no cousin back-channel.

This is what makes cleanup sound: a child’s memory is a sub-region of its parent’s, no pointer ever crosses sideways, so when a locus dissolves its whole subtree frees wholesale — no garbage collector, no per-object bookkeeping.

Flow children vs residents

Here’s the piece that matters for any long-running parent, such as a server that accepts one child per connection. By default an accepted child lives until its parent dissolves. For a daemon whose parent never dissolves, that means per-connection children pile up forever. A child takes one of two shapes, and picking the right one fixes it:

locus Conn {
    params { conn_fd: Int = -1; }
    run() {
        let stream = std::io::tcp::Stream { conn_fd: self.conn_fd, owns_fd: false };
        loop {
            let chunk = stream.recv(4096);
            if len(chunk) == 0 { return; }   // client closed → run() ends
            // ... handle chunk
        }
    }
}

locus Server {
    accept(c: Conn)  { }
    release(c: Conn) { }   // ← declaring release marks Conn a *flow*
}

Declaring release(c: Conn) on the parent marks Conn a flow: its run() is its lifetime. When run() returns (the recv loop ends on close), the runtime reclaims the child right then — drains it, calls the parent’s release for a final look, dissolves it, frees its region — while the server keeps running. The connection’s memory ends with the connection.
A child no parent releases is a resident: its run() returning means “ready,” and it lives until the parent dissolves. That’s the right shape for a fixed cohort of long-lived workers spun up at boot.
A locus can also end itself early with terminate; — the locus analogue of return. It exits the method and lets the runtime tear the locus down.

The same “run() returned” event means “reclaim me” for a flow and “I’m ready” for a resident — disambiguated by whether the parent declared release, never guessed. If you accept a child per connection and memory climbs with connection count, you have a resident that should be a flow.
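
To see why the distinction matters for a daemon, here is a small Python model of the rule (a sketch, not the runtime). The Parent class and its declares_release flag are invented stand-ins for a locus that does or does not declare release, and each child's run() is simply called in place.

```python
class Parent:
    """Models a parent locus; declares_release mirrors declaring release(c)."""
    def __init__(self, declares_release):
        self.declares_release = declares_release
        self.children = []
        self.released = []

    def spawn(self, child_id, run):
        self.children.append(child_id)
        run()                                  # the child's run() returns
        if self.declares_release:
            # flow: run() returning means "reclaim me"
            self.released.append(child_id)     # the parent's final look
            self.children.remove(child_id)     # the region would be freed here
        # resident: run() returning means "ready", so the child stays attached


def serve(parent, connections):
    """Accept one child per connection and report how many are still held."""
    for conn_id in range(connections):
        parent.spawn(conn_id, run=lambda: None)
    return len(parent.children)


print(serve(Parent(declares_release=True), 1000))    # 0
print(serve(Parent(declares_release=False), 1000))   # 1000
```

The second number is the leak: with no release, every finished connection is still held.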

Next: what happens when a child breaks — When things fail.

When things fail

Coming from Go? This is the part that’s more Erlang than Go. Alongside the value-level fallible channel you already know, a long-running locus has a structural failure channel: when an invariant it promised to keep breaks, the failure flows up to its parent, which decides recovery — restart, quarantine, or escalate. Supervisors, let-it-crash, and typed recovery policy, built into the language.

Two channels, on purpose

Hale keeps two failure mechanisms strictly separate:

The value channel — fallible(E) + or, from the basics. “This call didn’t produce a value; the caller decides what to do.” Routes up the call stack, addressed inline.
The structural channel — a locus’s declared invariant breaks, the runtime builds a typed event and routes it up the locus tower to the parent’s on_failure. “A promised property no longer holds; the supervisor decides.”

There’s no panic, no assert, no exceptions. Every legitimate failure is one of these two, and they only meet at the program’s root.

Declaring an invariant: `closure`

A closure is a property a locus promises to keep, checked by the runtime at a declared moment:

locus Account {
    params { debits: Decimal = 0.00d; credits: Decimal = 0.00d; }

    closure balanced {
        self.debits ~~ self.credits within 0.01d;
        epoch tick;
    }
}

~~ is “approximately equal, within tolerance.” The epoch says when to check — tick (each event-loop iteration), birth, dissolve, duration(1m), or inline (only when fired by hand). If the assertion holds, nothing happens; closures are silent on success. If it breaks, the runtime constructs a typed ClosureViolation and routes it to the parent’s on_failure.
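
As a worked example of the check itself, this Python sketch evaluates the balanced assertion for two sample states. It assumes the bound is inclusive, which the text above does not state, and the approx_equal name is made up.

```python
from decimal import Decimal


def approx_equal(a, b, tolerance):
    # Assumption: a difference exactly equal to the tolerance still passes.
    return abs(a - b) <= tolerance


tol = Decimal("0.01")
print(approx_equal(Decimal("100.00"), Decimal("100.005"), tol))  # True
print(approx_equal(Decimal("100.00"), Decimal("100.02"), tol))   # False
```

The first state is silent; the second is the one that would produce a ClosureViolation at the next tick.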

Handling failure: `on_failure`

The parent is the supervisor. It decides policy per child type:

locus Bank {
    accept(a: Account) { }

    on_failure(a: Account, err: Error) {
        match err {
            Error::ClosureViolation(v) -> quarantine(a) for 60s,
            _                          -> bubble(err),
        }
    }
}

The recovery primitives:

absorb — just return; the failure is noted and contained.
restart(child) — dissolve and re-create it fresh.
restart_in_place(child) — reset it, keeping its region.
quarantine(child) for d — pause it, preserving state for inspection, optionally auto-restarting after d.
bubble(err) — pass it up to this locus’s parent.
dissolve(child) — force it down.

If a failure bubbles past the root with no one absorbing it, the process exits non-zero with a structured report. That’s the only way a Hale program “crashes” — and it’s a deliberate, typed event, not a surprise. This is Erlang’s let-it-crash, but the recovery policy is typed and written next to the locus it governs.
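
The routing can be modeled in a few lines of Python. This is a simplified sketch, not the runtime: it collapses every containing action (absorb, restart, quarantine) into "absorb", treats a locus with no on_failure as one that bubbles, and uses exit code 1 as a placeholder for "non-zero". All three are assumptions.

```python
class Locus:
    def __init__(self, name, parent=None, policy=None):
        self.name = name
        self.parent = parent
        # policy(child, err) returns "absorb" or "bubble"; None models no on_failure
        self.policy = policy


def route_failure(child, err):
    """Return the name of the locus that contained err, or None if it passed the root."""
    supervisor = child.parent
    while supervisor is not None:
        decision = supervisor.policy(child, err) if supervisor.policy else "bubble"
        if decision == "absorb":
            return supervisor.name
        child, supervisor = supervisor, supervisor.parent   # one level up
    return None


def exit_code(child, err):
    # Assumption: 1 stands in for "non-zero".
    return 0 if route_failure(child, err) is not None else 1


app = Locus("App", policy=lambda c, e: "absorb" if e == "Timeout" else "bubble")
bank = Locus("Bank", parent=app,
             policy=lambda c, e: "absorb" if e == "ClosureViolation" else "bubble")
account = Locus("Account", parent=bank)

print(route_failure(account, "ClosureViolation"))  # Bank
print(route_failure(account, "Timeout"))           # App
print(exit_code(account, "Corrupt"))               # 1
```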

Crossing from value to structural

Sometimes a method catches a value-level error and decides it’s fatal — the right move is to stop this locus and let the supervisor take over. You bridge with an inline closure and the violate statement:

locus DbConnection {
    params { last_error: String = ""; }

    closure fatal_io { captures: last_error; epoch inline; }

    // an error-check fn: takes the error, returns the success type,
    // and either substitutes a value or escalates.
    fn handle_io(e: IoError) -> Row {
        self.last_error = e.kind;
        if e.kind == "broken_pipe" {
            violate fatal_io;        // diverges — escalate structurally
        }
        return Row { data: "" };     // transient — substitute and continue
    }

    fn on_query(q: Query) {
        let r = send_query(self.conn_fd, q) or self.handle_io(err);
        if !self.draining { QueryResult <- r; }
    }
}

closure fatal_io { ... epoch inline; } is a named structural failure with no assertion — it only fires when you say so. The captures: clause snapshots locus state into the violation payload.
violate fatal_io; fires it. It’s divergent (the Never type, like fail and bubble), so the branches that violate need no return. The locus enters drain at the next yield; the parent’s on_failure gets the typed violation with the captured state.
self.draining is a Bool every locus can read — true once it’s decided to wind down. Use it to stop publishing after the decision.

That’s the canonical “catch an error and shut this locus down” shape: one closure, one error-check method, one violate. You don’t reach for a hand-rolled should_exit flag and a polling loop — these primitives are the supported form.

Next: splitting a program across processes — Across binaries.

Across binaries

Coming from Go? Splitting a program into services usually means rewriting in-process calls as RPC or queue clients. In Hale the publisher and subscriber code doesn’t change — a topic that was an in-process queue becomes a Unix socket or a broker by adding one line to main’s bindings { } block. The deployment seam is the only place that knows.

A topic is in-process by default

When a topic isn’t mentioned in any bindings { } block, it’s delivered by an in-process cooperative queue. Two loci in the same binary just talk. Nothing to configure.

Binding a topic to a transport

To carry a topic between binaries, name it in the main locus’s bindings { } block with a transport:

main locus App {
    bindings {
        MatchReady: unix("/tmp/matches.sock");
    }
    run() {
        Matchmaker { target_size: 4 };
    }
}

bindings { } is legal only on a main locus. The publisher’s MatchReady <- info; and the subscriber’s subscribe MatchReady as ... are unchanged — they don’t know or care that delivery now crosses a socket. The same locus source runs in a test (in-memory), a single binary (in-memory), and a multi-binary deployment (unix), chosen entirely at this seam.
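
A toy Python model makes the seam concrete. It is only a sketch: the Bus class and the callable used as a transport are invented, and a real transport would serialize and send rather than append to a list. The point is that matchmaker() is byte-for-byte the same in both deployments.

```python
class Bus:
    def __init__(self, bindings=None):
        self.bindings = bindings or {}     # topic name -> transport callable
        self.subscribers = {}              # topic name -> list of handlers

    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        transport = self.bindings.get(topic)
        if transport is None:
            for handler in self.subscribers.get(topic, []):   # in-process default
                handler(payload)
        else:
            transport(topic, payload)                         # bound: hand off


def matchmaker(bus):
    # publisher code: identical under every deployment
    bus.publish("MatchReady", {"match_id": "m1"})


received, sent = [], []

local = Bus()                                   # no bindings: in-process
local.subscribe("MatchReady", received.append)
matchmaker(local)

remote = Bus(bindings={"MatchReady": lambda topic, payload: sent.append((topic, payload))})
matchmaker(remote)

print(received)  # [{'match_id': 'm1'}]
print(sent)      # [('MatchReady', {'match_id': 'm1'})]
```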

The transports that ship

In-process — the default; absence of a binding.
unix("/path") — an AF_UNIX framed-byte transport, owned by the runtime. The role (listen vs connect) is inferred from whether the binary publishes or subscribes the topic; specify role: listen | connect when one binary does both.
udp://host:port — datagram transport, including IPv4 multicast. Lossy by nature — right for tick streams and telemetry where stale-is-worthless.
A user adapter — any locus you write that satisfies the __StdBusAdapter interface (a single send(subject, bytes) method). This is how NATS, MQTT, a raw-TCP framing, or a custom JSON-over-WebSocket transport plug in — as ordinary loci in your code, not language features:
```
bindings {
    BrokerEvt: MyNatsAdapter { url: "nats://prod:4222" };
}
```

The substrate stays neutral on protocol semantics — reliability, ordering, retries, backpressure all live in the adapter body, where they belong.

Talking to other languages: codecs

By default the bus uses Hale’s internal wire format, which is fine Hale-to-Hale but opaque to a consumer in another language. When you need JSON over a socket or protobuf to a Python peer, a binding names a codec — a locus that owns encode/decode:

bindings {
    Tick: unix("/tmp/ticks.sock") codec(TickJsonCodec { });
}

The codec is structurally typed against the topic’s payload (encode takes the payload type, decode returns it) and must be pure — no hidden state — because it runs on transport threads. Different bindings on the same topic can carry different codecs; the publisher’s send site doesn’t know which.
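
For a sense of what "pure" means here, this is the same shape in Python: a codec whose encode and decode depend only on their argument. It is an illustration rather than Hale code, and the symbol and seq payload fields are hypothetical.

```python
import json


class TickJsonCodec:
    """Stateless: encode and decode depend only on their argument."""

    @staticmethod
    def encode(tick):
        # sort_keys makes the byte output deterministic
        return json.dumps(tick, sort_keys=True).encode("utf-8")

    @staticmethod
    def decode(data):
        return json.loads(data.decode("utf-8"))


tick = {"symbol": "ABC", "seq": 7}          # hypothetical payload fields
wire = TickJsonCodec.encode(tick)
print(wire)                                  # b'{"seq": 7, "symbol": "ABC"}'
print(TickJsonCodec.decode(wire) == tick)    # True
```

Because nothing is stored on the codec, two transport threads can call it at once without coordinating.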

The shape this gives you

A single source tree, decomposed into loci that coordinate over topics. How those topics are delivered — same process, same machine over a socket, across the network via a broker — is a deployment decision living in bindings { }, separate from the logic. You design the system once and deploy it many ways. The systems tier adds one more transport for the highest-frequency same-machine routes: shared-memory zero-copy.

That’s the services tier: lifecycle, a typed bus, concurrency and placement, supervised parent/child trees, structural failure, and multi-binary deployment. You can build daemons, servers, and distributed systems with this. The final tier goes under the runtime — memory, layout, raw performance, and the C boundary — for when you need that control.

Next: Memory & lifetime.

Composition patterns

The shape catalog names the six building blocks: app locus, namespace locus, service locus, spawned child, shape type, free fn. This chapter is the next layer up: five compositions of those blocks that recur in real Hale services, distilled from production use. Reach for one of these when a problem feels like it needs a new language feature; usually it doesn’t, it needs one of these shapes.

1. The three-locus gateway

The canonical answer to “I have N dynamic, keyed children with their own lifecycles” (and to the rejection of putting loci in a hashmap):

pinned reader  ──▶  cooperative manager  ──▶  keyed per-entity child
(owns the fd,        (accept()s a child       (subscribe ... where
 publishes events)    per new key)             key == self.id)

A pinned locus owns the blocking input (socket, ring) on its own thread and publishes decoded events onto the bus.
A cooperative manager subscribes to “new entity” events and accept()s one child per key. Declare release(c: Child) so each child is reclaimed when its flow ends (otherwise it’s a resident and lives until the manager dissolves — unbounded on a daemon).
Each child subscribes with a key filter (subscribe Update as on_update where key == self.id) so the bus routes only its own entity’s messages to it.

This gives you per-entity state and lifecycle without a map of loci — the bus is the routing table, keyed.

2. Demand-driven discovery

A special case of the gateway with zero hardcoded topology: the manager doesn’t know its children up front. A subscription triggers the accept():

// manager
bus { subscribe "entity.first_seen" as on_seen of type Seen; }
fn on_seen(s: Seen) {
    // First message for this key → spawn its child now.
    // Bare instantiation inside a parent method attaches the child:
    // it triggers the enclosing accept(c) gatekeeper. `accept` is a
    // lifecycle hook the runtime invokes, never a method you call.
    Child { id: s.id };
}

The topology grows from the data. Combined with release, children appear on first contact and vanish when their flow ends — the process shape mirrors the live workload with no configuration. (If the manager doesn’t itself accept this child type, the child bubbles to the nearest accepting ancestor — v0.9.2.)
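
Patterns 1 and 2 together can be modeled in Python as below. Treat it as a sketch under assumptions: the Bus class, the "update" topic name, and the reader's seen set are invented, and release is left out to keep it short. The manager holds its accepted children in a plain list; routing by key happens in the bus filter, not in a map.

```python
class Bus:
    def __init__(self):
        self.subs = []                      # (topic, where, handler)

    def subscribe(self, topic, handler, where=lambda msg: True):
        self.subs.append((topic, where, handler))

    def publish(self, topic, msg):
        for t, where, handler in list(self.subs):   # copy: handlers may subscribe
            if t == topic and where(msg):
                handler(msg)


class Child:
    def __init__(self, bus, key):
        self.id = key
        self.deltas = []
        # keyed filter: the bus delivers only this entity's updates
        bus.subscribe("update", lambda m: self.deltas.append(m["delta"]),
                      where=lambda m: m["key"] == self.id)


class Manager:
    def __init__(self, bus):
        self.bus = bus
        self.children = []                  # accepted children, not a keyed map
        bus.subscribe("entity.first_seen", self.on_seen)

    def on_seen(self, msg):
        self.children.append(Child(self.bus, msg["id"]))


def reader(bus, seen, key, delta):
    """Stands in for the pinned reader: announce new keys, then publish."""
    if key not in seen:
        seen.add(key)
        bus.publish("entity.first_seen", {"id": key})
    bus.publish("update", {"key": key, "delta": delta})


bus, seen = Bus(), set()
manager = Manager(bus)
for key, delta in [("a", 1), ("b", 5), ("a", 2)]:
    reader(bus, seen, key, delta)

print([(c.id, c.deltas) for c in manager.children])   # [('a', [1, 2]), ('b', [5])]
```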

3. Hot-path counters & gauges (and the CQRS rejection)

You will want to write let n = self.metrics.incr("hits") on a hot path. Hale rejects locus methods that return locus values (GH #18.6 / the “CQRS” shape) — a method call that hands back a live locus reference breaks the closed-world ownership the substrate relies on. The rejection without a replacement strands you, so here is the migration:

Pre-allocated handles at boot. Declare the counter/gauge loci as params of the owner, instantiated once at birth. The hot path mutates a field in place (self.hits = self.hits + 1) — no method returning a locus, no per-call allocation.
Bus-routed single-writer store. For shared metrics, publish a MetricUpdate { name, delta } to a single collector locus that owns the store and applies updates in its handler. One writer, no contention, and the closed-world rewrite keeps the publish synchronous. This is the shape pond/metrics’ MetricsCollector uses.

Either way the hot path does an in-place field write or a publish — never a method that returns a locus.

4. The publish-policy gate

When you produce data faster than you want to publish it (telemetry, book snapshots), gate the publish behind a tick() with a time-or-volume trigger rather than publishing per-update:

fn on_update(u: Update) {
    self.pending = self.pending + 1;
    self.acc = self.acc + u.delta;          // accumulate in place
    if self.pending >= 100 { self.flush(); } // volume trigger
}
fn tick() {                                  // time trigger (scheduled)
    if self.pending > 0 { self.flush(); }
}
fn flush() {
    "snapshot" <- Snapshot { total: self.acc };
    self.pending = 0;
}

The accumulation is in-place; only the flush crosses the bus. This keeps the high-frequency path allocation-free and bounds publish volume independently of input volume.
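
A quick worked run shows the bound. This Python model mirrors the three methods above, with a list standing in for the bus (an assumption of the sketch): 250 updates with a threshold of 100 produce two volume flushes, and the timer picks up the remaining 50.

```python
class Gate:
    def __init__(self, threshold):
        self.threshold = threshold
        self.pending = 0
        self.acc = 0
        self.published = []       # stands in for the bus

    def on_update(self, delta):
        self.pending += 1
        self.acc += delta         # accumulate in place
        if self.pending >= self.threshold:
            self.flush()          # volume trigger

    def tick(self):
        if self.pending > 0:
            self.flush()          # time trigger

    def flush(self):
        self.published.append(self.acc)
        self.pending = 0


gate = Gate(threshold=100)
for _ in range(250):
    gate.on_update(1)
gate.tick()          # timer fires with 50 updates still pending
gate.tick()          # nothing pending: no publish

print(gate.published)  # [100, 200, 250]
```

Three publishes for 250 inputs; as in the original, the running total is never reset, only the pending count.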

5. View lifetime — copy out to persist

The zero-copy span/JSON APIs (StringView, BytesView, std::json::*_span) hand you a view into a buffer you don’t own. That view is valid only until the next operation that overwrites the buffer — the next recv, the next ring read. Holding it across that boundary reads freed/overwritten memory:

let name = std::json::find_string_field(msg, "name");  // view into recv buf
self.read_msg();                                       // ← overwrites the buffer
println(name);                                         // ✗ dangling view

The rule: a view is valid until the next recv/overwrite; copy out to persist. Materialize it before the boundary:

let name = std::str::clone(std::json::find_string_field(msg, "name"));
self.read_msg();
println(name);   // ✓ owns its own copy

Forgetting this is now panic-guarded (a stale-view access exits with a diagnostic rather than reading garbage), so you’ll see a clear “view used after its buffer was overwritten” message instead of a silent corruption — but the fix is always to clone out before the overwriting call.
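
The same hazard exists anywhere a view aliases a reusable buffer. Python's memoryview makes a convenient analogy (it is only an analogy: Python has no guard, so the stale view quietly reads the new bytes):

```python
buf = bytearray(b"alice")          # stands in for the recv buffer
view = memoryview(buf)[0:5]        # zero-copy view into it
owned = bytes(view)                # copy out to persist

buf[0:5] = b"zzzzz"                # the next "recv" overwrites the buffer

print(bytes(view))   # b'zzzzz'  the view follows the buffer
print(owned)         # b'alice'  the copy is unaffected
```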

Memory & lifetime

Coming from Rust / C++? No garbage collector, and no borrow checker either. Memory is region-based: every locus owns an arena, allocations inside it are bump-pointer cheap, and the whole region frees in one shot when the locus dissolves. The locus tree is the ownership graph — so lifetimes are structural, not annotated. You never write free, and you never fight a borrow checker, because no pointer ever crosses sideways.

You’ve used loci for pages without thinking about memory, because the model is automatic. Here’s what’s underneath.

A locus owns a region

Every locus has an arena — a region of memory. Everything the locus allocates (strings it builds, records it constructs, collection storage) comes from that arena. When the locus dissolves, the entire region is freed at once. There is no per-object deallocation, ever.

Regions nest exactly like loci do. A child’s region is a sub-region of its parent’s:

  root
  └── App's region
      └── Server's region
          ├── Conn A's region
          └── Conn B's region

When a locus dissolves, its whole subtree of regions frees wholesale. This is why shutdown cascades cleanly and why flow children reclaim per connection: freeing is structural, not traced.
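
The cascade is easy to model. In this Python sketch the Region class and its freed log are invented for illustration; the real runtime frees memory rather than appending names. It dissolves Server and leaves App untouched.

```python
class Region:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.live = True
        if parent is not None:
            parent.children.append(self)

    def dissolve(self, freed):
        for child in list(self.children):   # copy: children detach as they go
            child.dissolve(freed)
        self.live = False
        freed.append(self.name)
        if self.parent is not None:
            self.parent.children.remove(self)


app = Region("App")
server = Region("Server", parent=app)
conn_a = Region("Conn A", parent=server)
conn_b = Region("Conn B", parent=server)

freed = []
server.dissolve(freed)
print(freed)       # ['Conn A', 'Conn B', 'Server']
print(app.live)    # True
```

Nothing in the model looks at individual allocations: the tree shape alone decides what goes.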

Why no GC and no borrow checker

Both exist to answer one question — when is it safe to free this? Hale answers it structurally instead:

No pointer crosses sideways. Vertical-only flow means a value in one locus’s region is never referenced by a sibling. So when a region frees, nothing dangles into it.
Messages are copies, not pointers. A payload crossing a locus boundary is copied into the receiver’s arena. Sender and receiver have independent lifetimes; the sender can dissolve while the receiver still holds its copy.

With those two invariants, wholesale-free-at-dissolve is sound with no tracing and no aliasing analysis. The discipline the borrow checker enforces with annotations, Hale enforces with structure — you got it for free by building a locus tree.

Bounded storage: capacity slots

The arena is for transient, locus-lifetime allocation. When a locus needs bounded, disciplined storage — a recycling pool, a growable buffer — it declares capacity slots:

locus Router {
    capacity {
        heap routes  of Route;     // growable, individually freed
        pool sessions of Session;  // fixed-shape, recyclable cells
    }
}

heap X of T — growable storage, cells allocated and freed individually during the locus’s life, the whole slot reclaimed at dissolve.
pool Y of T — a bounded population of fixed-shape, recyclable cells (acquire / release).

The forms you’ve been using — @form(vec), @form(hashmap) — are built on exactly these slots; the form annotation just synthesizes the method surface over them. And for a list that belongs inside a value rather than on a locus, there’s bounded[T; N] (see Collections) — fixed-capacity, laid out inline, whole-struct copies carry it, and the memory-bound analysis treats it as bounded by construction. Slots hold values, never locus references: locus membership goes through accept, not storage.

Projection classes: committing to resolution

When a parent has many children, you can commit up front to the resolution at which it observes them — which lets the compiler pick the allocator that makes that resolution cheap:

locus WorkerPool : projection chunked {
    accept(w: Worker) { }
}

rich — a handful of named children (≈4–10), each fully observed. Per-child arenas, low churn.
chunked — moderate counts (≈10–30), observed in ranges. Per-child sub-regions with free-list reuse — the default when a locus accepts children.
recognition — large populations (≈100–500), observed in aggregate (a count, a histogram). Pre-allocated fixed pools.

The projection class changes the allocator strategy, not your code: the same parent and child methods read from a rich pool or a recognition pool unchanged. It’s a commitment about observation resolution; the compiler turns that into a layout.

Sizing is hints, lifetime is law

Declared sizes are hints — an arena that out-allocates its budget just adds another chunk; it doesn’t panic. The load-bearing property is lifetime: wholesale free at dissolve. That’s the contract every other guarantee leans on.

Next: keeping a long-running program’s memory flat — Performance.

Performance

Coming from Rust / C++? You’re used to controlling allocation and watching it. Hale’s arena model makes most code allocation-bounded by construction — a per-method scratch region absorbs intermediate allocations and frees them at method exit — but a few patterns can still grow a long-running process. This chapter is the shape of that growth and how to keep it flat.

The default is already bounded

Inside any locus method, a scratch sub-region opens on entry and is destroyed on return. Transient allocations — string concatenations, JSON parsing, format building — land in scratch and are reclaimed when the method returns. Values you persist (self.field = ...) are deep-copied into the locus’s own arena first, so they outlive the scratch. The net effect: a hot run() loop that allocates transiently doesn’t grow the locus’s lifetime arena. You get this without doing anything.

So the question isn’t “how do I free?” — it’s “which patterns defeat the automatic bounding?”

The pattern that bites: accumulating in a loop

fn render(rows: Int) -> String {
    let mut out = "";
    let mut i = 0;
    while i < rows {
        out = out + render_row(i);     // a fresh String each iteration
        i = i + 1;
    }
    return out;
}

Each out + ... allocates a new string; scratch demand peaks at the total size of every intermediate. For large inputs that crosses a chunk boundary. The fix is an accumulator that grows one buffer in place:

fn render(rows: Int) -> String {
    let b = std::bytes::BytesBuilder { };
    let mut i = 0;
    while i < rows {
        b.append(std::bytes::from_string(render_row(i)));
        i = i + 1;
    }
    return std::str::from_bytes(b.finish());
}

BytesBuilder is the canonical accumulator — one extensible buffer instead of N throwaway strings. Use it (or std::json::Builder for JSON output) anywhere you build a result incrementally.
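
The difference is quadratic versus linear, which a little arithmetic makes plain. The Python sketch below counts only the accumulated output; it ignores each row's own string and any growth slack in the builder, and the 80-byte row is an arbitrary example.

```python
def concat_scratch_bytes(rows, row_len):
    """Bytes allocated when every `out + row` builds a fresh string."""
    total, out_len = 0, 0
    for _ in range(rows):
        out_len += row_len
        total += out_len          # a new intermediate of the current length
    return total


def builder_bytes(rows, row_len):
    """Bytes held by a single buffer that is appended to in place."""
    return rows * row_len


print(concat_scratch_bytes(1000, 80))   # 40040000
print(builder_bytes(1000, 80))          # 80000
```

A thousand 80-byte rows cost about 40 MB of intermediates one way and 80 KB the other.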

Resolve string keys to ints at boot

If a hot path looks something up by string key in another locus, the string gets copied on every call. Resolve the key to an Int index once at startup and pass the index on the hot path:

locus Service {
    params { metrics: MetricsRegistry = MetricsRegistry { }; ticks_idx: Int = 0; }
    birth() {
        self.ticks_idx = self.metrics.register("ticks_total");  // clone once
    }
    fn dispatch(m: Msg) {
        self.metrics.inc(self.ticks_idx);                        // zero per-call alloc
    }
}
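
One plausible shape for such a registry, modeled in Python (an assumption of this sketch, not a description of any shipped MetricsRegistry): a name-to-slot map consulted only at registration, and a flat array touched on the hot path.

```python
class MetricsRegistry:
    def __init__(self):
        self.index = {}       # name -> slot, consulted only at registration
        self.counts = []      # slot -> value, touched on the hot path

    def register(self, name):
        if name not in self.index:
            self.index[name] = len(self.counts)
            self.counts.append(0)
        return self.index[name]

    def inc(self, idx):
        self.counts[idx] += 1     # integer index: no string handling per call


metrics = MetricsRegistry()
ticks_idx = metrics.register("ticks_total")   # once, at boot
for _ in range(3):
    metrics.inc(ticks_idx)                    # hot path

print(ticks_idx, metrics.counts[ticks_idx])   # 0 3
```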

Reclaim per-connection state

The other place growth hides is a daemon that accepts a child per connection. If those children are residents, their regions live until the (never-dissolving) parent does, and memory climbs with connection count. Make them flows — declare release(c: Conn) on the parent — so each child’s region is reclaimed when its connection ends. If RSS tracks connection count, this is almost always why.

Catching it at compile time

The growth patterns above — a per-message handler that allocates into self, a connection child left resident — have a static shape, and hale check flags them before you ever measure RSS. These are advisory warnings, not build failures:

hale check app.hl flags (by default — no flag needed) an allocation that accumulates without bound: a struct / array / bytes value created in a per-message bus handler (or a runtime-bounded loop) that escapes into self, where it lives until the locus dissolves — e.g. a whole-value replace self.latest = Thing{…}, which bump-allocates a fresh value each message. The fix is usually in-place mutation (self.latest.field = v, self.arr[i] = v) instead of replacing the whole value, or the moves from this chapter — a capacity-bounded @form, route it over the bus, or a per-iteration child. A while i < N { … } counter with a constant bound is proven bounded and left alone. Run-to-exit programs (a main with no run loop and no bus handler) are exempt automatically — a script that allocates and exits owes nothing. Opt out of a run with --no-warn-unbounded-alloc. Annotating a long-lived locus @bounded is now redundant with the default — the check already runs on every hale check — but it’s still accepted. Use @unbounded (on a fn or a lifecycle hook) to acknowledge an intentional accumulation and silence it.
The same check flags an insert into a growing collection — v.push(x) / m.set(x) where v / m is a @form(vec) or @form(hashmap) — when it runs in an unbounded context. The backing buffer grows with population and frees only at dissolve, so a push per message accumulates. A @form(ring_buffer) / @form(lru_cache) is cap-bounded and never flagged; switching to one (or bounding the loop) is the fix. (Detection reads the receiver’s declared type, so it sees fn f(v: IntVec) and self.buf: IntVec but not an untyped let.)
hale check app.hl --warn-resource-leak is the same idea for file descriptors: an open / connect / accept whose result is stored resident in an unbounded context, so fds pile up.

For the resource surface — thread / pool / subject / fd counts, not a leak — there’s a budget you can read or gate on:

hale check app.hl --dump-resource-budget
# OS threads (pinned loci):  1
# cooperative pools:         1  [io]
# bus subjects:              4
# fd acquisition sites:      2

Drop a ceiling file in CI and the build fails when a count climbs past it — “this PR added a pinned thread; bump the ceiling if you meant to.” Every key is optional:

# budget.toml
pinned_threads = 4
bus_subjects   = 16

hale check app.hl --check-resource-budget budget.toml

Unlike the unbounded-allocation check, the resource-leak and resource-budget checks don’t run by default; they’re tools you reach for when a program’s fd or resource surface is something you want to hold the line on.

Knobs for when it’s not your code

The substrate exposes diagnostics and glibc tuning via environment variables: LOTUS_ARENA_RESIDENCY=1 to dump live arena sizes from a heartbeat, LOTUS_ARENA_LOG_CHUNK_ATTACH=N to trace which arena is growing, LOTUS_CHUNK_POOL_STATS=1 for chunk-pool hit rates, and the MALLOC_* family for glibc’s trim/arena behavior. The full table is in spec/memory.md and the “keeping memory bounded” spec material. The workflow: smaps-diff over a window → if the growth is in [heap], check 30s deltas → bursty 64KB steps mean chunk-pool overflow (a loop accumulator) → fix with BytesBuilder.

Hot-path I/O primitives

For latency-sensitive sockets, the stdlib exposes the knobs you’d reach for in C, without an FFI shim:

Disable Nagle — std::io::tcp::set_nodelay(fd, true) (and the std::io::tls sibling) so small writes hit the wire immediately instead of waiting ~40ms to coalesce. The first thing a request/response or market-data socket wants.
Wire-arrival timestamps — recv_stamped_into is recv_into plus a kernel RX timestamp captured in the same recvmsg; read it with last_recv_kernel_ns() right after. True wire time, not the post-scheduling receipt clock — for measuring real I/O latency.
Wrap-free parsing — std::io::MirrorRing double-maps a buffer so any window is one contiguous slice even across the wrap point; a stream parser never special-cases the seam. Opt-in (it costs 2× address space) — for the ordinary case a BytesBuilder accumulator is the right tool.

And the run-time complement to the compile-time unbounded-allocation check: std::diag::heap_alloc_count() and std::diag::syscall_count(name) let a test assert that a steady-state region did what you think. Read the counter before and after and check the delta: zero for “this loop allocated nothing”, one per poll for “exactly one recv per poll”.

Build-time tuning

hale build already tunes to the machine you build on: native builds compile for the host CPU at O3, so generated code autovectorizes to whatever the host supports (AVX2, AVX-512, …). Two knobs matter when that default isn’t what you want:

--target-cpu baseline — pins a portable x86-64-v3 target (AVX2 + BMI2 + FMA) instead of the host. Reach for this when you ship a binary to other machines: the default host-tuned build may use instructions an older CPU lacks. --target-cpu native (the default) is right for hale run and for binaries you execute on the build host (e.g. a service on hardware you control).
LOTUS_LTO=1 — an opt-in full-LTO build that inlines the lotus runtime (the arena allocator, string helpers, shm ring) into your code across the compile boundary it otherwise can’t cross. A few percent on allocation- and coordination-heavy programs — exactly the shape Hale is built for — and it keeps the host vectorization, so there’s no loop it slows down. It’s off by default because the link is ~3–4× slower and needs lld on PATH; turn it on for release/perf builds, not the edit-compile loop:
```
LOTUS_LTO=1 hale build myservice/
```

Where Hale earns its overhead

Hale is shaped to pay coordination cost well — bus dispatch, region setup, lifecycle — and as of v0.9.0 that’s where it leads. The lock-free bus plus static-dispatch devirtualization turned coordination from a deficit into an advantage over Go: bus_dispatch went from ~4× behind to 2.4× ahead, and bus_dispatch_cross_pool from behind to 1.26× ahead. Reach for Hale’s structure where the work is coordination-shaped, which is most real systems.

The tight loop caught up too. Pure arithmetic used to be the place the substrate showed through, but native codegen closed the gap: fn_modular reached parity with clang -O3 C (~0.98 of the C time). Coordination is the lead; tight-loop arithmetic is no longer the price you pay for it.

Next: what @form actually compiles to — Forms under the hood.

Forms under the hood

Coming from Rust / C++? A form is closer to a monomorphized template than to a generic collection object. @form(vec) doesn’t wrap a one-size-fits-all container — the compiler emits a tight, type-specialized implementation per cell type, sized and laid out for your element. You declared the access discipline at the everyday level; here’s what it lowers to and how to choose.

A form is a lowering, not a type

When you write:

@form(vec)
locus Names {
    capacity { heap items of String; }
}

the compiler doesn’t reach for a library Vec. It synthesizes, for this locus and this cell type, a contiguous growable buffer and the methods over it — push, get, set, pop, len, is_empty, and the sort family. The storage is the heap capacity slot; the form decides the layout (here: a {cap, len, buf} struct with doubling realloc) and the method surface.
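
To make the {cap, len, buf} layout concrete, here is a minimal Python model of a doubling buffer. It is a sketch of the general technique, not the code the Hale compiler emits; the class name VecModel and the initial capacity are assumptions made for illustration.

```python
class VecModel:
    """Toy model of a {cap, len, buf} growable buffer with doubling growth."""

    def __init__(self, initial_cap=4):
        self.cap = initial_cap            # slots allocated
        self.len = 0                      # slots in use
        self.buf = [None] * initial_cap   # stands in for the contiguous buffer

    def push(self, value):
        if self.len == self.cap:
            # Doubling realloc: allocate twice the space, copy the live cells.
            new_buf = [None] * (self.cap * 2)
            new_buf[: self.len] = self.buf[: self.len]
            self.buf = new_buf
            self.cap *= 2
        self.buf[self.len] = value
        self.len += 1

    def get(self, index):
        # Bounds-checked, mirroring a fallible get.
        if not 0 <= index < self.len:
            raise IndexError(index)
        return self.buf[index]


names = VecModel(initial_cap=2)
caps = []
for name in ["ada", "grace", "edsger", "barbara", "tony"]:
    names.push(name)
    caps.append(names.cap)   # capacity observed after each push
```

Because the capacity doubles, the copies on growth are amortized over the pushes that follow, which is why push can stay in the tight-loop band.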

The four forms and what they require:

| Form | Backing slot | Lowers to | Synthesized surface |
| --- | --- | --- | --- |
| `@form(vec)` | one `heap` | doubling contiguous buffer | `push`, `get`, `set`, `pop`, `len`, `is_empty`, `sort*` |
| `@form(hashmap)` | one `pool` + `indexed_by` | intrusive open-addressing table | `set`, `get`, `has`, `remove`, `len`, `is_empty` |
| `@form(ring_buffer, cap=N)` | one `pool` | fixed circular buffer | `push -> Bool`, `pop`, `len`, `is_full` |
| `@form(lru_cache, cap=N)` | one `pool` + `indexed_by` | fixed keyed table, LRU eviction | `put`, `get`, `contains`, `len` |

get, pop, and remove are fallible: get fails on an out-of-bounds index or a missing key, pop on an empty container, and remove on a missing key. push on vec is infallible; on ring_buffer it returns Bool (full is a normal condition, not an error). lru_cache is the cap-bounded keyed form: put is infallible and silently evicts the least-recently-used entry once the cache exceeds cap (a get counts as a use and saves an entry from eviction; contains does not). Its get is fallible(KeyError) on a miss.
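
The lru_cache rules above (put evicts, get refreshes, contains does not) are easy to misread, so here is a small Python model of them. It is a sketch of the described semantics only; LruModel is a hypothetical name, and treating a put on an existing key as a use is an assumption the text does not state.

```python
from collections import OrderedDict


class LruModel:
    """Model of a cap-bounded keyed cache with least-recently-used eviction."""

    def __init__(self, cap):
        self.cap = cap
        self.entries = OrderedDict()   # least recently used first

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)     # assumption: overwrite counts as a use
        self.entries[key] = value
        if len(self.entries) > self.cap:
            self.entries.popitem(last=False)  # silently evict the LRU entry

    def get(self, key):
        value = self.entries[key]             # raises KeyError on a miss
        self.entries.move_to_end(key)         # a get counts as a use
        return value

    def contains(self, key):
        return key in self.entries            # does not refresh recency


cache = LruModel(cap=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes the most recently used entry
cache.put("c", 3)    # over cap: "b" is now the least recently used and is evicted
```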

Both a vec and a hashmap also expose batched iteration (shipped 2026-07-02): for x in v.items { … } walks the vec, and for e in m.entries { … } walks the map. The loop is an inline buffer/slot walk, not per-element method calls. (Don’t mutate the form inside the body: a grow would reallocate the vec’s buffer or rehash the map under the cursor.)

By default a @form(hashmap) is single-pool: its densest layout has no synchronization, and a cross-pool call into it is rejected. Opt into concurrent access with the sync = … parameter — @form(hashmap, sync = serialized) (per-map mutex), sync = striped (concurrent readers), or sync = lockfree (CAS-only steady state) — trading layout density for the sharing discipline the workload needs.

The performance contract

Each form commits to a performance band, verified by microbenchmarks in the tree:

Tight-loop primitive (push) — within ~10% of idiomatic C. @form(vec).push hits this.
Amortized workload — within ~2× of the C equivalent.
Per-op fallible (get through the fallible ABI) — no tight bound; advisory, because the fallible return shape and the function-call boundary cost real cycles.

The point: a form isn’t a slow generic that “works for any type.” It’s a specialized implementation monomorphized to your cell type. The cost is that a @form(vec) of Player isn’t interchangeable with some library’s Vec<Player> — there’s no such shared generic. If you want a shared API across forms, you declare an interface.

Choosing a form

Growable, ordered, index access → @form(vec).
Keyed lookup, key is a field of the value → @form(hashmap) (indexed_by names the key field).
Bounded window, drop-or-backpressure on full → @form(ring_buffer, cap = N).
Bounded keyed cache, evict least-recently-used on full → @form(lru_cache, cap = N) (indexed_by names the key field).
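
For the two keyed forms, “the key is a field of the value” means the table stores whole values and reads the key out of each one, rather than holding separate key/value pairs. A minimal Python model of that intrusive, open-addressing idea follows. It is a sketch under assumptions: IntrusiveMap is a hypothetical name, it uses linear probing, and it neither grows nor supports remove.

```python
class IntrusiveMap:
    """Open-addressing table whose key is a field of the stored value."""

    def __init__(self, key_field, slot_count=8):
        self.key_field = key_field
        self.slots = [None] * slot_count   # each slot holds a whole value or None
        self.count = 0

    def _find(self, key):
        """Return the slot holding `key`, else the first hole, else None."""
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            cell = self.slots[i]
            if cell is None or cell[self.key_field] == key:
                return i
        return None

    def set(self, value):
        i = self._find(value[self.key_field])
        if i is None:
            raise RuntimeError("table full; this sketch does not grow")
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = value

    def get(self, key):
        i = self._find(key)
        if i is None or self.slots[i] is None:
            raise KeyError(key)
        return self.slots[i]

    def has(self, key):
        i = self._find(key)
        return i is not None and self.slots[i] is not None


sessions = IntrusiveMap(key_field="id")
sessions.set({"id": "s1", "user": "ada"})
sessions.set({"id": "s2", "user": "grace"})
sessions.set({"id": "s1", "user": "ada", "renewed": True})   # same key: overwrite
```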

One form per locus — a locus is one container. Need two? That’s two loci, which is usually the cleaner decomposition anyway.

Orthogonal to projection class

A form governs how a locus stores cells of a value type. A projection class governs how a parent serves observations of its accepted child loci. They operate on different things and compose freely on the same locus:

@form(hashmap)
locus SessionStore : projection chunked {
    capacity { pool sessions of Session indexed_by id; }
    accept(w: Worker) { }
}

@form(hashmap) lays out the sessions value store; projection chunked sizes the allocator for the accepted Worker children. Different slots, no interference.

Cells are data

A form cell can be a primitive or a type record — never a locus. Storing a locus in a map would mean get(key) hands a live entity to a stranger, the same antipattern the language rejects for methods returning loci. For keyed entities, make them accepted children and key a parallel index by name. Cells are values; entities are children.

Next: the fastest same-machine transport — Zero-copy & the high-frequency bus.

Zero-copy & the high-frequency bus

Coming from Rust / C++? This is the shared-memory ring buffer you’d otherwise build by hand with mmap and atomics. For same-machine routes north of ~100k msg/s — market data, tick streams — the per-message copy at the locus boundary shows up in the latency budget. A shm_ring binding writes the payload straight into a POSIX shared-memory slot the subscriber reads from. No kernel memcpy at the boundary. And it’s still the same subscribe/publish code.

The default copies; sometimes you can’t afford it

Every ordinary bus delivery copies the payload into the subscriber’s arena; that’s what keeps lifetimes independent and the memory model sound (see Memory & lifetime). For the vast majority of topics that copy is lost in the noise. For the hottest same-host routes it isn’t, and you opt into a zero-copy path explicitly.

A `shm_ring` binding

In main’s bindings { } block:

main locus App {
    bindings {
        L2Updates: shm_ring("/l2-updates",
                            slot_count:  1024,
                            on_overflow: fail)
                  where intra_machine, zero_copy;
    }
}

Publisher and subscriber mmap the same /dev/shm object and coordinate through the ring’s slot indices. The publisher writes its payload directly into a slot; the subscriber reads from the same memory. No copy crosses the boundary.

The subscribe L2Updates as on_update; handler is the same line of source it would be over a Unix socket — the substrate picks the zero-copy lowering from the binding, not from the locus code.

Per-record vs. batch: the handler’s param picks the mode

By default the substrate calls your handler once per record:

fn on_update(u: Update) {   // per-record
    self.total = self.total + u.px;
}

On a high-rate cross-process feed that per-record call — plus the per-call handler scratch — is exactly the overhead that loses to a bare consumer loop in C or Go. Hale’s fix is the drain handler: change the parameter type to Drain<T> and the substrate calls the handler once per available batch, handing you a handle you consume with a tight inline loop.

locus Agg {
    params { total: Int = 0; }
    bus { subscribe Quotes as on_quotes; }   // SAME subscribe line
    fn on_quotes(feed: Drain<Tick>) {         // param type → batch mode
        for t in feed {                       // zero-copy inline loop
            self.total = self.total + t.px;   // no per-record call
        }
    }
}

There is no new keyword — the subscribe clause is unchanged; the parameter type alone selects the dispatch mode. Inside for t in feed, each t is read straight through the ring slot (so t.px reads the mapped shared memory in place, never a copy), and the consumer cursor advances once per batch instead of once per record.
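
The difference between the two dispatch modes can be modelled in a few lines of Python. This is only an illustration of the bookkeeping described above, not the runtime’s implementation; RingModel and its cursor_stores counter are hypothetical names.

```python
class RingModel:
    """Counts consumer-cursor advances under per-record and batch dispatch."""

    def __init__(self, records):
        self.records = list(records)   # published records, oldest first
        self.cursor = 0                # index of the next unread record
        self.cursor_stores = 0         # how many times the cursor was advanced

    def deliver_per_record(self, handler):
        while self.cursor < len(self.records):
            handler(self.records[self.cursor])   # one handler call per record
            self.cursor += 1
            self.cursor_stores += 1

    def deliver_batch(self, handler):
        end = len(self.records)                  # snapshot of what is available
        if end > self.cursor:
            # One handler call; the handler iterates the records in place.
            handler(self.records[i] for i in range(self.cursor, end))
            self.cursor = end
            self.cursor_stores += 1              # advanced once for the batch


ticks = [{"px": 3}, {"px": 4}, {"px": 5}]
totals = {"per_record": 0, "batch": 0}


def on_update(t):
    totals["per_record"] += t["px"]


def on_quotes(feed):
    for t in feed:
        totals["batch"] += t["px"]


a = RingModel(ticks)
a.deliver_per_record(on_update)
b = RingModel(ticks)
b.deliver_batch(on_quotes)
```

Both modes compute the same total; the batch mode makes one handler call and one cursor advance where the per-record mode makes one of each per record.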

Drain<T> is only spellable as a batch handler’s parameter and as the thing you iterate; it is not a general value type. Batch handlers on a foreign (layout:) ring aren’t supported yet — use a per-record handler there.

The `where` clause is a checked contract

where intra_machine, zero_copy is two things at once: your assertion about the route, and a contract the compiler validates.

Scope — intra_process, intra_machine, or cross_machine (pick one). zero_copy with cross_machine is rejected: the network always serializes.
Behavior — zero_copy is rejected on transports that can’t honor it (unix(...) memcpies through the socket buffer; user adapters serialize through send(subject, bytes)).

Zero-copy needs a flat payload

A payload you can drop into a shared slot must be flat-shapeable: every leaf is a fixed-layout primitive (Int, Float, Bool, Decimal, Time, Duration), a fixed-size array of those, or a struct whose fields are all flat-shapeable. String, Bytes, and unbounded arrays carry heap pointers that don’t translate to a shared slot, so the compiler rejects them on a zero-copy topic. Use a fixed-size byte array ([Byte; 256]) for bounded text on these routes.
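
To see why a fixed-size byte array works where String does not, here is a Python sketch that packs a flat record with the struct module. The record shape (an Int, a Float, and a 256-byte text field), the little-endian packed layout, and the helper names are assumptions for illustration; Hale’s actual slot layout may differ.

```python
import struct

# Hypothetical flat record: Int (i64), Float (f64), [Byte; 256] text field.
# "<" means little-endian with no padding between fields.
UPDATE = struct.Struct("<qd256s")


def encode(px, size, symbol):
    raw = symbol.encode("utf-8")
    if len(raw) > 256:
        raise ValueError("symbol does not fit the fixed 256-byte field")
    return UPDATE.pack(px, size, raw)   # the 256s field is NUL-padded by struct


def decode(slot):
    px, size, raw = UPDATE.unpack(slot)
    return px, size, raw.rstrip(b"\x00").decode("utf-8")


slot = encode(101, 2.5, "EURUSD")
```

Every record is the same number of bytes and contains no pointers, so a reader in another process can interpret the slot without following anything outside it.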

Overflow is your decision

A shm_ring binding must declare on_overflow: — slot exhaustion needs a policy the substrate can’t guess:

block — the publisher spins until a slot frees. Right for control-plane data that must not be lost.
drop — overwrite the next slot; slow consumers miss messages. Right for stale-is-worthless feeds.
fail — panic with a clear diagnostic. Process-level visibility into back-pressure.
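
The three policies can be compared side by side in a small Python model. This is a single-threaded sketch, not the shared-memory implementation: SlotRing is a hypothetical name, and block is modelled as returning False (a real publisher would spin until a slot frees).

```python
class SlotRing:
    """Fixed-slot ring with a configurable overflow policy."""

    def __init__(self, slot_count, on_overflow):
        self.slots = [None] * slot_count
        self.head = 0                  # next sequence number to publish
        self.tail = 0                  # next sequence number to consume
        self.on_overflow = on_overflow

    def publish(self, payload):
        if self.head - self.tail == len(self.slots):   # every slot is unread
            if self.on_overflow == "block":
                return False           # caller must retry once a slot frees
            if self.on_overflow == "fail":
                raise RuntimeError("shm_ring overflow")
            # "drop": the next slot holds the oldest unread record; overwrite
            # it, so the slow consumer never sees that record.
            self.tail += 1
        self.slots[self.head % len(self.slots)] = payload
        self.head += 1
        return True

    def consume(self):
        if self.tail == self.head:
            return None                # nothing unread
        payload = self.slots[self.tail % len(self.slots)]
        self.tail += 1
        return payload


feed = SlotRing(slot_count=2, on_overflow="drop")
for seq in (1, 2, 3):
    feed.publish(seq)   # the third publish overwrites record 1
```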

Reading someone else’s ring

A shm_ring binding speaks Hale’s own ring format. But sometimes the ring already exists — written by another program in another language, with its own binary layout. Instead of hand-writing FFI or forking the runtime, you declare that layout and point a binding at it:

ring_layout ForeignRing {
    magic 0x52494E47464D5431;        // expected header magic at offset 0
    version 1 at 8 : u32;            // header field `version`, must equal 1
    buffer_size at 12 : u32;         // ring capacity, read from the header
    data_at 128;                     // first record starts here
    cursor published {               // the producer's published byte cursor
        at 64; repr atomic_u64; load acquire; unit bytes;
    }
    framing byte_records {           // records are [u32 length][payload]
        len_prefix u32; align 8; pad_sentinel 0xFFFFFFFF;
    }
    overflow lap_detect;
}

main locus App {
    bindings {
        Ticks: shm_ring("/foreign.ticks", on_overflow: drop,
                        layout: ForeignRing) where zero_copy;
    }
}

A subscriber on Ticks now reads that foreign ring directly: the runtime attaches it read-only, checks the magic and version, and walks the length-prefixed records, handing each payload to your on_tick handler with no copy. Your handler code is identical to any other shm_ring subscriber — the layout only changes how the substrate finds and frames the bytes.
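
To show what the ForeignRing declaration describes in terms of bytes, here is a Python sketch that builds a segment with that header and walks its records. It is a model of the layout, not the runtime’s reader. Several details are assumptions: little-endian fields, a cursor that counts bytes from the start of the data area, and no wrap-around (the walk simply stops at a pad sentinel). The helper names are hypothetical.

```python
import struct

MAGIC = 0x52494E47464D5431     # header magic at offset 0
VERSION_AT = 8                 # u32, must equal 1
BUFFER_SIZE_AT = 12            # u32 ring capacity
CURSOR_AT = 64                 # u64 published byte cursor
DATA_AT = 128                  # first record starts here
PAD_SENTINEL = 0xFFFFFFFF


def align8(n):
    return (n + 7) & ~7


def build_ring(payloads, buffer_size=256):
    """Producer side: write the header, then [u32 length][payload] records."""
    seg = bytearray(DATA_AT + buffer_size)
    struct.pack_into("<Q", seg, 0, MAGIC)
    struct.pack_into("<I", seg, VERSION_AT, 1)
    struct.pack_into("<I", seg, BUFFER_SIZE_AT, buffer_size)
    pos = 0
    for p in payloads:
        struct.pack_into("<I", seg, DATA_AT + pos, len(p))
        start = DATA_AT + pos + 4
        seg[start : start + len(p)] = p
        pos = align8(pos + 4 + len(p))          # records are 8-byte aligned
    struct.pack_into("<Q", seg, CURSOR_AT, pos)  # publish the cursor last
    return seg


def read_records(seg):
    """Consumer side: validate the header, then walk the framed records."""
    (magic,) = struct.unpack_from("<Q", seg, 0)
    (version,) = struct.unpack_from("<I", seg, VERSION_AT)
    if magic != MAGIC or version != 1:
        raise ValueError("not a ForeignRing v1 segment")
    (published,) = struct.unpack_from("<Q", seg, CURSOR_AT)
    pos = 0
    while pos < published:
        (length,) = struct.unpack_from("<I", seg, DATA_AT + pos)
        if length == PAD_SENTINEL:
            break                               # padding; wrap handling omitted
        start = DATA_AT + pos + 4
        yield bytes(seg[start : start + length])  # copy is a Python convenience
        pos = align8(pos + 4 + length)


segment = build_ring([b"abc", b"hello world"])
records = list(read_records(segment))
```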

A binding with no layout: keeps Hale’s native ring, so nothing you wrote before changes.

The same binding works the other way too. If a locus in your program publishes the topic, it becomes the ring’s producer: Hale creates the segment, writes the header the layout describes, and frames each Ticks <- Tick { ... } as a length-prefixed record another program (or another language) can read. Give the binding a buffer_size: to size the ring:

Ticks: shm_ring("/foreign.ticks", on_overflow: drop,
                layout: ForeignRing, buffer_size: 65536) where zero_copy;

So the same declared layout lets Hale sit on either side of a foreign ring — consume what another process writes, or produce what another process reads — with the locus body unchanged. Two caveats at this version: a subscriber sees records published after it attaches (no replay of history), and if it falls more than a full buffer behind it resyncs rather than read a torn record.

Mixed record types: a raw `BytesView` payload

The examples above bind a fixed payload struct — every record on the ring is the same shape. Real feeds are often heterogeneous: a header plus one of several record types, selected by a discriminator, with varying length. Bind such a topic to a BytesView payload and the subscriber receives a bounded view over each record to decode itself:

topic Recs { payload: BytesView; }

locus Reader {
    bus { subscribe Recs as on_rec; }
    fn on_rec(v: BytesView) {
        let kind = std::bytes::read_u8(v, 0) or 0;
        match kind {
            1 => { /* decode an L1 record with std::bytes::read_* */ }
            2 => { /* decode an L2 record */ }
            _ => { }
        }
    }
}

No fixed size is assumed (a differently-sized valid record isn’t dropped), and you decode with the std::bytes::read_* pack readers and a discriminator branch. This is the path for reading real external mixed-record rings; the typed-struct binding stays the fast path for a homogeneous ring.

Producing such a ring is symmetric — build a record with a BytesBuilder and send the bytes:

fn emit_l2(level: L2) {
    let b = std::bytes::BytesBuilder { initial_cap: 64 };
    b.append_u8(2);                // discriminator
    b.append_u32_le(level.price);
    b.append_u32_le(level.qty);
    Recs <- b.view();              // framed at its own length
}

Recs <- bytes frames [len_prefix len][bytes] where len is the value’s actual byte length, so each record carries its own size.
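
The byte-level shape of that record and its frame can be checked with a short Python sketch. It mirrors the emit_l2 example (a u8 discriminator followed by two u32_le fields); the little-endian length prefix and the function names are assumptions for illustration.

```python
import struct


def build_l2(price, qty):
    # [u8 discriminator = 2][u32_le price][u32_le qty], packed with no padding
    return struct.pack("<BII", 2, price, qty)


def frame(record):
    # [u32 length][payload]; the prefix carries the record's actual length
    return struct.pack("<I", len(record)) + record


def decode(record):
    kind = record[0]                       # discriminator byte
    if kind == 2:
        price, qty = struct.unpack_from("<II", record, 1)
        return ("L2", price, qty)
    return ("unknown", kind)


record = build_l2(price=100, qty=7)
framed = frame(record)
```

The record is 9 bytes (1 + 4 + 4), which is the same length the in-place writer in the next section commits.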

Writing in place (zero-copy)

That builds the record in a temporary buffer, then copies it into the ring. To skip the copy on a hot producer path, write the fields directly into the reserved slot:

fn emit_l2(level: L2) {
    Recs.write(24) { w =>           // reserve up to 24 bytes
        std::bytes::write_u8(w, 0, 2)              or raise;
        std::bytes::write_u32_le(w, 1, level.price) or raise;
        std::bytes::write_u32_le(w, 5, level.qty)   or raise;
        9                            // bytes written -> the record length
    };
}

Topic.write(max) { w => ... } reserves up to max bytes, hands the body a writable view w over the slot, and commits the byte count the body’s tail yields. The std::bytes::write_* family mirrors the readers (bounds-checked, fallible(IndexError)). The reserve and commit are scoped to the block, so the view can’t escape and the commit can’t be forgotten.
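
The reserve, write, commit scoping can be modelled in Python with a memoryview that is released when the body returns. This is a sketch of the pattern rather than Hale’s implementation; RingWriter and its fields are hypothetical, and the model appends records back to back without framing.

```python
import struct


class RingWriter:
    """Reserve a writable window, let the body fill it, commit its length."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.pos = 0
        self.committed = []            # (offset, length) of each record

    def write(self, max_len, body):
        if self.pos + max_len > len(self.buf):
            raise BufferError("not enough room to reserve")
        view = memoryview(self.buf)[self.pos : self.pos + max_len]
        try:
            length = body(view)        # body writes in place, returns byte count
        finally:
            view.release()             # the view cannot outlive the block
        if not 0 <= length <= max_len:
            raise ValueError("body reported an impossible length")
        self.committed.append((self.pos, length))
        self.pos += length


def emit_l2(w):
    struct.pack_into("<B", w, 0, 2)    # discriminator
    struct.pack_into("<I", w, 1, 100)  # price
    struct.pack_into("<I", w, 5, 7)    # qty
    return 9                           # bytes written


ring = RingWriter(size=64)
ring.write(24, emit_l2)
```

Releasing the view in a finally block is what makes an escaped reference useless after the call, which is the property the scoped block gives you in Hale.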

Naming the fields (`repr:` tags)

Hand-writing read_u32_le(b, 12) per field is error-prone — the offsets are implicit and drift as the record changes. Tag a struct’s fields with their wire representation and the offsets are computed for you, with typed accessors generated from the layout:

type L2 {
    kind:  Int `repr:"u8"`;       // 1 byte  @ 0
    price: Int `repr:"u32_le"`;   // 4 bytes @ 1
    qty:   Int `repr:"u32_le"`;   // 4 bytes @ 5
}

Now the consumer reads fields by name and the producer writes them by name — both compose with everything above:

fn on_rec(v: BytesView) {
    let p = L2::price(v) or raise;       // read u32_le @ 1
    ...
}

fn emit(level: L2) {
    Recs.write(9) { w =>
        L2::set_kind(w, 2)            or raise;
        L2::set_price(w, level.price) or raise;
        L2::set_qty(w, level.qty)     or raise;
        9
    };
}

Type::field(v) and Type::set_field(w, x) desugar to the matching std::bytes::read_* / write_* call at the field’s computed offset — so they’re exactly as cheap (and as bounds-checked) as writing the primitive by hand. Offsets run in declaration order over the tagged fields; pin one for a padded foreign format with repr:"u32_le,at=4". The tag itself is general key:"value" metadata — repr: is the binary-pack consumer; other keys (e.g. json:) are free for later tools.
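
The offset rule (declaration order, with an optional at= pin) is simple enough to model directly. The Python sketch below covers only the two tags that appear in this section; the layout function is hypothetical, and the assumption that fields after a pinned field continue from the pin is not stated in the text.

```python
# Byte widths for the repr tags used in this section.
WIDTHS = {"u8": 1, "u32_le": 4}


def layout(fields):
    """Map each field name to (offset, width).

    `fields` is a list of (name, tag) in declaration order, where a tag is
    either "u32_le" or a pinned form such as "u32_le,at=4".
    """
    offsets = {}
    cursor = 0
    for name, tag in fields:
        kind, _, pin = tag.partition(",at=")
        if pin:
            cursor = int(pin)          # pinned: jump to the declared offset
        offsets[name] = (cursor, WIDTHS[kind])
        cursor += WIDTHS[kind]         # assumption: later fields follow on
    return offsets


packed = layout([("kind", "u8"), ("price", "u32_le"), ("qty", "u32_le")])
padded = layout([("kind", "u8"), ("price", "u32_le,at=4"), ("qty", "u32_le")])
```

The packed layout reproduces the offsets in the L2 comments (0, 1, 5); the padded one shows how a single pin shifts the fields after it.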

Per-record headers and wire timestamps

Real external feeds often prefix each record with a small fixed header — a sequence number, a producer-side wire-arrival timestamp — before the variable payload. Declare it in the ring_layout with record_header_bytes (and pad_field for any alignment padding), and the subscriber reads those header fields for the record it’s currently handling through std::shm:

fn on_rec(v: BytesView) {
    let seq = std::shm::last_record_seq();        // header sequence no.
    let wire_ns = std::shm::last_record_kernel_ns(); // producer wire time
    // ... decode v as before ...
}

These behave like errno, or like the timestamp you read back after a socket recv: call them inside the handler, and they describe the record being delivered. Each returns 0 when the layout declares no such field. The layout’s recheck post_copy guard re-validates the header after the copy, so a record torn by a producer lapping the ring is never surfaced with a half-written header. (A native fixed-stride ring uses framing slots instead of length-prefixed byte_records; it is the same layout: machinery with a different framing kind.)

The same shape, one tier down

Notice this is the same move as everything else at this level: an operational requirement (zero-copy delivery) declared at the deployment seam, validated by the compiler, consumed by codegen to pick a lowering — while the locus body stays the synchronous, portable code you wrote three tiers ago. You reach under the hood without rewriting the program.

Next: calling into native libraries — Binding C.

Binding C

Coming from Rust / C++? This is extern "C" with a thin, hand-written wrapper — no bindgen, no build-script codegen. You declare the C symbols you need with @ffi("c"), ship a small glue .c file, and name the link flags in hale.toml. The compiler emits LLVM declares and the linker resolves them. No compiler change is needed to bind a new library.

Declaring a foreign function

An @ffi("c") annotation on a bodiless top-level function declares an external C symbol:

@ffi("c") fn doubler_double(x: Int) -> Int;

fn main() {
    println(doubler_double(21));     // 42
}

The LLVM symbol name is the function name verbatim — no mangling — so the linker matches it directly against your C. Convention: prefix FFI names with the library identifier (raylib_init_window, sqlite3_open) to keep the global C namespace tidy.

Type marshalling

Only a portable subset crosses the boundary; the mapping is fixed:

| Hale | C |
| --- | --- |
| `Int` | `int64_t` |
| `Float` | `double` |
| `Bool` | `int32_t` (0 / 1) |
| `Duration` / `Time` | `int64_t` (nanoseconds) |
| `String` | `const char *` (NUL-terminated) |
| `Bytes` | pointer to `[int64 len][payload]` |
| user `type` | pointer to a layout-matching struct |
| `()` | `void` (return only) |

Decimal and fixed-size arrays are not portable across FFI — the compiler rejects them at the boundary. Function declarations also can’t be generic or fallible(E); a C function reports errors with a sentinel, and your Hale wrapper translates that sentinel into the fallible channel.

The glue and the build

Write the C side as an ordinary translation unit:

/* glue.c */
#include <stdint.h>
int64_t doubler_double(int64_t x) { return x * 2; }

Build, naming the C source (and any libraries to link):

hale build mydir/ --csrc glue.c
hale build mydir/ --csrc raylib_glue.c --link raylib

For a reusable binding library, declare the surface in hale.toml so consumers don’t pass flags by hand:

[ffi]
csrc = ["glue.c"]
link = ["raylib"]

A downstream project then just imports the binding and builds normally; the FFI flags thread through automatically.

Lifetime rules across the boundary

The boundary is read-only for arena-owned memory, and the rule is simple: the caller owns every pointer; the callee must not retain it past the call. If the C side needs to keep data, it mallocs and copies. If it returns heap data back to Hale, it allocates into the caller’s arena via lotus_arena_alloc(lotus_caller_arena_or_global(), size, align) so the value lives by Hale’s rules. Exceptions / longjmp must not cross the boundary.

This is the whole FFI story — declare, glue, link. The full contract (struct-return sret convention, the exact view layout for BytesView) is in spec/ffi.md. Binding libraries conventionally live in pond; the agents/binding-packages.md brief covers the recommended file layout.

On the wasm target, @ffi("c") has a sibling: @ffi("js") declares a function the JavaScript loader provides instead of a linked C symbol, and @export sends Hale functions out to the host. Same declare-and-bind shape, different boundary — see WebAssembly & the browser.

Next: state that outlives one process — Cross-process & hot-load.

WebAssembly & the browser

Coming from the web stack? Hale compiles to a self-contained .wasm plus a small .mjs loader — no Emscripten, no bundler. The same locus/bus/std::* program you run natively can run in the browser; you choose the target at build time. The browser APIs you can’t reimplement (fetch, WebSocket, WebGL, the DOM) come in as thin host functions, and Hale functions you want the page to call go out as exports.

Building for wasm

hale build client/main.hl --target wasm32

This emits client/main.wasm (self-contained — a tiny bundled libc, no external runtime) and client/main.mjs (a loader that instantiates the module and wires the host functions). The program declares the target so the typechecker can gate the parts of the standard library that need syscalls:

target wasm { }

Under target wasm, the portable stdlib works as usual (std::str, std::bytes, std::json, std::math, …), but the POSIX-backed namespaces (std::io::tcp, std::process, std::http, …) are rejected at typecheck — the browser sandbox has no syscalls. Reach the outside world through host functions instead.

The in-process typed bus — topic / bus { publish … } / bus { subscribe … } across loci — runs under wasm exactly as it does natively: a Subject <- payload is delivered to every matching subscriber’s handler in the same module, payload-copied through the synthesized wire codec. Only the cross-process / network transports (shm_ring, unix, CONNECT-role bindings) are unavailable in the sandbox — those need syscalls. So the idiomatic locus + topic + bus shape is fully available client-side.

The @form collections — @form(vec), @form(hashmap), and @form(ring_buffer) — run under wasm too; their runtime primitives use the target-pointer-width size_t ABI, so a push / set / get / len behaves identically to native.

Calling the host: `@ffi("js")`

@ffi("js") is the wasm sibling of @ffi("c"): it declares a function the JavaScript loader provides.

target wasm { }
@ffi("js") fn console_log(msg: String);
@ffi("js") fn draw_line(x1: Float, y1: Float, z1: Float,
                        x2: Float, y2: Float, z2: Float);

Marshalling: Float and Int both arrive as a plain JS number. An @ffi("js") Int crosses as f64, not a BigInt, so your host handler gets a number with no Number(x) step, and an Int-returning import takes a plain number back. The one caveat is f64’s range: Ints whose magnitude exceeds 2^53 lose precision across this boundary, so send those as a String/Bytes payload. This applies to @ffi("js") only; @ffi("c") keeps i64. String/Bytes arrive as a pointer the loader reads out of wasm memory. The loader ships a built-in console_log and the libm set (so std::math just works); your page supplies the rest through run(glue):

import { run } from "./main.mjs";
const inst = await run((h) => ({
  draw_line: (x1,y1,z1,x2,y2,z2) => { /* push to a WebGL buffer */ },
}));
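
The 2^53 caveat in the marshalling note is a property of IEEE-754 doubles, so it can be demonstrated in any language whose float is an f64. The Python sketch below does that; marshal_int is a hypothetical helper showing the “send large values as a String” fallback, not part of the loader.

```python
MAX_SAFE = 2 ** 53


def crosses_exactly(n):
    """True if the integer survives a round trip through a 64-bit double."""
    return int(float(n)) == n


def marshal_int(n):
    # Values within the safe range cross as a number; larger ones fall back
    # to a decimal string, so no digits are lost.
    return float(n) if abs(n) <= MAX_SAFE else str(n)


small_ok = crosses_exactly(2 ** 53)        # exactly representable
large_ok = crosses_exactly(2 ** 53 + 1)    # rounds to 2**53 as a double
```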

Letting the host call you: `@export` + the app locus

To run a game loop or react to network messages, the host needs to call into Hale. The browser-client shape is an @export locus — the persistent “app” of your program:

@export locus Client {
    params { sx: Float = 0.0; sy: Float = 0.0; ready: Bool = false; }
    birth() { }
    fn on_message() { /* parse an inbound frame, update fields */ }
    fn frame()      { /* render from the fields */ }
}

Each fn method becomes a wasm export the page calls by name (inst.exports.frame()). State lives in the locus’s fields and persists across calls — on_message() writes self.sx, frame() reads it, just like a native locus. On the native target @export is a no-op. (There is also a lower-level @export fn for free functions — same export, but stateless; see below.) Methods may not be fallible (the host has no error channel), and the locus must not define run() — the host drives it.

The run-model: entry inversion

A native program blocks in main. A browser program can’t — it must return to the event loop so the page stays responsive. So a program built with @export runs inverted: there is no main, and the host drives the exports (typically frame() once per requestAnimationFrame).

The compiler synthesizes an exported _hale_start() that sets up a persistent program arena and instantiates your @export locus (running birth). The loader calls it once at startup; after that the page drives the methods:

const inst = await run(glue);     // _hale_start ran here (Client is alive)
function tick() {
  inst.exports.frame();
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);

A program made of @export declarations needs no fn main at all.

Quick wasm from a bare `fn main`: `--wrap-main`

A wasm program needs an @export entry — but a script, a tutorial snippet, or anything pasted into the browser playground is just a fn main. The --wrap-main build flag bridges that gap:

hale build snippet.hl --target wasm32 --wrap-main

When the program has a top-level fn main() and no @export entry, --wrap-main synthesizes — on the parsed AST, before typecheck — the equivalent of:

target wasm { }
@export locus __Main { birth() { <main's body> } }

so main’s body runs once at _hale_start, exactly as it would run once natively. Because it works on the AST, not the source text:

diagnostics keep the user’s line/col — the synthesized locus borrows main’s spans and the body is moved intact, so a type error on the user’s line 3 is reported on line 3 (a textual wrap would shift every following line);
it’s string/comment-safe — the real lexer found the body, so a { or } inside a string literal or comment can’t mis-wrap it;
the target wasm gate is injected too, so the syscall-backed stdlib (std::io::tcp, std::process, …) is rejected with a precise diagnostic, on untouched source.

It is wasm-only and opt-in: it requires --target wasm32 (there is no native entry-inversion to wrap, so it errors on a native build), and it’s never implied — a normal wasm program may legitimately keep a bare fn main exported as main. If the program already declares an @export entry, --wrap-main leaves it untouched (prefer-explicit). This is the one flag the browser playground passes so it can hand the compiler raw user source and surface errors on the exact line.

Inbound messages

The page hands network bytes to Hale through the inbox: write them into wasm memory, publish the length, then call a method.

// JS: hand a WebSocket frame to Hale, then notify it
const bytes = new TextEncoder().encode(ev.data);
const ptr = inst.exports.lotus_wasm_alloc(bytes.length);
new Uint8Array(inst.exports.memory.buffer).set(bytes, ptr);
inst.exports.lotus_wasm_set_inbox(bytes.length);
inst.exports.on_message();

@ffi("c") fn lotus_wasm_inbox() -> Bytes;   // the bytes JS wrote

// inside the Client locus:
fn on_message() {
    let msg = lotus_wasm_inbox();
    if len(msg) > 0 {
        let s = std::str::from_bytes(msg);
        // ... std::json parse, then store into self.* ...
        self.ready = true;
    }
}

This is the full pattern for a browser client: the page owns the transport (fetch / WebSocket) and the GL context; the @export locus parses the protocol with std::json, holds the game state in its fields, runs the camera, and emits geometry — the same code shape it would have natively.

Lower-level: `@export fn` + the state cell

If you don’t want a locus, you can export free functions (@export fn frame()). These are stateless — each call’s allocations are released on return — so cross-call state must be parked in the runtime state cell, packed into Bytes:

@ffi("c") fn lotus_wasm_state_set(b: Bytes);
@ffi("c") fn lotus_wasm_state_get() -> Bytes;

The @export locus model is preferred for anything with state; the state cell exists for the free-fn path and for hand-rolled layouts.

See spec/ffi.md § WASM host interface for the exact marshalling and diagnostic rules.

Cross-process & hot-load

Coming from Rust / C++? This is typed, versioned state shipped between processes — but without a separate .proto and a codegen step. A perspective is a serializable parameter bundle; producer and consumer share its schema because they compile from the same source. No protobuf regen, no schema drift, no handshake.

A perspective is a shippable view

Most of a locus’s state is private to its region. A perspective is the exception: a typed bundle a locus can publish across a process boundary, with a compile-time guarantee that the other side agrees on its shape.

perspective Kernel {
    params {
        scale_row:    [Decimal; 8];
        sigma_factor: Decimal;
        regime_id:    Int;
    }
    stable_when {
        return self.num_validated >= 3;
    }
    serialize_as KernelV1;
}

params is the payload — the schema is this type.
stable_when is a predicate the runtime checks before the perspective is allowed to ship — “is this data ready?” lives in the data’s own declaration, not in a publisher flag.
serialize_as names the wire format stably, so you can rename the identifier without breaking serialization.

A perspective is not a locus — no lifecycle, no bus block, no methods beyond stable_when. It’s a validated, serializable bundle the substrate knows how to ship.

The fitter / applier pattern

The canonical use: one process computes parameters slowly and carefully; another applies them at high frequency. Both compile from the same Kernel declaration, so the type is the protocol.

// fitter — publishes refined Kernels
locus Fitter {
    bus { publish KernelUpdates; }
    run() {
        let mut k = compute_kernel(observations);
        while !k.is_stable() { k = refine_kernel(k, more()); }
        KernelUpdates <- k;
    }
}

// applier — swaps in the latest, atomically
locus Applier {
    params { current: Kernel = default_kernel(); }
    bus { subscribe KernelUpdates as on_update; }
    fn on_update(k: Kernel) { self.current = k; }   // atomic swap; no torn read
}

The runtime guarantees the consumer-side swap is atomic — readers see the old perspective or the new one, never a half-written mix. This is also the hot-load mechanism: reconfigure a long-running service by publishing a new perspective, with full type-checking against the locally-compiled schema, no restart.
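
The “old or new, never a mix” guarantee is what you get when the bundle is immutable and the consumer swaps a single reference. Here is a minimal Python model of that idea. It illustrates the pattern, not how the Hale runtime implements the swap; the float fields stand in for Decimal and the apply method is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """Immutable parameter bundle: it is replaced whole, never edited."""
    scale_row: tuple
    sigma_factor: float
    regime_id: int


class Applier:
    def __init__(self, initial):
        self.current = initial

    def on_update(self, k):
        self.current = k          # one reference rebind; old Kernel untouched

    def apply(self, x):
        k = self.current          # take one snapshot for the whole computation
        return x * k.sigma_factor + k.regime_id


v1 = Kernel(scale_row=(1.0,) * 8, sigma_factor=2.0, regime_id=1)
v2 = Kernel(scale_row=(0.5,) * 8, sigma_factor=3.0, regime_id=2)

applier = Applier(v1)
snapshot = applier.current        # a reader holding the old perspective
before = applier.apply(10.0)
applier.on_update(v2)
after = applier.apply(10.0)
```

A reader that took its snapshot before the update keeps a complete v1; a reader that takes it afterwards sees a complete v2.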

Capability profiles and substrates

The same locus + bus + perspective triple runs on more than one substrate. The native C-runtime is one; the browser runtime (hale-js) is another. A build target declares the capabilities a substrate offers:

target browser_js {
    arenas.epoch_view,
    time.monotonic, time.wallclock,
    random.csprng,
    gfx.canvas2d,
}

A program that reaches for a capability its target doesn’t offer fails at the translation boundary with a clear CAP-MISSING diagnostic — at build, not at runtime. Substrate differences are named and checked, not papered over. The locus you wrote doesn’t change between substrates; the capability profile and the transport binding do.
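
At its core a capability check like this is a set difference between what the program requires and what the target offers. The Python sketch below models that with the browser_js profile from above. The function, the exception type, the message format, and the net.tcp capability name are all hypothetical; only the CAP-MISSING label comes from the text.

```python
class CapMissing(Exception):
    """Raised at 'build time' when a required capability is not offered."""


BROWSER_JS = {
    "arenas.epoch_view",
    "time.monotonic",
    "time.wallclock",
    "random.csprng",
    "gfx.canvas2d",
}


def check_capabilities(required, target_name, offered):
    missing = sorted(set(required) - set(offered))
    if missing:
        raise CapMissing(
            f"CAP-MISSING: target {target_name} lacks {', '.join(missing)}"
        )


# A program that only needs a clock and a canvas passes the check.
check_capabilities({"time.monotonic", "gfx.canvas2d"}, "browser_js", BROWSER_JS)
```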

This is the long arc of the whole guide paying off: the same shape you met as a small program in the basics runs across processes, machines, and substrates because nothing in the shape depended on where it ran.

Next, the most specialized tier feature — Modes.

Operations & debugging

Most of the time a Hale program either works or fails loudly. The two exceptions, the ones that send you here, are a message that quietly doesn’t arrive and resident memory that quietly grows. Both are silent by design (the steady-state behavior is correct), so the runtime ships opt-in diagnostics you switch on with an environment variable or a build flag. This chapter is the operator’s map: what each knob shows, and two worked triage walkthroughs.

Nothing here changes behavior — every switch is observe-only. The canonical reference for each variable is spec/runtime.md; this is the pedagogical version.

Bus: “my publish isn’t arriving”

A publish that compiles is not a publish that’s delivered: the subject might match no subscriber, the payload might fail to deserialize, or the subscriber might be on a pool that never runs. The bus drops these silently because, for an on_unmatched: swallow topic in steady state, that is the right behavior. To see the drops, set one variable:

LOTUS_BUS_LOG_DROP=1 ./myapp

LOTUS_BUS_LOG_DROP is the broad net — reach for it first. It prints one stderr line at every silent-drop site, naming the call site, subject, and size/index info: no-matching-subscriber, serialize-returned-≤0, deserialize-returned-≤0, and the “matched-but-no-post-target” case (mailbox / pool / queue all null). It implies the two narrower variables, which you can use on their own once you know which class you’re chasing:

Variable	Surfaces
`LOTUS_BUS_LOG_DROP=1`	everything below, plus serialize-fail and no-post-target
`LOTUS_BUS_LOG_UNMATCHED=1`	a keyed publish (`where key == …`) that matched no subscriber — prints subject, key, and the per-topic subscriber counts
`LOTUS_BUS_LOG_DESERIALIZE_DROP=1`	the `udp://` reader thread dropping a frame (no deserializer registered, or a size-mismatched read)

The shape that produces no line at all. If LOTUS_BUS_LOG_DROP is silent but the handler still never fires, the message was delivered to the queue and the problem is downstream: the subscriber’s pool isn’t draining. The classic cause is a run() on a cooperative pool that blocks (a long time::sleep, a blocking syscall) and starves the handler — hale check warns on blocking syscalls in a cooperative run(), and std::process::dump_pool_residency() shows pending counts per pool so you can see work piling up unserved.
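The starvation pattern is not specific to Hale; any cooperative scheduler shows it. As a rough analogy only, the sketch below uses Python's asyncio as a stand-in for a single-threaded cooperative pool. The names `run_loop` and `handler` are hypothetical, and nothing here is Hale's scheduler. A blocking sleep holds the only thread, so a handler that was ready immediately runs last; a yielding sleep lets it run first.

```python
import asyncio
import time


async def run_loop(blocking: bool, log: list) -> None:
    """Hypothetical stand-in for a locus run() loop on a cooperative pool."""
    for _ in range(3):
        if blocking:
            time.sleep(0.01)           # blocks the pool's only thread
        else:
            await asyncio.sleep(0.01)  # yields so other work can run
        log.append("tick")


async def handler(log: list) -> None:
    """Hypothetical stand-in for a bus handler that is ready right away."""
    log.append("handled")


async def demo(blocking: bool) -> list:
    log = []
    loop_task = asyncio.create_task(run_loop(blocking, log))
    handler_task = asyncio.create_task(handler(log))
    await asyncio.gather(loop_task, handler_task)
    return log


starved = asyncio.run(demo(blocking=True))   # handler waits behind the loop
served = asyncio.run(demo(blocking=False))   # handler runs at the first yield
```

In the blocking run the message was never lost, which is why a drop log would stay silent: the work was queued and simply had no thread to run on.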

Memory: “my RSS is growing”

Hale frees a locus’s whole region on dissolve, so a leak is usually one of two things: an allocation that escapes to a long-lived arena (it never dissolves), or a queue/buffer whose high-water mark keeps climbing. Two layers of instrumentation pin it down — one at runtime, one at compile time.

Runtime residency. Set LOTUS_ARENA_RESIDENCY=1 to register every top-level arena (each locus’s region, the global, the bus payload arena) with a construction backtrace. Then call std::process::dump_arena_residency() to emit one line per live arena — bytes, chunks, parent, label — sorted by bytes descending, each with the backtrace of where it was created:

// In a long-running daemon, sample from a heartbeat tick so locus
// arenas are caught *while alive* — the atexit dump fires only
// after every locus has torn down.
fn on_tick() {
    std::process::dump_arena_residency();   // → stderr, needs LOTUS_ARENA_RESIDENCY=1
    println("rss=", std::process::rss_bytes() / 1048576, " MB");
}

std::process::rss_bytes() is the cheap top-line number — poll it to confirm growth before you go digging. dump_pool_residency() is the per-pool view (pending/in-flight work), useful when the growth is a queue rather than an arena.

Compile-time proofs. Before the program even runs, three build flags report on allocation shape:

Flag	Reports
(default on every check/build)	flag an allocation that escapes into an unbounded context and accumulates until its locus dissolves (advisory warnings; `--no-warn-unbounded-alloc` opts out)
`--dump-alloc-summary`	every allocation site, escape-tagged (local / returned / stored-to-self / sent), with the bounded-vs-unbounded verdict; plus each locus’s storage shape (capacity slots, `@form`, projection cap) and the `self.<field>` / `self.<slot>` an allocation targets
`--dump-resource-budget`	per-locus resource counts (allocations, held fds) against declared ceilings
`--locality-report`	per-locus working-set size against cache-tier budgets

The memory-bound warnings run by default on every hale check and hale build (since 2026-07-02 — the flip followed a full-corpus audit of all 402 warnings). Run-to-exit programs are exempt automatically: a binary whose main starts no run loop and subscribes no handler owes no memory-bound proof, so scripts and one-shot tools stay silent.

For a long-lived service, the surface is:

@unbounded fn — the greppable in-source carve-out for an acknowledged accumulation (an operator-sized cache, an idempotency log). Silences that body’s sites. Also valid on a lifecycle hook (@unbounded run { … }).
```
locus Aggregator {
    // ... handlers checked for unbounded accumulation ...

    @unbounded fn on_snapshot(s: Snapshot) {
        // acknowledged: this cache is operator-sized on purpose.
    }
}
```
--no-warn-unbounded-alloc — opts a whole run out.
@bounded locus L { … } is now redundant with the default and still accepted.

The warnings are advisory — they print but don’t fail the build. A warning here is the compile-time complement to the residency dump: it tells you which site can grow before you’ve watched it grow.

Bus backpressure: bounding a flood

A producer that outruns its consumer used to grow the dispatch queue without limit. It no longer does — the queue and each pinned-locus mailbox are capped at LOTUS_BUS_QUEUE_CAP cells (default 8192 ≈ 4.5 MB):

LOTUS_BUS_QUEUE_CAP=1024 ./myapp   # tighter bound, more frequent drains

Past the cap the producer back-pressures rather than buffering: a single-threaded cooperative producer inline-drains the queue (runs the oldest handlers) to make space; a cross-thread producer to a pinned mailbox blocks on a condvar until the consumer drains a slot. Every message is still delivered — only the timing and memory profile change. Lower the cap to tighten the memory bound; raise it to reduce drain bursts. (See GH #125 for the full mechanism.)
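To make the inline-drain behavior concrete, here is a minimal Python sketch of a bounded queue whose single-threaded producer runs the oldest pending handlers when it hits the cap. `CoopQueue` and its fields are hypothetical names, and the real runtime is C, so treat this as an illustration of the policy rather than of the implementation.

```python
from collections import deque


class CoopQueue:
    """Hypothetical bounded dispatch queue: a full queue drains inline."""

    def __init__(self, cap: int, handler) -> None:
        self.cap = cap
        self.handler = handler
        self.cells = deque()
        self.high_water = 0

    def publish(self, msg) -> None:
        # At the cap, run the oldest pending handler(s) to make space
        # instead of letting the queue grow.
        while len(self.cells) >= self.cap:
            self.handler(self.cells.popleft())
        self.cells.append(msg)
        self.high_water = max(self.high_water, len(self.cells))

    def drain(self) -> None:
        # Normal consumer path: run everything still pending, oldest first.
        while self.cells:
            self.handler(self.cells.popleft())


delivered = []
q = CoopQueue(cap=4, handler=delivered.append)
for i in range(10):
    q.publish(i)
q.drain()
```

Every message is delivered in order and the queue depth never exceeds the cap; only the moment each handler runs changes. The cross-thread case corresponds to a blocking bounded queue, which in Python terms behaves like `queue.Queue(maxsize=cap)`, whose `put` waits until a slot frees up.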

Shelling out to other programs

Ops glue often means running another tool. std::process::run does a synchronous fork + exec + wait and captures the result. The argument vector is newline-separated (no shell, no word splitting — each line is one argv entry):

let out = std::process::run("git\nstatus\n--short") or raise;
println("exit ", to_string(out.code));
println(out.stdout);
if len(out.stderr) > 0 { println("stderr: ", out.stderr); }

The returned ProcessOutput carries code: Int (the exit code, or -1 if killed by a signal), signal: Int (the killing signal, 0 if it exited normally), and stdout / stderr as captured Strings. run is fallible(IoError) — a missing binary or a fork failure raises rather than returning a bogus output.
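The newline-separated argument vector and the code / signal split can be mimicked in a few lines of Python, which may help if you are porting a wrapper script. This is a sketch under assumptions: the `ProcessOutput` dataclass and `run` function below are hypothetical Python look-alikes, not the Hale API.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ProcessOutput:
    code: int      # exit code, or -1 if killed by a signal
    signal: int    # killing signal, 0 on a normal exit
    stdout: str
    stderr: str


def run(argv_lines: str) -> ProcessOutput:
    """Run a newline-separated argv with no shell and no word splitting."""
    argv = argv_lines.split("\n")  # one line is one argv entry
    done = subprocess.run(argv, capture_output=True, text=True)
    if done.returncode < 0:
        # On POSIX, Python reports death by signal N as returncode -N.
        return ProcessOutput(-1, -done.returncode, done.stdout, done.stderr)
    return ProcessOutput(done.returncode, 0, done.stdout, done.stderr)


# The space inside the last entry survives, because nothing re-splits it.
out = run(f"{sys.executable}\n-c\nprint('two words')")
```

As in the description above, a missing binary raises (here `FileNotFoundError`) instead of producing an output record.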

For a long-running child you drive incrementally, the lower-level spawn / wait / kill / write_stdin / read_stdout / read_stderr surface over a Child handle is in spec/stdlib.md.

Other process self-introspection: std::process::pid(), std::process::exit(code), and std::process::rss_bytes() (peak RSS — see Memory above).

Worked triage

“My subscriber’s handler never runs.”

LOTUS_BUS_LOG_DROP=1 ./app. A line at the publish? → the subject or key doesn’t match, or the payload won’t deserialize. Fix the subject/key or the payload type.
No line, but still no delivery? → the message reached the queue; the consumer isn’t draining. Check the subscriber’s pool: a cooperative run() that blocks starves handlers. hale check flags blocking syscalls; dump_pool_residency() shows the pending pileup.
Subscriber is an inline child or on a where async_io pool? → confirm it’s instantiated as an owned param or top-level, not unowned in a method body (which dissolves at scope exit before it can fire; hale check errors on this).

“My RSS climbs over hours.”

rss_bytes() from a heartbeat — confirm it’s monotonic, not sawtooth (sawtooth is healthy churn).
LOTUS_ARENA_RESIDENCY=1 + dump_arena_residency() from the same heartbeat: find the arena whose bytes value keeps growing. The label and backtrace name the locus and birth site.
A root-kind arena growing is the leak; a sub arena recycles. If it’s the bus payload arena, the high-water is queue depth — lower LOTUS_BUS_QUEUE_CAP. If it’s a locus arena, you’re accumulating into a field: prefer in-place mutation (self.f.x = v) over whole-value replace (self.f = T{…}), which bump-allocates fresh each time. --dump-alloc-summary names the site at compile time.
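The first two steps amount to simple comparisons over samples you collect yourself. The Python sketch below is one way to automate them offline; the `{label: bytes}` snapshot dictionaries are a hypothetical parsed form, since the exact text format of the residency dump is not shown here.

```python
def classify(samples: list[int]) -> str:
    """Label a series of RSS samples as 'flat', 'monotonic', or 'sawtooth'."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    if not deltas or all(d == 0 for d in deltas):
        return "flat"
    if all(d >= 0 for d in deltas):
        return "monotonic"   # never gives memory back: worth investigating
    return "sawtooth"        # grows and releases: normal churn


def fastest_growing(before: dict[str, int], after: dict[str, int]) -> str:
    """Given two hypothetical {arena label: bytes} snapshots, name the
    arena that grew the most between them."""
    return max(after, key=lambda label: after[label] - before.get(label, 0))


trend = classify([100, 120, 140, 180])
churn = classify([100, 140, 105, 150, 102])
suspect = fastest_growing(
    {"bus payload": 10, "Aggregator": 50},
    {"bus payload": 12, "Aggregator": 90},
)
```

A sawtooth whose low points keep rising is still growth, so in practice you would compare troughs over a longer window before calling it healthy.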

Debugging with the native toolchain

Hale binaries carry DWARF line tables by default (zero runtime cost). That means real debugging:

hale build myservice
gdb ./myservice
(gdb) break myservice.hl:42
(gdb) run
(gdb) backtrace          # real .hl file:line frames, inline stacks

addr2line -e ./myservice 0x4a2f10 resolves crash-dump addresses to source lines, and ASAN reports carry file:line through both the Hale code and the runtime. Profile with perf record --call-graph dwarf (frame pointers are deliberately not forced — they cost ~22% on runtime fast paths). Opt out of debug info with LOTUS_NO_DEBUGINFO=1.

Modes

Coming from Rust / C++? Think of modes as asking the compiler to emit a different execution strategy for the same computation over the same state — vectorized throughput, cache-tiled per-class work, or a single scalar decision — without you maintaining three copies. It’s the most specialized feature in the language; most loci never declare one.

Three named projections of one kernel

A locus can declare up to three modes, each a named projection of the same underlying computation, operating on the same locus state:

locus Pricer {
    params { /* shared state */ }

    mode bulk(...)       -> ... { /* vectorized over many inputs */ }
    mode harmonic(...)   -> ... { /* per-class / cache-tiled */ }
    mode resolution(...) -> ... { /* one decision, scalar */ }
}

You invoke a mode like a method — self.bulk(...), self.resolution(...) — and declare only the subset you actually operate in. They map to genuinely different hardware execution regimes:

bulk — vectorized throughput: the same operation across many elements at once.
harmonic — cache-tiled, per-class projection: work organized so each class’s data stays resident.
resolution — a single scalar decision: the one-input-one-answer path.

The compiler emits a strategy tuned to each regime, rather than running one general implementation everywhere.

All three modes read and write the same locus state through the same arena — there’s no duplicate allocation and no copy between them. Because they can touch the same fields, the compiler verifies the modes don’t write-conflict: a resolution-mode write to state that bulk mode also writes during overlapping evaluation is a compile-time error. You get three execution strategies over one piece of state, with the aliasing hazard checked for you.

Why three, and no fourth

The count isn’t arbitrary minimalism: vectorized, cache-tiled, and scalar are three distinct cost regimes on real hardware (high-throughput SIMD, locality-bound per-class, and latency-bound single-decision). There’s no fourth regime the hardware rewards, so there’s no fourth mode. It’s the same commit-hard discipline as the three projection classes for memory.

When you’ll reach for this

Rarely, and only at this tier — when a locus has a kernel computation that genuinely runs in more than one of those regimes (a numeric model evaluated both in batch and per-decision, say) and you want each path lowered well from one declaration. For ordinary application and service code, you’ll never declare a mode; the lifecycle methods and fn members cover everything.

That’s the systems tier — and the bottom of the descent. You started with variables and functions; you’ve now seen the memory model, the allocation disciplines, zero-copy transport, the C boundary, cross-process state, and hardware execution regimes. Every one of them is the same locus you met in the basics, observed at greater and greater resolution.

To see why one shape holds across all four tiers — and across human, LLM, and machine — read The design. For exact rules, the reference points into the canonical spec.

Reference

This guide is the tour. The canonical contract — what the compiler actually enforces — lives in the spec/ directory at the repository root. When the guide and the spec disagree, the spec wins; when you need the exact rule, an edge case, or a diagnostic’s meaning, go there.

The spec, by topic

You want	Read
The formal grammar	`spec/grammar.ebnf`
Lexical structure, literals, operators	`spec/tokens.md`
Operator precedence & associativity	`spec/precedence.md`
Operational semantics (lifecycle, bus, recovery, fallible)	`spec/semantics.md`
The type system	`spec/types.md`
Memory: regions, capacity slots, projection classes	`spec/memory.md`
The form library (`vec` / `hashmap` / `ring_buffer`)	`spec/forms.md`
The always-loaded runtime	`spec/runtime.md`
The standard library surface	`spec/stdlib.md`
Idiomatic patterns & the six shapes	`spec/styleguide.md`
The FFI contract — C (`@ffi("c")`) and the WASM host interface (`@ffi("js")` / `@export`)	`spec/ffi.md`
Dependencies & vendoring	`spec/packages.md`
Project layout & imports	`spec/projects.md`
How tests are written and run	`spec/testing.md`
Why every design choice was made	`spec/design-rationale.md`

Two more anchors

AGENTS.md — the load-bearing prompt for agents writing .hl. It condenses the six idiomatic patterns, the “what’s not in the language” reflexes, and the formal design model into one file. Excellent for a human, too.
Working programs — crates/hale-codegen/tests/fixtures/examples/ holds ~70 small per-feature programs, numbered. Reading a few near your target shape is the fastest way to see real, compiling Hale.

Toolchain commands

Command	Does
`hale run <file/dir>`	compile + run (fast feedback)
`hale build <file/dir>`	compile to a native binary
`hale check`	parse + typecheck only
`hale test`	run `*_test.hl`
`hale fetch`	clone & pin git dependencies
`hale fmt`	canonical formatter

Libraries (pond)

The standard library covers the substrate — I/O, time, strings, JSON, HTTP, crypto, the bus. Everything else — web stacks, databases, observability — lives in pond, the contributed library catalog: https://github.com/hale-lang/pond.

Many lotus grow in a pond. Each library is a directory of .hl loci you vendor into your project.

Using one

Declare it in hale.toml, fetch it, import it:

[deps]
pond = { git = "https://github.com/hale-lang/pond", tag = "v0.1.0" }

hale fetch

import "vendor/pond/router" as router;

hale fetch clones each dependency into vendor/<name>/ and pins the resolved commit in hale.lock. Pond’s “no transitive dependencies in v1” rule means every package your program pulls in is visible in your lockfile — if a library uses another, you vendor both explicitly.

The catalog

Persistence & data

Library	Provides
`db`	Driver-agnostic database surface: the `DbDriver` interface + `Args` bind-parameter list for parameterized (`$1, $2, …`) queries. Pick a backend (`pq`, `sqlite`) at the `DbDriver` slot.
`pq`	PostgreSQL driver — `PgConn` plus `PgPool`, a fixed-size fd connection pool that itself satisfies `db::DbDriver`.
`sqlite`	SQLite connection + fallible query surface.
`migrations`	Schema migration runner (up/down); builds to a `migrate` binary.
`jobs`	SQLite-backed job queue (`Queue`) + a pinned-worker pool.

Web

Library	Provides
`http`	HTTP client (`http/client`) over `std::io` — request/response building atop the socket primitives, for libraries that need an HTTP client without the full `std::http` server surface.
`router`	HTTP router over `std::http` — method + path-param routes, middleware chain.
`sessions`	Stateless, HMAC-signed cookie sessions (`session=<base64(payload)>.<base64(hmac)>`).
`websocket`	Synchronous, owner-driven RFC 6455 WebSocket client (suggested alias `ws`); a passive wrapper your own `run()` loop drives.
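For readers unfamiliar with the signed-cookie shape the `sessions` row describes, here is a minimal Python sketch of the `<base64(payload)>.<base64(hmac)>` format. It rests on assumptions that the table does not state: HMAC-SHA256 as the MAC, URL-safe base64, and the MAC computed over the raw payload bytes. The function names are hypothetical and this is not the pond library's code.

```python
import base64
import hashlib
import hmac


def _b64(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).decode("ascii")


def sign(payload: bytes, key: bytes) -> str:
    """Build the cookie value: <base64(payload)>.<base64(hmac)>."""
    mac = hmac.new(key, payload, hashlib.sha256).digest()  # assumed SHA-256
    return f"{_b64(payload)}.{_b64(mac)}"


def verify(cookie: str, key: bytes):
    """Return the payload if the signature checks out, else None."""
    try:
        payload_b64, mac_b64 = cookie.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        mac = base64.urlsafe_b64decode(mac_b64)
    except ValueError:
        return None  # wrong number of parts or invalid base64
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time compare, so timing does not leak how much matched.
    return payload if hmac.compare_digest(mac, expected) else None


key = b"server-secret"
cookie = sign(b"user=1", key)
forged = _b64(b"user=2") + "." + cookie.split(".")[1]
```

The server keeps no session table: the key alone decides whether a cookie is accepted, which is what "stateless" means in that row.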

Observability & supervision

Library	Provides
`logfmt`	Alternative `std::log` sinks wearing the `std::text::Sink` shape — file with rotation, structured output.
`metrics`	Counter / gauge / histogram primitives + a Prometheus text-format renderer and `/metrics` endpoint.
`tracing`	Span tree mirroring the locus tower — one `Tracer` per app; spans nest with locus instantiation.
`supervisor`	Erlang/OTP supervision-tree strategies grafted onto Hale’s `on_failure` + `restart` / `restart_in_place` / `bubble`.

Primitives & composition

Library	Provides
`crypto`	SHA-256, HMAC-SHA256, hex encode/decode, constant-time compare, CSPRNG.
`subprocess`	Spawn + manage child processes (suggested alias `sub`) — wraps the `std::process` spawn / wait / pipe primitives.
`tower`	Run several independent locus trees (“towers”) under one process, each with its own root and lifecycle.

Terminal & UI

Library	Provides
`term`	Tier-0 terminal infrastructure — capability/`is_tty` probes, SGR styling, raw-mode guard, cursor + screen control over `std::term`.
`tui`	An Elm-shaped TUI runtime: write a locus with model/update/view, the runtime drives the frame loop, input, and rendering.

AI & numeric

Library	Provides
`agent`	LLM-agent toolkit — `agent/{llm, tools, conversation, embeddings, sandbox}`: a client surface, a tool-registry, conversation state, and a sandboxed execution path.
`ml`	Neural-network primitives (`ml/neural`).
`math`	Numeric helpers — `math/{matrix, stats}`.

heron (the tree-sitter grammar that drives editor tooling) also lives in pond, but it’s developer tooling, not a vendored runtime library you import. The _util directory holds internal helper libs consumed by other pond libs, not imported directly by apps.

Pond is where the ecosystem grows: if a protocol, parser, or shape is too useful to rewrite per project but doesn’t belong in the language, it lands here.

Verification

Most languages ask you to write correct concurrent code and hope you did. Hale takes a different bet: make incorrect designs fail to compile, and model-check the runtime everything executes on. This page is the honest account of what that buys you — and what it deliberately doesn’t.

The substrate is model-checked

Hale’s runtime, lotus, is C: pthreads and C11 atomics. Every primitive in it with a cross-thread surface is transcribed into a model and checked exhaustively, under every legal interleaving, with GenMC — as a standing CI gate. A race, use-after-free, or assertion failure in any model fails the build.

Primitive	What’s verified
Lock-free hashmap	the enter / drain / grow protocol
Mailbox monitor	the pinned-locus mutex hand-off
Bus queue	the cooperative-pool conditional lock
Arena subregion lock	the parent’s child-slot freelist

Each model carries a negative control: delete the synchronization and GenMC reports the exact bug the real code prevents — proof the check has teeth. (The per-thread chunk pool needs no model: it is __thread, with no cross-thread surface.) Sanitizers catch races on the paths your tests happen to hit; model checking catches the ones no test reliably triggers — grow-during-drain, compact-then-grow. For a language whose whole concurrency story is the bus, trusting the substrate is the foundation everything else rests on.

Your programs are data-race-free by design

Above the substrate, the language is shaped so application code can’t introduce a data race in the first place:

A typed bus instead of shared state. Loci talk by publishing typed values to topics; the payload is copied into the receiver’s region. There is no shared mutable cell to race on.
The single-threaded-method invariant. Calling a locus’s method from the wrong pool’s thread is a compile error.
Vertical-only failure. No lateral references between siblings; a failure travels up to a parent’s on_failure, never sideways.
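The first point, copying the payload into the receiver, is the one that removes the shared cell. A small Python analogy follows, with a deep copy standing in for "copied into the receiver's region"; `Receiver` and `publish` are hypothetical names and the sketch says nothing about how Hale performs the copy.

```python
import copy


class Receiver:
    """Hypothetical stand-in for a subscribing locus with private state."""

    def __init__(self) -> None:
        self.inbox = []

    def on_message(self, payload) -> None:
        self.inbox.append(payload)


def publish(payload, subscribers) -> None:
    # Each subscriber gets its own deep copy, so no object is shared
    # between the sender and any receiver.
    for sub in subscribers:
        sub.on_message(copy.deepcopy(payload))


player = {"id": "p1", "name": "Ada"}
rx = Receiver()
publish(player, [rx])
player["name"] = "changed after publish"   # sender mutates its own value
```

The receiver still sees the name as it was at publish time, because there is no common object for the two sides to race on.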

Checked at build time

These run during hale check / hale build, on top of ordinary type-checking.

Bus-graph properties. The bus topology is a typed graph, and the compiler walks it. This analysis is on by default, and its error-level findings fail the build:

orphan topics (wired to only one end) — warning
cross-locus cycles that can spin — warning
intra-locus re-entrant self-publish (unbounded recursion) — error
backpressure — an unthrottled publish in an unbounded loop — warning
subject type-mismatch — two sites disagreeing on a payload type — error
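Two of these checks, orphan topics and subject type-mismatch, reduce to grouping the publish and subscribe sites by topic. The Python sketch below shows that idea on a hand-written site list. The tuple layout, the `Lobby` and `Audit` loci, and the `Kicked` topic are all hypothetical, and the compiler's actual analysis is not shown in this guide.

```python
from collections import defaultdict


def check_bus_graph(sites):
    """sites: iterable of (locus, role, topic, payload_type), where role is
    'publish' or 'subscribe'. Returns (warnings, errors) as string lists."""
    roles = defaultdict(set)
    types = defaultdict(set)
    for _locus, role, topic, payload_type in sites:
        roles[topic].add(role)
        types[topic].add(payload_type)

    warnings, errors = [], []
    for topic in sorted(roles):
        if len(roles[topic]) == 1:        # wired to only one end
            warnings.append(f"orphan topic: {topic}")
        if len(types[topic]) > 1:         # sites disagree on the payload
            errors.append(f"subject type-mismatch: {topic}")
    return warnings, errors


sites = [
    ("Lobby", "publish", "JoinQueue", "Player"),
    ("Matchmaker", "subscribe", "JoinQueue", "Player"),
    ("Matchmaker", "publish", "MatchReady", "MatchInfo"),   # nobody subscribes
    ("Audit", "publish", "Kicked", "Player"),
    ("Lobby", "subscribe", "Kicked", "String"),             # types disagree
]
warnings, errors = check_bus_graph(sites)
```

The cycle, re-entrancy, and backpressure checks need control-flow information as well, so they do not fit in a grouping pass like this one.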

Design rules, enforced as errors:

No locus-return — a method may not hand back a managed locus (a Law-of-Demeter / CQRS / dependency-inversion violation caught in one rule).
Codec purity — a bus codec’s encode / decode must be pure; they may run off-thread.
ring_layout conformance — a foreign shared-memory ring layout is checked for internal and cross-field consistency before a torn read is possible.

Concurrency & placement, keeping a program’s placement coherent with how the runtime dispatches:

Dead bus receiver — a cooperative locus that subscribes to the bus and blocks in run(), so the blocking call monopolizes the pool thread and its handlers never fire — error.
Blocking call on a cooperative pool — a blocking run() (recv / accept / process::run) on a pool that isn’t where async_io; it holds the pool’s thread and stalls co-scheduled loci — warning.
Nested long-running child — a non-main locus holding a params field of a locus type whose run() never returns; the fix is hoisting it to a main sibling with its own placement — error.
Unowned subscriber locus — a bus-subscribing locus instantiated non-owned in another locus’s method body, so it dissolves at scope exit before its subscription can fire — error.

Memory-bound proofs (on by default). Every hale check / hale build runs the whole-program survey: the compiler’s escape/loop dataflow flags allocations that escape a per-message handler or unbounded loop and accumulate until the locus dissolves — with loop-ranking that proves a while v < N counter bounded. Run-to-exit programs (a main with no run loop and no bus handler) warn nothing — a script owes no bound proof. @unbounded fn is the in-source carve-out for an acknowledged site; --no-warn-unbounded-alloc opts a run out. Advisory today; a hard error contract is the intended end state once the remaining documented false-positive classes get their annotations.

Resource budgets (opt-in). Static counts of file descriptors, OS threads, cooperative pools, and bus subjects, with a --check-resource-budget budget.toml ceiling gate for CI and fd-leak detection.

What Hale does not claim

Hale is not a whole-program functional-correctness prover — that is the world of CakeML and F*. The guarantee here is narrower and deliberately so: the coordination (the bus graph), the substrate (the concurrent primitives), and bounded resource use are verified, because those are the properties that must hold no matter what executes the design — native, wasm, or a future target. Verification that survives a change of substrate is the kind worth building on.

The authoritative, exhaustive catalog of every compile-time check is spec/verification.md. The verification roadmap that drove this work — now delivered — is GitHub issue #18.

The design

Why one shape held across all four tiers.

This guide descended four levels — a small scripting language, a high-level application language, a concurrent-services language, a systems language. At each level you reached for the same primitive, the locus, and saw more of it. That wasn’t a teaching trick layered on top of the language. It’s the language’s actual structure, and it’s worth seeing whole now that you’ve felt it.

It was towering loci all along

Hale is built bottom-up from one idea: a locus is a system — a thing that decomposes into sub-systems and serves a role in some super-system. Everything structural is a locus. A type is a locus that hasn’t grown flow yet; an app is a locus; a service, a connection, a collection, a parser — loci, all the way down.

The tiers of this guide are the same tower observed at different depths:

The basics met a locus as the shell around main.
Everyday programs saw it as an object with state and methods.
Concurrent services saw it as a lifecycle, a bus participant, a supervised parent.
Systems control saw it as a memory region with a layout and an execution strategy.

None of those views contradict; each is a higher-resolution perspective on the thing below. That’s why the function you wrote in chapter one still works in the last chapter — you were descending into one structure, not switching languages.

The commitments that make it hold

A locus carries a small set of structural commitments, and every guarantee in the language falls out of them:

Bounded attachment. A locus bounds how many things attach to it. (The capacity model you met in the systems tier.)
Vertical-only flow. A locus talks up to its parent and down to its children — never sideways. Siblings coordinate through a shared parent or the bus.
Failure flows up. A broken invariant routes to the parent’s policy, recursively, to the root.
The root is the horizon. Recursion stops at the current observable boundary — the program’s root, a process edge, a substrate.

From vertical-only flow you get memory safety with no GC and no borrow checker: no pointer crosses sideways, so a region frees wholesale at dissolve. From failure-flows-up you get supervised, let-it-crash recovery with typed policy. From bounded attachment you get the cost model the runtime can plan against. The constraints aren’t restrictions bolted on — they’re the source of the guarantees.
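The upward routing of a failure can be pictured as a walk along parent links. The Python sketch below assumes a simplified two-policy model in which "restart" handles the failure at that parent and "bubble" passes it further up. The policy names come from the supervision vocabulary used elsewhere in this guide, but their semantics here are an assumption made for illustration.

```python
class Locus:
    """Minimal tree node: a name, a parent link, and a failure policy."""

    def __init__(self, name, parent=None, policy="bubble"):
        self.name = name
        self.parent = parent
        self.policy = policy   # "restart" handles here; "bubble" passes up


def route_failure(failed: Locus):
    """Walk upward from the failed locus; return (handler, path taken)."""
    path = []
    node = failed.parent
    while node is not None:
        path.append(node.name)
        if node.policy == "restart":
            return node.name, path
        node = node.parent
    # Nothing on the way up handled it: the root is the horizon.
    return None, path


root = Locus("App", policy="restart")
service = Locus("Service", parent=root)          # bubbles by default
conn = Locus("Connection", parent=service)
sibling = Locus("Cache", parent=service)         # never consulted
handler, path = route_failure(conn)
```

The sibling never appears on the path, which is the vertical-only property in miniature: the walk has no sideways step to take.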

Why one shape spans human, LLM, and machine

There’s a structural reason the matchmaker from the introduction decomposes the same way on paper, in Hale, and inside an LLM’s plan. When K things attach to one coordination point, the working state to hold them together costs about K log₂ K bits. The practical ceiling on K, roughly 4 to 10, shows up everywhere coordination happens: human working memory, spans of control, mixture-of-experts active counts, multi-agent LLM saturation. The same bound, substrate-invariant.
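To get a feel for the scale of the K log₂ K figure, the arithmetic can be run at the two ends of the 4 to 10 range and one point between them. The Python snippet below is only that calculation; `coordination_bits` is a hypothetical helper name.

```python
import math


def coordination_bits(k: int) -> float:
    """Approximate working-state cost, in bits, of K attachments."""
    return k * math.log2(k)


# K = 4 costs 8 bits; K = 10 costs about 33 bits.
costs = {k: round(coordination_bits(k), 1) for k in (4, 7, 10)}
```

Across the range the cost roughly quadruples while K grows by a factor of 2.5, which shows the faster-than-linear growth behind the bound.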

A Hale program is the literal shape of that bound: loci are vertices, topics are hyperedges, capacity declarations bound each vertex’s K. So translation across the human → LLM → machine boundary stays cheap — each layer uses the same vertices and edges, and no representation has to be rebuilt in a foreign idiom. It’s the same reason the locus survives the move from the native runtime to the browser to any future substrate: substrate variance doesn’t reach into the shape.

Going deeper

AGENTS.md — the formal model in one page: nodes, hyperedges, and invariants, with the locus ↔ Σ mapping. Written for agents authoring .hl, but it’s the tightest statement of the design for a human too.
spec/design-rationale.md — every numbered design decision (F.1 … F.36), the alternatives considered, and why each commitment is shaped the way it is.
hale-lang/papers — the structural mathematics and the cross-substrate evidence for the k̄ ∈ [4, 10] bound.

You now have the whole arc: a small language at the top, a systems substrate at the bottom, one shape connecting them. Build something — and if the decomposition into loci feels natural, that fit is the thesis working.
