Introduction
One language. Four altitudes.
Most languages pick a level and live there. Python and JavaScript sit high — fast to write, far from the metal. Go sits in the middle — concurrency in the language, a runtime underneath. Rust and C++ sit low — you own memory and layout, and you pay attention to both.
Hale is a single language you can write at any of those levels, and move between them without switching tools. The same file can read like a script at the top and like a systems program at the bottom. There is one primitive — the locus — and the only thing that changes as you descend is how much of it you choose to see.
Try it now: the playground runs real Hale, compiled to WebAssembly, right in your browser — no install.
This guide is built around that idea. It introduces Hale at four levels, each one self-contained:
- The basics — variables, math, functions, control flow. Hale as a small, clean language. You can write real scripts knowing only this.
- Everyday programs — files, JSON, HTTP, a bit of structure. Hale at the altitude you’d reach for Python or Node.
- Concurrent services — long-running processes, a typed message bus, supervision. Hale where you’d reach for Go.
- Systems control — memory, layout, lifetime, zero-copy I/O, C bindings. Hale where you’d reach for Rust or C++.
Each level expands on the one before it without contradicting it. The function you wrote in the basics still works in systems control — you’ve just learned to see more of what was always there.
A taste
Here’s a small service. Don’t worry about every keyword yet; notice that each phrase you’d say out loud has a place to live.
type Player { id: String; name: String; }
type MatchInfo { match_id: String; players: [Player]; }
topic JoinQueue { payload: Player; }
topic MatchReady { payload: MatchInfo; }
locus Matchmaker {
params { target_size: Int = 4; }
bus {
subscribe JoinQueue as on_join;
publish MatchReady;
}
fn on_join(p: Player) {
self.waiting.push(p);
if self.waiting.len() >= self.target_size {
MatchReady <- assemble_match(self.waiting, self.target_size);
}
}
}
“A matchmaker” → locus Matchmaker. “That receives players” → subscribe JoinQueue. “And announces matches” → publish MatchReady. “When enough are queued” → the if. The code keeps the shape of the sentence.
That’s the bet behind Hale: the gap between how you describe a system and what you type doesn’t have to be there. The design chapter explains why one shape works across the whole range — and across human, LLM, and machine.
How to read this
If you’re new to programming or to systems languages, start at The basics and go in order. If you already program, skim the basics for the parts that differ from what you know (the failure model and the money/time types are worth a look), then jump to the level that matches the program you want to write. Every level after the basics opens with a short “Coming from X?” box to orient you.
When you want the exact rules rather than the tour, the
reference points into spec/ — the canonical
contract the compiler enforces.
Head to Install to set up the toolchain, then Your first run to put a program on screen.
Install
Get the hale toolchain on your path.
There are two ways to get hale: download a prebuilt binary
(quickest), or build from source (for contributors, or a
platform without a prebuilt). Either way, read What you need to
run programs — hale is a
compiler that shells out to a C toolchain, so it has a couple of
runtime requirements no matter how you install it.
Quickest: prebuilt binary
Grab the tarball for your platform from the releases page:
| Platform | Asset |
|---|---|
| Linux x86_64 (glibc) | hale-<version>-x86_64-unknown-linux-gnu.tar.gz |
| macOS Apple Silicon | hale-<version>-aarch64-apple-darwin.tar.gz |
tar -xzf hale-<version>-<triple>.tar.gz
# The archive contains `hale` AND `libhale_ts_shim.a` — keep them
# in the SAME directory: the compiler looks for the shim next to
# its own binary and can't link programs without it.
sudo cp hale libhale_ts_shim.a /usr/local/bin/ # or anywhere on PATH, together
hale --help
The binary is self-contained with respect to LLVM — LLVM 18 is statically linked in, so you do not need to install LLVM to run the compiler. (Intel Macs: run the Apple-Silicon build under Rosetta 2.)
What you need to run programs
Regardless of how you installed hale, compiling a program
(hale run / hale build) recompiles and links the runtime on
your machine, so you need a C toolchain present:
- clang on your PATH (bare or clang-18), used to assemble and link the emitted native code. lld is additionally needed only if you build with LOTUS_LTO=1 or target wasm32.
- OpenSSL shared libraries (libssl/libcrypto): the standard library’s TLS client links against them unconditionally.
Installing clang pulls in libLLVM as clang’s own dependency —
that’s expected and harmless; hale itself doesn’t need it.
Build from source
Requirements:
- Rust 1.95 or newer (the compiler is written in Rust).
- LLVM 18 development libraries, with llvm-config-18 on your PATH (or LLVM_SYS_180_PREFIX pointing at the install). LLVM 17, 19, and 20 will not link: the backend is pinned to 18.
- clang (plus lld for LTO or wasm), OpenSSL headers, and git.
Debian / Ubuntu (LLVM 18 is in stock apt on 24.04+):
sudo apt install llvm-18-dev libpolly-18-dev libzstd-dev \
clang-18 libclang-18-dev lld-18 zlib1g-dev \
libssl-dev pkg-config git
Fedora:
sudo dnf install llvm18-devel clang18 lld openssl-devel git
macOS (Homebrew):
brew install llvm@18 openssl git
export LLVM_SYS_180_PREFIX="$(brew --prefix llvm@18)"
Then:
git clone https://github.com/hale-lang/hale
cd hale
cargo build --release
The hale binary lands at target/release/hale (and
libhale_ts_shim.a beside it). Put the binary on your path, or
invoke it through Cargo as shown below.
Reproducible / release build
release/docker-compose.yml builds a self-contained Linux tarball
in a pinned ubuntu:24.04 + LLVM 18 container, so you don’t have
to match the toolchain locally:
docker compose -f release/docker-compose.yml run --rm build
# -> dist/hale-x86_64-unknown-linux-gnu.tar.gz
Platform support
| Platform | Status |
|---|---|
| Linux x86_64 (glibc) | First-class — hosts the compiler and runs compiled programs, all features. |
| macOS (Apple Silicon) | Supported — hosts the compiler and targets itself. Everything runs except async_io pools, which fail at compile time with a clear diagnostic (use a cooperative pool, or build on Linux). Intel Macs run the arm64 build via Rosetta 2. |
| Windows | No native support (the runtime is POSIX). Use WSL2 (Ubuntu) and follow the Linux instructions. |
| wasm32 | hale build --target wasm32 for the browser. |
Verify
hale --help
Or through Cargo from a source checkout:
cargo run -p hale-cli --bin hale -- --help
To run the compiler’s own test suite (single-threaded avoids “text file busy” flakes from parallel test binaries racing on the same temp path):
cargo test --release --workspace -- --test-threads=1
The two ways to run a program
Both go through the same LLVM-native compiler — there’s no separate interpreter, so they never disagree:
- hale run prog.hl compiles and runs in one step (the binary is temporary). This is the fast inner loop while you write.
- hale build prog.hl compiles to a native binary on disk via LLVM. This is the artifact you ship.
hale run prog.hl # compile + run
hale build prog.hl # compile to ./prog
./prog
Throughout this guide we write hale run / hale build as if
hale is on your path. From a source checkout without it
installed, prefix with cargo run -p hale-cli --bin hale --.
Next: Your first run.
Your first run
Put something on screen.
Create a file hello.hl:
fn main() {
println("Hello from Hale.");
}
Run it:
hale run hello.hl
Hello from Hale.
hale run compiles your program and runs it in one step — it’s
the same native code hale build produces, just executed
immediately and not left on disk. When you want the artifact to
keep and ship, build it:
hale build hello.hl
./hello
Same compiler, same output: run is the fast inner-loop shape,
build is for the binary you deploy. There’s no separate
interpreter, so anything that runs under build runs identically
under run.
What’s here
- fn main() is the entry point, the same as it is in C, Go, or Rust. A Hale program starts by calling it.
- println(...) prints its arguments followed by a newline. It takes any number of arguments and concatenates them; there’s no format string:
  fn main() {
      let name = "Hale";
      println("Hello from ", name, ".");
  }
- Statements end with ;. Newlines are just whitespace; they don’t end statements. Source is ASCII outside of string literals and comments.
Comments are C-style:
// a line comment
/* a block comment */
That’s the whole surface you need to start. The next chapter introduces variables and the value types — the vocabulary every Hale program is built from.
hale run and imports. A single file’s import "..." as ...; directives are resolved by hale run just as hale build resolves them. The one gap is the ad-hoc directory form (hale run ./dir), which bundles the directory’s files without cross-seed import resolution; use hale build ./dir for a multi-file project that imports libraries.
Build modes, diagnostics, and debugging
A few switches worth knowing from day one:
- Faster iteration: hale build --dev (or HALE_DEV=1) uses a lighter optimization pipeline, giving noticeably quicker builds while you’re in an edit-run loop. Release builds default to -O3 tuned for your CPU.
- Where did the build time go? HALE_TIME=1 hale build app.hl prints per-phase wall times.
- Editor integration: hale check app.hl --json emits one JSON object per diagnostic (file, line, col, severity, message) on stdout. hale check itself runs in about 10 ms even on large programs, so a save-hook is all an editor needs.
- Real debugging: binaries carry DWARF line tables by default, so you get gdb ./app, break app.hl:42, backtraces with real file:line, and ASAN reports that point at the exact source line. Zero runtime cost; opt out with LOTUS_NO_DEBUGINFO=1.
Build a job queue
In about thirty minutes, you’ll build a small job queue and watch it
descend the four altitudes — from a throwaway script to a service split
across processes — changing almost nothing but main at the very end. The
first three stages run in the browser at the
playground (no install); to follow
along locally, drop each program in a .hl file and hale run it.
We’ll keep the “work” trivial — squaring a number stands in for whatever a real job does — so the shape of the program stays in focus.
1. A job, and the work
Start with the data and the work, as a plain script. A type is pure data;
a fn does something with it.
type Job { id: Int; work: Int; }
fn process(j: Job) -> Int {
return j.work * j.work;
}
fn main() {
let j: Job = Job { id: 1, work: 7 };
println("job ", j.id, " -> ", process(j));
}
job 1 -> 49
This is Hale as a small, clean scripting language — no ceremony, no runtime to think about. One job, processed.
2. A queue that holds the jobs
A queue needs to hold jobs. In Hale a collection is a locus with a @form
annotation — no Vec<T> to import or parameterize. @form(vec) synthesizes
push, get, pop, len, and is_empty on the locus; get/pop are
fallible (out of range), so you address them at the call site with or.
type Job { id: Int; work: Int; }
@form(vec)
locus Queue {
capacity { heap jobs of Job; }
}
fn process(j: Job) -> Int { return j.work * j.work; }
fn main() {
let q = Queue { };
q.push(Job { id: 1, work: 7 });
q.push(Job { id: 2, work: 3 });
q.push(Job { id: 3, work: 9 });
println("queued: ", q.len());
for j in q.items {
println("job ", j.id, " -> ", process(j));
}
}
queued: 3
job 1 -> 49
job 2 -> 9
job 3 -> 81
This is the everyday altitude — loci as plain objects that hold state and expose behavior. Still a single program, run start to finish.
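The fallible accessors mentioned above can be sketched in use. This block repeats the Job type and Queue locus so it stands alone, reads an element with get, and supplies a placeholder Job when the index is out of range. It assumes get takes a zero-based Int index; the placeholder values are illustrative.

```hale
type Job { id: Int; work: Int; }

@form(vec)
locus Queue {
    capacity { heap jobs of Job; }
}

fn main() {
    let q = Queue { };
    q.push(Job { id: 1, work: 7 });
    q.push(Job { id: 2, work: 3 });
    // get is fallible: an out-of-range index takes the or branch.
    let second = q.get(1) or Job { id: 0, work: 0 };
    let missing = q.get(99) or Job { id: 0, work: 0 };
    println("second: ", second.id, ", missing: ", missing.id);   // second: 2, missing: 0
    println("empty? ", q.is_empty());                             // empty? false
}
```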
3. Make it a service: the typed bus
A real queue doesn’t drain itself in a loop — work arrives, and workers
react. That’s the typed message bus. Declare the channels as topics, and
wire loci to them: a Worker subscribes to Jobs, does the work, and
publishes a Result; a Reporter subscribes to Results; a Submitter
publishes jobs.
type Job { id: Int; work: Int; }
type Result { id: Int; out: Int; }
topic Jobs { payload: Job; }
topic Results { payload: Result; }
locus Worker {
bus {
subscribe Jobs as on_job;
publish Results;
}
fn on_job(j: Job) {
let out: Int = j.work * j.work;
Results <- Result { id: j.id, out: out };
}
}
locus Reporter {
bus { subscribe Results as on_result; }
fn on_result(r: Result) { println("job ", r.id, " done -> ", r.out); }
}
locus Submitter {
bus { publish Jobs; }
birth() {
Jobs <- Job { id: 1, work: 7 };
Jobs <- Job { id: 2, work: 3 };
Jobs <- Job { id: 3, work: 9 };
}
}
fn main() {
Worker { };
Reporter { };
Submitter { };
}
job 1 done -> 49
job 2 done -> 9
job 3 done -> 81
Run it: this exact program is live in the playground — no install.
Notice what you didn’t write. The Submitter never calls the Worker —
it publishes to a topic, and whoever subscribes gets the message. There’s no
mutex, no channel type to choose, no async/await colouring a single
function. This is the concurrent-services altitude, and the cardinality is
emergent: add a second Worker { }; in main and both receive jobs — the
topic is many-to-many.
So far the bus has been running in-process (the default transport — an in-memory queue). The loci don’t know or care. That’s the seam we pull on next.
4. Deploy it: change only main
The loci above never mention threads or transports. You wire those in
main — placement { } says where loci run, and bindings { } says how
each topic travels. None of the Worker / Reporter / Submitter code
changes; you give them a new main per deployment.
To run the worker as its own process — listening for jobs over a Unix
socket, on its own cooperative pool — that’s a main locus:
// worker.hl — the worker as its own binary. Import the shared Job/Result
// types, the Jobs/Results topics, and the Worker/Reporter loci from §3;
// only this `main` is new.
main locus WorkerNode {
params {
worker: Worker = Worker { };
reporter: Reporter = Reporter { };
}
placement {
worker: cooperative(pool = jobs); // its own pool / OS thread
}
bindings {
Jobs: unix("/run/jobs.sock", role: listen);
}
}
The job source becomes a second binary whose main instantiates the
Submitter and binds the same topic with role: connect
(Jobs: unix("/run/jobs.sock", role: connect);). Same Jobs topic, same
typed payload — now crossing a process boundary instead of an in-memory
queue. Swap unix(...) for udp://host:port or a broker adapter and the
loci still don’t change; only main does. (Add a codec(...) on the
binding to put JSON or protobuf on the wire so a non-Hale peer can read it.)
For the full multi-binary picture — sharing the loci across files, picking transports, and supervising the workers — see Across binaries and Concurrency & placement.
What you built
The same Job / Worker / topic definitions carried you from a script to a
distributed service. Each altitude added exactly what it needed and nothing
more:
| Altitude | What appeared |
|---|---|
| Script | type, fn — data and the work |
| Everyday | a @form(vec) locus that holds the jobs |
| Concurrent | topics + the bus; workers react instead of being called |
| Systems | main chooses placement and transports — the loci untouched |
That last row is the point: a Hale program is a design of loci and topics; where and how it runs is a binding you change in one place. From here, the concurrent services chapters go deeper on lifecycle, failure, and supervision — or open the playground and run the bus version in your browser.
Values & variables
The vocabulary every Hale program is built from.
A variable is introduced with let:
let greeting = "hello";
let count = 3;
let ratio = 0.5;
let ready = true;
Hale infers the type from the value. You can write it explicitly when you want to be sure, or when there’s no value to infer from:
let count: Int = 3;
Immutable by default
A plain let binding can’t be reassigned. To make a variable
you can change, add mut:
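let total = 0;
total = total + 1; // ERROR: total is immutable
Declared with mut instead (in a separate scope, since redeclaring total would be shadowing), the assignment compiles:
let mut total = 0;
total = total + 1; // fine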
Immutable-by-default is a per-binding property, not a property
of the type. There’s no separate “constant” concept for locals —
let is the constant, let mut is the variable. (Top-level
program constants use const NAME: T = ...; and are written
SCREAMING_SNAKE_CASE.)
Shadowing — declaring a second let x in the same scope — is
not allowed. Pick a new name. The language would rather you say
what you mean than quietly reuse a name for a different value.
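A short sketch putting the binding rules together: a top-level const, an immutable let, a let mut, and (commented out) the two lines the compiler would reject. MAX_RETRIES is an illustrative name, not something the standard library defines.

```hale
// A program-wide constant: declared with const, named in SCREAMING_SNAKE_CASE.
const MAX_RETRIES: Int = 3;

fn main() {
    let label = "budget";          // immutable binding
    let mut total = MAX_RETRIES;   // mutable binding, starts at 3
    total = total * 2;             // fine: total was declared with mut
    // label = "other";            // rejected: label is immutable
    // let total = 0;              // rejected: redeclaring total here is shadowing
    println(label, ": ", total);   // budget: 6
}
```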
The primitive types
These are the scalar types built into the language:
| Type | What it holds | Literal examples |
|---|---|---|
| Int | 64-bit signed integer | 0, 42, 1_000_000, 0xFF, 0b1010 |
| Float | 64-bit IEEE float | 3.14, 1.0e-3, 2.5 |
| Bool | true / false | true, false |
| String | UTF-8 text | "hello", "line\n" |
| Decimal | exact fixed-point number | 1.50d, 0.00d |
| Duration | a span of time | 100ms, 5s, 1h30m |
| Time | a wall-clock instant | `2026-05-08T12:00:00Z` |
| Bytes | a binary blob | b"\x00\x01\xff" |
Decimal, Duration, and Time are first-class — not strings
you parse, not integers you remember the units of. They get
their own chapter (Math, money & time) because
they’re a real ergonomic upgrade over what most languages give
you.
Underscores in number literals are just for readability
(1_000_000). Integer literals default to Int, literals with a
decimal point default to Float, and the d suffix makes a Decimal.
Strings
Double-quoted, with the usual escapes (\n, \t, \", \\,
\xNN). Three extra forms:
let raw = r"C:\not\escaped"; // raw — backslashes literal
let multi = """
spans
lines
"""; // triple-quoted
let name = "world";
let hi = f"hello {name}"; // f-string interpolation
An f-string evaluates the expressions inside {...} and renders
them into the text. Use {{ and }} for literal braces.
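For example, assuming an f-string renders an Int the same way println does, this prints a counted greeting and then a line with literal braces. The variable names are illustrative.

```hale
fn main() {
    let name = "world";
    let unread = 3;
    // Each {...} is evaluated and rendered into the text.
    println(f"hello {name}, you have {unread} messages");   // hello world, you have 3 messages
    // {{ and }} produce literal braces instead of an interpolation.
    println(f"set notation: {{1, 2}}");                      // set notation: {1, 2}
}
```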
Printing values
println and print take any number of arguments and
concatenate them. to_string turns a value into text when you
need it as a String:
fn main() {
let n = 41;
println("n + 1 = ", n + 1); // n + 1 = 42
let s = to_string(n + 1); // "42"
println(s);
}
println, print, to_string, and len are builtins —
called as plain functions, not methods. You write len(s), not
s.len(). (Methods with . come later, on loci and your own
types.)
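A small sketch of the builtins together: to_string produces a String that + can join (string concatenation is covered in Strings & text), and len reports the result’s length in bytes, called as len(line) rather than line.len().

```hale
fn main() {
    let count = 3;
    // to_string gives a String, so + can join it with other text.
    let line = to_string(count) + " widgets";
    // len is a builtin function, not a method.
    println(line, " (", len(line), " bytes)");   // 3 widgets (9 bytes)
}
```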
Next: Math, money & time.
Math, money & time
Arithmetic, and three types that save you from classic bugs.
Arithmetic
The operators are what you’d expect:
let a = 7 + 3; // 10
let b = 7 - 3; // 4
let c = 7 * 3; // 21
let d = 7 / 3; // 2 — integer division
let e = 7 % 3; // 1 — remainder
Comparison and logic:
let bigger = a > b; // Bool
let between = a > 0 && a < 100;
let either = ready || forced;
let negated = !ready;
Bitwise operators (& | ^ << >> ~) are available on Int.
Comparisons don’t chain: a < b < c is a parse error — write
a < b && b < c. This is deliberate; chained comparison is a
common source of silent bugs.
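The sketch below shows integer division and remainder recombining into the original value, and a range check spelled the way the compiler requires, with && instead of a chained comparison. in_range is an illustrative helper, and the block assumes <= is available alongside the other comparison operators.

```hale
// Half-open range check: lo <= x < hi, written without chaining.
fn in_range(x: Int, lo: Int, hi: Int) -> Bool {
    return lo <= x && x < hi;
}

fn main() {
    let q = 7 / 3;   // 2: integer division truncates
    let r = 7 % 3;   // 1: the remainder
    println(q * 3 + r);             // 7
    println(in_range(5, 0, 10));    // true
    println(in_range(10, 0, 10));   // false: the upper bound is excluded
}
```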
Int and Float
Int is 64-bit signed; Float is a 64-bit IEEE double. Hale
widens Int to Float automatically where it’s unambiguous —
at a let with a Float annotation, when passing an Int to a
Float parameter, and when one side of an arithmetic or
comparison operator is a Float:
let x: Float = 3; // 3.0 — widened
let y = 2.0 * 3; // 6.0 — Int 3 promoted to Float
Going the other way loses information, so it’s explicit:
let n = Int(3.9); // 3 — truncates toward zero
When you’d rather name the conversion — or need it mid-expression
where the implicit widening doesn’t reach — std::math has both
directions as functions:
let f = std::math::int_to_float(42); // 42.0
let m = std::math::float_to_int(3.99); // 3 — round toward zero
They’re the same sitofp / fptosi conversions as the casts,
just callable anywhere — so numeric code never has to launder a
value through to_string + parse_float to change its type.
When you want a Float rounded to an Int rather than
truncated — building an integer field out of a Float quantity,
say — reach for round; trunc is the toward-zero sibling:
let a = std::math::round(3.7); // 4 (Int)
let b = std::math::round(2.5); // 3 — half away from zero
let c = std::math::round(0.0 - 2.5); // -3
let d = std::math::trunc(3.7); // 3 — toward zero, like float_to_int
Both return an Int directly. (floor / ceil below return a
Float; wrap them in float_to_int if you need an Int.)
The standard library covers the rest: std::math::sqrt,
exp, log, pow, floor, ceil, the trig functions, and
so on.
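Putting the conversions side by side: a sketch that computes a Float average from Int inputs with the named std::math conversions, then turns it back into an Int three ways. average is a hypothetical helper; the Float itself is not printed because its exact text formatting isn’t specified here.

```hale
fn average(total: Int, count: Int) -> Float {
    // Name the widening explicitly rather than relying on implicit promotion.
    return std::math::int_to_float(total) / std::math::int_to_float(count);
}

fn main() {
    let avg = average(7, 2);           // 3.5
    println(std::math::round(avg));    // 4: half away from zero
    println(std::math::trunc(avg));    // 3: toward zero
    println(Int(avg));                 // 3: the cast also truncates
}
```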
Decimal — exact numbers
Float is wrong for money. In binary IEEE floating point, 0.1 + 0.2
is not exactly 0.3, and rounding error compounds. Hale gives
you Decimal: a fixed-point type with exact arithmetic. Write
the literal with a d suffix.
let price = 19.99d;
let qty = 3;
let total = price * qty; // 59.97d: exact, no drift
Use Decimal for prices, balances, quantities, anything where a
penny of rounding error is a bug. Use Float for measurements,
ratios, and math where approximation is fine. The two never mix
implicitly — there is no silent Decimal/Float conversion, so
you can’t accidentally launder exactness away.
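A sketch of the exactness guarantee: adding 0.10d three times lands exactly on 0.30d, where the same sum in Float would not. It assumes == compares two Decimal values exactly; the amounts are illustrative.

```hale
fn main() {
    let fee = 0.10d;
    let mut total = 0.00d;
    for _ in 0..3 {
        total = total + fee;      // Decimal + Decimal stays exact
    }
    if total == 0.30d {
        println("exact: ", total);
    }
    // let mixed = total + 0.5;   // would be rejected: no silent Decimal/Float conversion
}
```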
Duration — time spans with units
A duration is a length of time, written with a unit suffix:
let timeout = 5s;
let frame = 16ms;
let day = 24h;
let compound = 1h30m; // durations add up
No more “is this milliseconds or seconds?” — the unit is part of the literal. Durations do arithmetic and comparison:
let total = timeout + frame;
if elapsed > timeout { /* ... */ }
This is also what the runtime’s sleep takes:
std::time::sleep(100ms);
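Durations compose naturally into a retry backoff. This sketch doubles a wait from 100ms until it would exceed one second, sleeping at each step. The backoff policy and the limit are illustrative, and it assumes <= is available on Duration like the other comparisons.

```hale
fn main() {
    let limit = 1s;
    let mut delay = 100ms;
    let mut attempt = 1;
    // Waits 100ms, 200ms, 400ms, 800ms; 1600ms would exceed the limit.
    while delay <= limit {
        println("attempt ", attempt, ": waiting ", delay);
        std::time::sleep(delay);
        delay = delay + delay;   // Duration + Duration doubles the wait
        attempt = attempt + 1;
    }
}
```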
Time — wall-clock instants
A Time is a specific instant, written as an ISO-8601 literal in
backticks:
let launch = `2026-05-08T12:00:00Z`;
For measuring elapsed time, reach for the monotonic clock — it never jumps backward when the wall clock is adjusted:
let start = std::time::monotonic(); // a Duration since boot
do_work();
let took = std::time::monotonic() - start;
println("took ", took);
std::time::now() gives wall-clock seconds since the Unix
epoch when you genuinely need calendar time; monotonic() is
the basis for anything timing-related.
Why these are in the language
Decimal, Duration, and Time aren’t library types you opt
into — they’re primitives with their own literals. The reason is
that the bugs they prevent (float drift in money, unit confusion
in time) are so common and so costly that making them
first-class is worth it. You get the safety without importing
anything or remembering a convention.
Next: Functions.
Functions
Naming a piece of work so you can call it.
A function is declared with fn, a name, typed parameters, and
an optional return type:
fn add(a: Int, b: Int) -> Int {
return a + b;
}
fn greet(name: String) {
println("hello, ", name);
}
add returns an Int. greet has no -> T, so it returns
nothing (the unit type, written ()). Parameters are always
typed; there’s no inference at the boundary, because the
signature is the contract.
Call them the obvious way:
fn main() {
let sum = add(2, 3); // 5
greet("world");
}
Returning a value
return expr; hands a value back. A function can also return its
last expression without return if you leave off the trailing
; — the block’s final expression is its value:
fn double(n: Int) -> Int {
n * 2 // no semicolon — this is the return value
}
Both styles are fine. Use whichever reads better; return is
clearer for early exits.
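Both styles often appear in one function: return for the early exits, a bare final expression for the common path. clamp is an illustrative helper, not a standard-library function.

```hale
fn clamp(n: Int, lo: Int, hi: Int) -> Int {
    if n < lo { return lo; }   // early exit: below the range
    if n > hi { return hi; }   // early exit: above the range
    n                          // no semicolon: the in-range value is returned
}

fn main() {
    println(clamp(150, 0, 100));   // 100
    println(clamp(42, 0, 100));    // 42
}
```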
Default parameter values
A parameter can carry a default, so the caller can leave off the trailing arguments:
fn pow(base: Int, exp: Int = 2) -> Int {
let mut acc = 1;
for _ in 0..exp { acc = acc * base; }
return acc;
}
fn main() {
println(pow(3)); // exp defaults to 2 → 9
println(pow(2, 5)); // override → 32
}
Two rules keep the calling convention unambiguous:
- Defaults form a trailing suffix. A required parameter can’t follow a defaulted one — otherwise it wouldn’t be clear which slot an omitted argument fills.
- Defaults are evaluated at the call site, in the caller’s scope — not baked in when the function is defined. For a constant literal (the common case) that’s identical; for an expression that names a caller-visible binding, it sees that binding.
Locus methods support defaults too. One caveat: bus-handler methods and mode methods reject them — their argument shape is fixed by the runtime, so there’s no slot to fill at dispatch time.
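Defaults work for any parameter type. In this sketch, greet (a hypothetical variant of the earlier greet) takes an optional greeting; the commented-out signature shows the ordering rule the compiler enforces.

```hale
fn greet(name: String, greeting: String = "hello") {
    println(greeting, ", ", name);
}

// fn bad(a: Int = 1, b: Int) { }   // rejected: a required parameter follows a defaulted one

fn main() {
    greet("Ada");              // hello, Ada
    greet("Ada", "welcome");   // welcome, Ada
}
```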
Functions are values
A function has a type — fn(Int, Int) -> Int — and you can pass
one as an argument. This is how you hand behavior to another
function:
fn apply_twice(f: fn(Int) -> Int, x: Int) -> Int {
return f(f(x));
}
fn inc(n: Int) -> Int { return n + 1; }
fn main() {
println(apply_twice(inc, 10)); // 12
}
One limit worth knowing now: a function value is just a pointer
to a named function. Hale has no closures — no inline
|x| x + captured that captures surrounding variables. If a
callback needs context, you pass the context in explicitly, or
(at higher levels) you reach for a locus that holds the state.
This keeps every function value a plain, inspectable thing.
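Passing the context in explicitly might look like this: instead of capturing factor, the callback takes it as an extra parameter and the caller threads it through. apply_with and scale are illustrative names.

```hale
// The callback receives its context as an ordinary argument.
fn scale(n: Int, factor: Int) -> Int { return n * factor; }

fn apply_with(f: fn(Int, Int) -> Int, x: Int, ctx: Int) -> Int {
    return f(x, ctx);
}

fn main() {
    let factor = 3;                            // a closure would capture this; here it is passed
    println(apply_with(scale, 14, factor));    // 42
}
```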
Free functions and where they live
A function declared at the top level of a file is a free
function. Every top-level declaration in a directory is visible
to every file in that directory — there’s no import between
files in the same project, and no pub to mark something
exported. You organize by concern, putting related declarations
near each other, not by visibility.
// these two can call each other freely, in either file order
fn celsius_to_f(c: Float) -> Float { return c * 9.0 / 5.0 + 32.0; }
fn f_to_celsius(f: Float) -> Float { return (f - 32.0) * 5.0 / 9.0; }
Free functions are the right tool when an operation has no state of its own — a calculation, a conversion, a parser. When a group of them starts to feel like a coherent vocabulary, the Everyday programs level shows how to gather them onto a locus. For now: a free function per piece of work.
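A sketch of the same-directory rule across two files. The file names are hypothetical; because the files form one project, compile the directory (for example hale build ./dir) rather than a single file.

```hale
// convert.hl: a free function, with no pub and no export list.
fn celsius_to_f(c: Float) -> Float { return c * 9.0 / 5.0 + 32.0; }

// main.hl, in the same directory: calls celsius_to_f with no import.
fn main() {
    let f = celsius_to_f(100.0);     // 212.0
    println(std::math::round(f));    // 212
}
```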
Next: Control flow.
Control flow
Choosing, repeating, and matching.
if / else
if score >= 90 {
println("A");
} else if score >= 80 {
println("B");
} else {
println("C");
}
if is also an expression — it produces a value, so you can
assign with it. In expression position it needs an else, and
both arms must produce the same type:
let grade = if score >= 90 { "A" } else { "B" };
Because an if is an expression, it can be an arm of another
if and the value flows out through both:
let band = if score >= 90 {
if score >= 97 { "A+" } else { "A" }
} else {
"B"
};
One small thing the compiler is strict about: an empty if body
won’t parse. If you genuinely want a branch that does nothing,
put a comment in it or restructure the condition:
if done {
// nothing to do yet
}
while and loop
let mut i = 0;
while i < 5 {
println(i);
i = i + 1;
}
loop { ... } repeats forever until you break:
let mut n = 0;
loop {
n = n + 1;
if n >= 3 { break; }
}
break exits the nearest loop; continue skips to the next
iteration.
for
for iterates over a range or a collection:
for i in 0..5 {
println(i); // 0 1 2 3 4
}
(0..5 is exclusive of the upper bound; 0..=5 includes it.)
You’ll use for over real collections once you meet lists and
maps in Everyday programs.
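A small sketch combining an inclusive range with continue: it sums the odd numbers from 1 to 10. It assumes == is available on Int, like the other comparison operators.

```hale
fn main() {
    let mut sum = 0;
    for i in 1..=10 {
        if i % 2 == 0 { continue; }   // skip even numbers
        sum = sum + i;
    }
    println(sum);   // 25 (1 + 3 + 5 + 7 + 9)
}
```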
match
match compares a value against patterns and runs the first
that fits:
fn describe(n: Int) -> String {
return match n {
0 -> "zero",
1 -> "one",
_ -> "many",
};
}
_ is the wildcard — “anything else.” Matches must be
exhaustive: the compiler rejects a match that doesn’t cover
every possibility. For a Bool that means both true and
false; for open-ended types it means a _ arm. This is a
safety feature — you can’t forget a case and have it silently
fall through.
match shines on enums (a type that’s one of several named
shapes), which you’ll meet in
Records & data. The arms can bind the
data carried by each variant.
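Exhaustiveness on a Bool, sketched: both arms are present, so no _ arm is needed, and removing either one would be a compile error. parity is an illustrative helper.

```hale
fn parity(n: Int) -> String {
    let even = n % 2 == 0;
    return match even {
        true -> "even",
        false -> "odd",   // dropping this arm would make the match non-exhaustive
    };
}

fn main() {
    for i in 0..4 {
        println(i, " is ", parity(i));   // 0 is even, 1 is odd, 2 is even, 3 is odd
    }
}
```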
Blocks have values
A { ... } block’s last expression — written without a trailing
; — is the block’s value. That’s why if/match can be used
as expressions, and why a function can end in a bare expression
instead of return. A block whose last item does end in ;
has value ().
let label = {
let base = compute();
base + 1 // block evaluates to this
};
That’s the whole control-flow surface. Next we look at working with text: Strings & text.
Strings & text
Building and inspecting text.
Joining
The + operator concatenates strings, and println /
f-strings join for you:
let first = "Ada";
let last = "Lovelace";
let full = first + " " + last;
let hi = f"hello, {first}";
println("full name: ", full);
to_string(x) converts a number, bool, duration, etc. into its
text form when you need a String specifically:
let n = 42;
let label = "n=" + to_string(n);
Length and inspection
len(s) is a builtin — the byte length of the string:
let s = "hello";
println(len(s)); // 5
Most text operations live in std::str, called as plain
functions:
let i = std::str::index_of("hello world", "world"); // 6
let sub = std::str::substring("hello world", 0, 5); // "hello"
let up = std::str::upper("hi"); // "HI"
let t = std::str::trim(" spaced "); // "spaced"
let r = std::str::replace("a-b-c", "-", "+"); // "a+b+c"
Hale has no per-character method syntax (s.charAt(i)); you
slice with a range or use the std::str helpers. Slicing a
string by byte range:
let s = "hello";
let h = s[0..1]; // "h"
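Trimming, searching, and slicing combine into a small key=value splitter. This sketch assumes std::str::index_of returns an Int position (as in the example above), that slices accept computed bounds, and that the input contains an =; a real parser would check first.

```hale
fn main() {
    let t = std::str::trim("  name=Ada  ");   // "name=Ada"
    let eq = std::str::index_of(t, "=");      // 4
    let key = t[0..eq];                        // "name"
    let start = eq + 1;
    let value = t[start..len(t)];              // "Ada"
    println(key, " -> ", std::str::upper(value));   // name -> ADA
}
```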
Parsing numbers
Turning text into a number can fail — the text might not be a number. So the parse functions are fallible, and the next chapter (When a call can fail) is exactly about how you handle that. The shape, previewed:
let n = std::str::parse_int("42") or 0; // 42, or 0 if it wasn't
There are also non-failing predicates to check first
(std::str::can_parse_int) when you’d rather branch than
recover.
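Branching on the predicate rather than recovering looks like this; the input string is illustrative. Note that the or is still required inside the checked branch, because parse_int is fallible regardless of the earlier check.

```hale
fn main() {
    let input = "12x";
    if std::str::can_parse_int(input) {
        let n = std::str::parse_int(input) or 0;   // still fallible, so still addressed
        println("number: ", n);
    } else {
        println("not a number: ", input);          // this branch runs for "12x"
    }
}
```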
Bytes
Text is String; raw binary is Bytes. They’re different types
because they have different rules — a String is valid UTF-8, a
Bytes is any sequence of octets, including embedded zeros.
let b = std::bytes::from_string("hello"); // String -> Bytes
let s = std::str::from_bytes(b); // Bytes -> String
let byte0 = std::bytes::at(b, 0) or 0; // a single byte (fallible)
You’ll work with Bytes directly when you read from a socket or
a file and need to frame messages yourself — that’s a topic for
wire formats and the systems tier. At
this level, just know the two types are distinct and you convert
explicitly between them.
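A round trip between the two types, sketched. It assumes std::bytes::at yields the byte as an Int-compatible value (so or 0 can supply the fallback) and that an out-of-range index takes the fallback path.

```hale
fn main() {
    let b = std::bytes::from_string("Hi");       // String -> Bytes
    let first = std::bytes::at(b, 0) or 0;       // 72, the byte value of 'H'
    let past_end = std::bytes::at(b, 10) or 0;   // out of range: falls back to 0
    println(first, " ", past_end);
    println(std::str::from_bytes(b));            // Bytes -> String: Hi
}
```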
Next: the failure model — When a call can fail.
When a call can fail
Hale’s value-level error model — and why you can’t ignore it.
Some calls can’t always succeed. Parsing "banana" as an
integer, reading a file that isn’t there, connecting to a host
that’s down. In Hale these calls have a type that says so, and
the compiler requires you to deal with the failure right at the
call site. There are no exceptions, no surprise control flow, and
no silently-ignored error codes.
The fallible type
A function that can fail declares it with fallible(E), where
E is the type of the error payload:
type ParseError { kind: String; input: String; }
fn parse_count(s: String) -> Int fallible(ParseError) {
if !std::str::can_parse_int(s) {
fail ParseError { kind: "not_int", input: s };
}
return std::str::parse_int(s) or 0;
}
fail <payload>; exits the function through the error path,
carrying the payload. The function’s result is now “either an
Int, or a ParseError” — and the caller can’t just use it as
an Int:
let n = parse_count(input); // ERROR: error not addressed
You have to address the error. You do that with an or
clause.
The five or motions
let a = parse_count(s) or raise; // propagate upward
let b = parse_count(s) or 0; // substitute a value
let c = parse_count(s) or handle(err); // hand off to a helper
let d = parse_count(s) or fail OtherErr { }; // translate the error
some_unit_call() or discard; // ignore (unit result only)
- or raise: pass the error up to your caller. Your function must itself be fallible(E) with a compatible error type, so the error has somewhere to go.
- or <expression>: substitute a fallback value of the success type. Inside the expression, err is bound to the payload, so you can inspect it: let port = std::str::parse_int(arg) or 8080;
- or handler(err): call a function that takes the error and returns the success type. Good when several call sites share one recovery policy.
- or fail <payload>: fail with a new error of your own type, instead of forwarding the inner one. Use it so a library doesn’t leak a stdlib error type through its own surface.
- or discard: throw the error away. Only allowed when the successful result is () (nothing to substitute). The compiler rejects or discard on a value-bearing call and suggests or <fallback> instead.
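The handler and translation motions can be sketched in one program. ParseError and parse_count repeat the definitions from above so the block stands alone; ConfigError, count_or_one, and workers_setting are illustrative names.

```hale
type ParseError { kind: String; input: String; }
type ConfigError { field: String; reason: String; }

fn parse_count(s: String) -> Int fallible(ParseError) {
    if !std::str::can_parse_int(s) {
        fail ParseError { kind: "not_int", input: s };
    }
    return std::str::parse_int(s) or 0;
}

// One shared recovery policy: report the bad input and fall back to 1.
fn count_or_one(err: ParseError) -> Int {
    println("bad count '", err.input, "' (", err.kind, "), using 1");
    return 1;
}

// Translate the inner error into this module's own error type.
fn workers_setting(s: String) -> Int fallible(ConfigError) {
    return parse_count(s) or fail ConfigError { field: "workers", reason: "not an integer" };
}

fn main() {
    let a = parse_count("8") or count_or_one(err);       // 8
    let b = parse_count("eight") or count_or_one(err);   // prints the warning, then 1
    let c = workers_setting("x") or 4;                   // 4
    println(a + b + c);                                  // 13
}
```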
A real example
Reading a file is fallible — the file might not exist:
fn load_greeting() -> String {
return std::io::fs::read_file("welcome.txt") or "(no welcome)";
}
If the read fails, we substitute a default. If we instead wanted
the failure to stop us, we’d make load_greeting fallible and use
or raise:
fn load_greeting() -> String fallible(...) {
return std::io::fs::read_file("welcome.txt") or raise;
}
Chaining
or clauses chain right-to-left (a or b or 0 groups as a or (b or 0)),
so each clause disposes of exactly one failure:
let id = parse_count(primary) or parse_count(fallback) or 0;
“Try the primary; if that fails, try the fallback; if that fails, use 0.”
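For readers coming from Python, the value-producing motions can be modeled with an explicit result value instead of exceptions. This is a sketch, not how Hale implements fallible. Ok, Fail, or_value, or_handle, and or_fail are hypothetical names, and or raise / or discard are left out. Note how the chained form becomes a handler that carries its own fallback.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass
class Ok:
    value: Any


@dataclass
class Fail:
    payload: Any


# A fallible result: either a success value or an error payload.
Result = Union[Ok, Fail]


@dataclass
class ParseError:
    kind: str
    input: str


def parse_count(s: str) -> Result:
    """Model of parse_count: Ok(int) on success, Fail(ParseError) otherwise."""
    try:
        return Ok(int(s))
    except ValueError:
        return Fail(ParseError("not_int", s))


def or_value(r: Result, fallback: Any) -> Any:
    """Like `or <expression>`: substitute a fallback of the success type."""
    return r.value if isinstance(r, Ok) else fallback


def or_handle(r: Result, handler: Callable[[Any], Any]) -> Any:
    """Like `or handler(err)`: the handler gets the payload and returns a success value."""
    return r.value if isinstance(r, Ok) else handler(r.payload)


def or_fail(r: Result, translate: Callable[[Any], Any]) -> Result:
    """Like `or fail <payload>`: swap the inner error for one of your own type."""
    return r if isinstance(r, Ok) else Fail(translate(r.payload))


def first_count(primary: str, fallback: str) -> int:
    """Like `parse_count(primary) or parse_count(fallback) or 0`."""
    return or_handle(parse_count(primary),
                     lambda _err: or_value(parse_count(fallback), 0))
```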
Why it works this way
This is the only failure channel you need at the basics level,
and it has a single rule: every fallible call is addressed at
the immediate call site. That means when you read a function
body, every place that can fail is visibly marked with or. No
error propagates invisibly through three stack frames; no try
wraps a whole block in ambiguity.
There’s a second failure channel for a different situation — a
long-running component whose internal invariant breaks, where the
right response is a supervisor’s policy rather than a return
value. That’s the structural channel, and it belongs to the
services tier (When things fail). For
everything you’ll write at this level, fallible + or is the
whole story.
When the handler can fail too
A recovery handler is often itself a fallible operation: read a fallback file, query a secondary source. Since 2026-07-02 you can write that directly:
fn load(primary: String, backup: String) -> String fallible(IoError) {
return std::io::fs::read_file(primary)
or (std::io::fs::read_file(backup) or raise);
}
If the backup read succeeds, its value substitutes. If it also
fails, or raise routes the error out through your function’s
error path, which is why load must itself be fallible with a
compatible error type.
For your own fallible functions, the inner or raise is implicit:
db_read(k) or self.rebuild(k) propagates the handler’s failure
automatically. Stdlib calls and @form methods used as handlers
still need the explicit nested spelling above (if you forget, the
compiler tells you, with the exact rewrite).
Next, we put the pieces together: Your first program.
Your first program
Everything from this level, in one small CLI.
Let’s build a complete little command-line tool using only what the basics covered: variables, math, functions, control flow, and the fallible model. It converts a Celsius temperature passed on the command line into Fahrenheit.
fn c_to_f(c: Float) -> Float {
return c * 9.0 / 5.0 + 32.0;
}
fn main() {
// arg(0) is the program name; arg(1) is the first real argument.
let raw = std::env::arg_or(1, "20");
let celsius = std::str::parse_float(raw) or {
eprintln("not a number: ", raw);
return;
};
let f = c_to_f(celsius);
println(raw, "C = ", to_string(f), "F");
}
Run it:
hale run temp.hl 100
100C = 212F
With no argument it falls back to "20" and prints 20C = 68F — the tool self-demonstrates.
What each piece is doing
- std::env::arg_or(1, "20") reads command-line argument 1, or "20" if there isn’t one. (std::env::args_count() and std::env::arg(i) are the lower-level pair.)
- std::str::parse_float(raw) or { ... } addresses the fallible parse. Here the or arm prints to standard error and returns early. That is a fine motion when the success type is a value but you’d rather bail than substitute. (eprintln is println for stderr.)
- c_to_f is a plain free function: a calculation with no state, exactly what free functions are for.
- println(raw, "C = ", to_string(f), "F") concatenates its arguments. There is no format string.
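For comparison, here is roughly the same tool in Python. The argument default, the float parse, and the early exit on bad input mirror the Hale version. Returning 1 on bad input is a choice made for this sketch, not something the Hale program is stated to do.

```python
import sys


def c_to_f(c: float) -> float:
    return c * 9.0 / 5.0 + 32.0


def main(argv: list[str]) -> int:
    # argv[0] is the program name; argv[1] is the first real argument.
    raw = argv[1] if len(argv) > 1 else "20"
    try:
        celsius = float(raw)
    except ValueError:
        print("not a number:", raw, file=sys.stderr)
        return 1  # status chosen for this sketch
    f = c_to_f(celsius)
    print(f"{raw}C = {f:g}F")  # :g prints 212.0 as 212
    return 0


if __name__ == "__main__":
    main(sys.argv)
```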
This is a real program
You can hale build temp.hl and ship the resulting binary. It
reads input, validates it, computes, and reports — and it’s
honest about failure, because the parse had to be addressed.
At this level Hale is a small, sharp scripting language.
You may have noticed there’s no locus here, no bus, none of
the structural machinery from the introduction’s matchmaker. You
don’t need it yet. A program that’s a handful of functions and a
main is a perfectly good Hale program.
The next level is where structure starts to pay off — when your program grows state that lives over time, talks to the filesystem and the network, and wants to be organized into named parts. That’s where the locus earns its place.
Next: The locus, gently.
The locus, gently
Coming from Python / Node?
A locus is the closest thing Hale has to a class or a module. It bundles state (fields) with behavior (methods), and you make instances of it. There’s no separate “module” and “class”: one construct plays both roles. This chapter only uses the object-like 80%; the lifecycle and messaging parts wait until you need them.
In the basics, a program was functions and a main. That’s
fine until you have state that lives over time — a counter, a
cache, a configuration, a connection — or until a pile of free
functions wants a name to live under. That’s what a locus is for.
A locus with state
locus Counter {
params {
count: Int = 0;
}
fn bump() {
self.count = self.count + 1;
}
fn value() -> Int {
return self.count;
}
}
params is the locus’s state — typed fields, each with a default.
Inside any method, self.field reads and writes that state.
Methods are fns, called with .:
fn main() {
let c = Counter { }; // make one; count defaults to 0
c.bump();
c.bump();
println(c.value()); // 2
}
You construct a locus with Name { ... }, overriding any field
you like:
let c = Counter { count: 10 };
If you’ve used objects before, this is familiar: params are
the instance variables, methods are the methods, Counter { }
is the constructor. Hale collapses “constructor parameters” and
“instance fields” into one params block — the same way Ruby’s
@foo or Python’s self.foo are just attributes.
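The closest Python spelling of Counter is a dataclass. The field declarations play the role of params, and the generated constructor plays the role of Counter { ... }. This is an analogy for orientation, not a description of how Hale compiles a locus.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    # Like a params block: typed fields, each with a default.
    count: int = 0

    def bump(self) -> None:
        self.count = self.count + 1

    def value(self) -> int:
        return self.count


c = Counter()          # count defaults to 0, like Counter { }
c.bump()
c.bump()
print(c.value())       # 2
d = Counter(count=10)  # override a field, like Counter { count: 10 }
```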
type vs locus
You met type for plain records earlier. The line between them:
- type is pure data: a record you construct, pass around by value, and read. No methods, no state that changes itself, no lifecycle.
- locus is data with behavior and identity: it has methods, it mutates its own state, and (at the next level) it can run over time and send messages.
type Point { x: Int; y: Int; } // just data
locus Tally { // data + behavior
params { total: Int = 0; }
fn add(n: Int) { self.total = self.total + n; }
}
These aren’t rival categories — they’re points on a gradient. A
type is a locus that hasn’t grown behavior yet. When a record
starts accumulating methods, you promote it from type to
locus. There is no third thing to reach for.
Two everyday shapes
Almost every locus you write at this level is one of two shapes.
The app locus — the outer wrapper for a whole program. Your
main reads arguments and hands off to it:
locus App {
params { name: String = "world"; }
fn run() {
println("hello, ", self.name);
}
}
fn main() {
let app = App { name: std::env::arg_or(1, "world") };
app.run();
}
This replaces the bare-main-with-helpers shape from the basics:
the app’s top-level state and entry point now have a home. (At
the services level, run() becomes a special lifecycle method
the runtime drives — but as an ordinary method it already works.)
The namespace locus: a home for a coherent vocabulary of helpers, with little or no state. It is Hale’s stand-in for a “module of functions” or a static class:
locus Temps {
fn c_to_f(c: Float) -> Float { return c * 9.0 / 5.0 + 32.0; }
fn f_to_c(f: Float) -> Float { return (f - 32.0) * 5.0 / 9.0; }
}
fn main() {
let t = Temps { };
println(t.c_to_f(100.0)); // 212
}
You instantiate it once and dispatch through it. When three or more related free functions show up, this is usually the tidier home for them.
A rule worth meeting early
Hale has one structural commitment that shapes everything above:
Every named piece of state belongs to exactly one locus.
No globals, no shared mutable buffer that nobody owns, no “floating” value passed around by side channel. If you’re not sure where some state should live, the productive question is “which locus owns this?” — and there’s almost always a clean answer. This is what lets Hale clean up memory and coordinate failure without a garbage collector; you’ll see the payoff at the systems level. For now it’s just good hygiene: put state where it belongs.
Next: the collections you’ll reach for constantly — Lists & maps.
Lists & maps
Coming from Python / Node?
Hale has no built-in list/[] that grows, no dict/{}, and no Vec<T> or Map<K,V>. Instead you declare a small locus and annotate it with a form: @form(vec) for a growable list, @form(hashmap) for a keyed map. You get the same operations (push, get, len, set, …); they’re just methods on a locus you named.
A growable list — @form(vec)
@form(vec)
locus Names {
capacity { heap items of String; }
}
fn main() {
let names = Names { };
names.push("Ada");
names.push("Grace");
println(names.len()); // 2
let first = names.get(0) or ""; // "Ada"
}
Three things are happening:
- @form(vec) tells the compiler “this locus is a growable list.” It synthesizes the methods for you: push, get, set, pop, len, is_empty, and sorting.
- capacity { heap items of String; } is where the list’s storage lives. Read it as “this list holds Strings.” The element type comes from here.
- get and pop are fallible (an index might be out of bounds), so you address them with or, just like any fallible call: let x = names.get(99) or "(missing)";
Iterate with for over the items:
for name in names.items {
println(name);
}
(The indexed while i < names.len() + .get(i) walk also works,
and is what you want when you need the index — but prefer .items
as the default: it reads better and, on hashmaps especially, it’s
dramatically faster. A hashmap walk via key_at(i)/entry_at(i)
rescans from slot 0 on every call — O(cap×len) for the whole walk —
while for e in m.entries visits each occupied slot once.)
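To see where the O(cap×len) figure comes from, the sketch below models a hashmap’s slot array in Python, with key_at(i) finding the i-th occupied slot by scanning from slot 0 as described above. The slot layout and probe counting are illustrative assumptions; Hale’s actual table is not shown in this guide.

```python
def key_at(slots, i):
    """Return (key, probes) for the i-th occupied slot, scanning from slot 0."""
    seen = 0
    for probes, key in enumerate(slots, start=1):
        if key is not None:
            if seen == i:
                return key, probes
            seen += 1
    raise IndexError(i)


def indexed_walk(slots):
    """Walk via key_at(0), key_at(1), ...: every call rescans from the start."""
    n = sum(1 for k in slots if k is not None)
    keys, total = [], 0
    for i in range(n):
        key, probes = key_at(slots, i)
        keys.append(key)
        total += probes
    return keys, total


def entries_walk(slots):
    """Walk like `for e in m.entries`: scan the slot array exactly once."""
    keys = [k for k in slots if k is not None]
    return keys, len(slots)


slots = ["a", None, "b", None, None, "c", None, "d"]  # cap 8, len 4
```

With 8 slots and 4 keys, the indexed walk costs 18 slot probes against 8 for the single pass, and the gap widens as the table grows.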
The element type can be anything — a primitive, or one of your
own type records:
type Player { id: String; score: Int; }
@form(vec)
locus Roster {
capacity { heap players of Player; }
}
A keyed map — @form(hashmap)
A map keys entries by a field on the value itself — the key is
one of the record’s fields, named with indexed_by:
type Account { user: String; balance: Int; }
@form(hashmap)
locus Accounts {
capacity { pool entries of Account indexed_by user; }
}
fn main() {
let accts = Accounts { };
accts.set(Account { user: "ada", balance: 100 });
accts.set(Account { user: "grace", balance: 250 });
let a = accts.get("ada") or Account { user: "", balance: 0 };
println(a.balance); // 100
println(accts.has("grace")); // true
}
- set(value) takes the whole record and reads the key out of its indexed_by field; there’s no separate key argument.
- get(key) and remove(key) are fallible (the key might be absent); has(key) returns a plain Bool.
- Keys are Int or String.
This “the key is a field of the value” shape matches how keyed stores almost always look in practice — you rarely have a key that isn’t already part of the thing you’re storing.
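In Python terms, an @form(hashmap) keyed with indexed_by user behaves roughly like the class below, where set reads the key from the record itself. The class is a hypothetical analogue. Python’s dict raises KeyError in cases where Hale’s get and remove are fallible instead.

```python
from dataclasses import dataclass


@dataclass
class Account:
    user: str
    balance: int


class Accounts:
    """A keyed store whose key is read from each value's `user` field."""

    def __init__(self) -> None:
        self._entries: dict[str, Account] = {}

    def set(self, acct: Account) -> None:
        # No separate key argument: the key comes from the record itself.
        self._entries[acct.user] = acct

    def get(self, user: str) -> Account:
        return self._entries[user]  # KeyError when absent

    def has(self, user: str) -> bool:
        return user in self._entries

    def remove(self, user: str) -> None:
        del self._entries[user]  # KeyError when absent
```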
A bounded queue — @form(ring_buffer)
When you want a fixed-size FIFO (recent-events buffers, sliding windows) where a full buffer means either dropping the oldest entry or pushing back on the producer:
@form(ring_buffer, cap = 64)
locus Recent {
capacity { pool events of String; }
}
push returns a Bool — false when the buffer is full — so
you decide whether to drop or apply backpressure. pop is
fallible on empty.
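A minimal Python sketch of the contract described above: a preallocated slot array where push reports a full buffer by returning False and pop fails on empty. Dropping the oldest entry is shown as one caller-side policy. The Ring class and push_drop_oldest are illustrative names, not what @form(ring_buffer) generates.

```python
class Ring:
    """Fixed-capacity FIFO over a preallocated slot array."""

    def __init__(self, cap: int) -> None:
        self._slots = [None] * cap
        self._head = 0   # index of the oldest element
        self._count = 0

    def __len__(self) -> int:
        return self._count

    def push(self, item) -> bool:
        """Append at the tail; return False (and store nothing) when full."""
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1
        return True

    def pop(self):
        """Remove and return the oldest element; raise IndexError when empty."""
        if self._count == 0:
            raise IndexError("pop from empty ring")
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item


def push_drop_oldest(ring: Ring, item) -> None:
    """Caller-side policy: when full, discard the oldest entry, then push."""
    if not ring.push(item):
        ring.pop()
        ring.push(item)
```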
A list inside a type — bounded[T; N]
The forms above are loci — whole entities with their own
lifecycle. A type is pure data, so it can’t hold one. What it CAN
hold (since 2026-07-02) is a bounded collection — a
fixed-capacity list laid out inline in the value:
type Message {
id: String;
tags: bounded[String; 32];
}
fn main() {
let msg = Message { id: "msg1" }; // tags starts empty —
// bounded fields can't be
// spelled in a literal
push(msg.tags, "urgent") or raise;
push(msg.tags, "billing") or raise;
for tag in msg.tags {
println(tag);
}
println(count(msg.tags)); // 2
}
Six operations, all compiler intrinsics (types stay method-free,
like len(s)):
- push(f, x): append; fallible with CapacityError { cap, count } when full. What to do at capacity is your policy, written in the or arm.
- at(f, i): read slot i; fallible with IndexError when i is out of range.
- set(f, i, x): overwrite a live slot; fallible with IndexError.
- count(f): the live count (the capacity lives in the type).
- clear(f): reset to empty.
- truncate(f, n): shrink the count (it never grows); together with set, this is the drop-front idiom for FIFO windows.
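The six intrinsics can be modeled in Python over N preallocated slots plus a live count. Python has no fallible channel, so CapacityError and IndexError are raised as exceptions here. The Bounded class and drop_front helper are hypothetical names, and drop_front spells out the set-then-truncate idiom for a FIFO window.

```python
class CapacityError(Exception):
    def __init__(self, cap: int, count: int) -> None:
        super().__init__(f"full: {count}/{cap}")
        self.cap, self.count = cap, count


class Bounded:
    """Model of bounded[T; N]: N preallocated slots plus a live count."""

    def __init__(self, cap: int) -> None:
        self._slots = [None] * cap
        self._count = 0

    def push(self, x) -> None:
        if self._count == len(self._slots):
            raise CapacityError(len(self._slots), self._count)
        self._slots[self._count] = x
        self._count += 1

    def at(self, i: int):
        if not 0 <= i < self._count:
            raise IndexError(i)
        return self._slots[i]

    def set(self, i: int, x) -> None:
        if not 0 <= i < self._count:  # only live slots may be overwritten
            raise IndexError(i)
        self._slots[i] = x

    def count(self) -> int:
        return self._count

    def clear(self) -> None:
        self._count = 0

    def truncate(self, n: int) -> None:
        self._count = max(0, min(self._count, n))  # never grows


def drop_front(b: Bounded) -> None:
    """Shift live elements down one slot with set, then truncate by one."""
    if b.count() == 0:
        return
    for i in range(1, b.count()):
        b.set(i - 1, b.at(i))
    b.truncate(b.count() - 1)
```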
Use bounded when the maximum is known and the list is a field of
a value — per-message tags, route parameters, a chat window. The
old workaround (a tab-separated string you re-parse on every read)
is retired: pond’s router, LLM, and conversation libraries all
migrated. Whole-struct copies carry the elements automatically, and
scalar-element bounded values even cross the zero-copy bus as flat
bytes.
Why a form instead of a generic type
A list isn’t just “a type parameterized by its element” — it’s a
bundle of decisions: contiguous memory, dynamic length, who owns
the storage, what happens to it when the owner goes away. A form
makes those decisions at the declaration, and picks an
implementation tuned for the element type. The upshot for you at
this level is simple: @form(vec) is your list, @form(hashmap)
is your map. The reasoning behind forms — and how to choose
between them on performance grounds — is in Forms under the
hood at the systems level.
One form per locus: a locus is a list or a map, not both. If you need both, that’s two loci — which is usually what the data wanted anyway.
Next: Records & data.
Records & data
Coming from Python / Node?
Where you’d reach for a dict or an object literal to pass structured data around, Hale uses a named type: a fixed-shape record with typed fields. It’s closer to a TypeScript interface or a Python @dataclass than to a free-form dict: the shape is declared, and the compiler checks it.
Records — type
type Player {
id: String;
name: String;
score: Int;
}
Construct with a struct literal, naming each field:
let p = Player { id: "p1", name: "Ada", score: 0 };
println(p.name); // field access with .
Records are pure data: you pass them by value, read their fields, and compare them. They carry no behavior and no lifecycle. Fields can have defaults, so callers can omit them:
type Config { host: String = "127.0.0.1"; port: Int = 8080; }
let c = Config { port: 9000 }; // host defaults
Records nest, and they’re what travels on the bus and in and out of functions. When a record starts wanting methods, that’s the signal to promote it to a locus.
Arrays
A fixed sequence of one type is an array. [T] is a slice (a
view of some elements); [T; N] is a fixed-length array:
type Match { players: [Player]; } // a slice of Players
let xs = [1, 2, 3]; // an array literal
let zeros = [0; 8]; // eight zeros
For a sequence that grows, you want a @form(vec) list from
the previous chapter, not a bare array.
Tuples
A quick, unnamed grouping of a few values:
let pair = (1, "one");
Reach for a type once the grouping has meaning worth naming;
tuples are for the throwaway case.
Enums — one of several shapes
An enum is a value that is exactly one of a set of named variants — a tagged union / sum type:
type Light = enum { Red, Yellow, Green };
fn next(l: Light) -> Light {
return match l {
Light::Red -> Light::Green,
Light::Green -> Light::Yellow,
Light::Yellow -> Light::Red,
};
}
Construct a variant with EnumName::Variant, and use match to
branch on it — exhaustively, so you can’t forget a case.
Variants can carry data:
type Event = enum {
Tick(Int),
Trade(Decimal, Int),
Halt,
};
fn handle(e: Event) {
match e {
Event::Tick(0) -> println("tick zero"),
Event::Tick(n) -> println("tick #", n),
Event::Trade(price, size) -> println("trade ", size, " @ ", price),
Event::Halt -> println("halt"),
}
}
The match arms bind the payload — Tick(n) pulls the integer
out as n. You can also match a literal sub-pattern (Tick(0))
ahead of the general one. This is the idiomatic way to model
“the message is one of these kinds, each with its own data” —
and it pairs naturally with the typed bus at the next level.
Enums fill the role of Option<T> / Result<T, E> from other languages when you want a closed set of outcomes as data. For the “this call failed” case specifically, prefer the fallible channel: it’s the purpose-built tool, and the compiler enforces handling.
Next: reading and writing the world — Files.
Files
Coming from Python / Node?
No try/except, no .catch(), no checking err != nil. Every filesystem call that can fail returns a fallible value, and the compiler makes you address it with or right where you call it. The failure is visible at the call site, always.
Reading and writing
fn main() {
// Write a file (creating or truncating it).
std::io::fs::write_file("greeting.txt", "hello\n") or raise;
// Read it back. read_file returns the whole contents as a String.
let body = std::io::fs::read_file("greeting.txt") or "(empty)";
println(body);
}
For main to use or raise, main would need to be fallible;
more often at the top level you substitute or report:
fn main() {
let body = std::io::fs::read_file("config.toml") or {
eprintln("no config; using defaults");
return;
};
use_config(body);
}
The surface
All of these live under std::io::fs and all are
fallible(IoError) except file_exists:
| Call | Does |
|---|---|
| read_file(path) -> String | whole-file read |
| read_bytes(path) -> Bytes | whole-file read, binary |
| write_file(path, contents) | create / truncate |
| write_file_append(path, contents) | append |
| file_size(path) -> Int | size in bytes |
| mkdir(path) | create a directory |
| rename(from, to) | move / rename |
| unlink(path) | delete |
| mktemp(prefix) -> String | make a temp file |
| list_dir(path) -> ... | enumerate entries |
| file_exists(path) -> Bool | test (never fails) |
The error tells you what happened
When a call fails, the IoError payload carries a kind
(String), the raw errno (Int), and the path (String).
kind is a stable tag derived from the OS error —
"not_found", "permission_denied", "already_exists",
"is_dir", and so on. So you can branch on the kind of
failure without parsing error strings:
fn handle_io(e: IoError) -> String {
if e.kind == "not_found" {
return ""; // treat missing as empty
}
eprintln("io error on ", e.path, ": ", e.kind);
return "";
}
fn load(path: String) -> String {
return std::io::fs::read_file(path) or handle_io(err);
}
This is the or handler(err) motion from the basics, put to
work: one recovery function shared across every read.
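A Python analogue of the shared handler: one function turns an OS error into a stable tag and decides what to do. The errno-to-kind table is a hypothetical mapping that mirrors the tags listed above, not Hale’s actual table, and "other" is a placeholder tag invented for this sketch.

```python
import errno
import sys

# Hypothetical errno -> kind table, mirroring the tags named above.
KINDS = {
    errno.ENOENT: "not_found",
    errno.EACCES: "permission_denied",
    errno.EEXIST: "already_exists",
    errno.EISDIR: "is_dir",
}


def handle_io(e: OSError, path: str) -> str:
    """One recovery policy shared by every read."""
    kind = KINDS.get(e.errno, "other")
    if kind == "not_found":
        return ""  # treat missing as empty
    print(f"io error on {path}: {kind}", file=sys.stderr)
    return ""


def load(path: str) -> str:
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as e:
        return handle_io(e, path)
```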
Idempotent setup
or discard is handy for “make sure this exists; don’t care if
it already did” — it’s allowed because the result type is ():
std::io::fs::mkdir("cache") or discard;
Held-open files
read_file / write_file are whole-file, one-shot. When you
want a file handle you read from incrementally — line by line,
or seeking around — use std::io::file::File, a locus that holds
the open descriptor for its lifetime:
let f = std::io::file::open("log.txt", "r") or raise;
let line = f.read_line() or "";
// ... f closes when it goes out of scope
That “closes when it goes out of scope” is the locus lifecycle
quietly at work — f owns the descriptor and releases it when
its binding’s scope ends. You’ll see that mechanism in full at
the services level; here it just means you don’t write a manual
close.
Next: structured data on disk and the wire — JSON.
JSON
Coming from Python / Node?
There’s no JSON.parse that hands you a dynamic object you index freely. Hale’s std::json is field-oriented: you ask a JSON string for a named field and a type (find_string_field, find_int_field, …), and you build output with a streaming Builder. At v1 it’s tuned for flat objects and arrays, the common shapes for config and wire messages.
Reading
Pull individual fields out of a JSON string by name:
let doc = "{\"name\": \"Ada\", \"age\": 36, \"active\": true}";
let name = std::json::find_string_field(doc, "name"); // "Ada"
let age = std::json::find_int_field(doc, "age"); // 36
let active = std::json::find_bool_field(doc, "active"); // true
Missing fields come back as the type’s zero value ("", 0,
false) rather than failing — so for “is this really present?”
semantics, check with the raw accessor or validate upstream.
find_field_raw returns the raw substring for a field, which is
how you reach into a nested object:
let inner = std::json::find_field_raw(doc, "address");
let city = std::json::find_string_field(inner, "city");
Parsing into a type
Pulling fields one by one rescans the document per field. When you have a fixed shape, tag the fields with their JSON keys and the compiler generates a single-pass parser for you:
type Order {
id: Int `json:"id"`;
price: Int `json:"px"`; // JSON key differs from the field name
qty: Float `json:"sz"`;
active: Bool `json:"on"`;
side: String `json:"side"`;
currency: String = "USD"; // optional: default fills a missing key
}
let o = Order::from_json(body) or raise;
println(o.price);
Type::from_json(s) -> Type fallible(JsonError) walks the object once,
dispatches each key to the matching field, and reads the value by the
field’s declared type — no per-field rescan, and unmatched keys (and
nested objects/arrays under them) are skipped. The json:"<key>" tag
sets the JSON key; without it the field name is the key.
A missing field raises JsonError, naming the field — unless the
field declares a default (currency: String = "USD"), in which case the
default fills it. Because from_json is fallible, you must address it
(or raise, or <fallback>, …) like any other fallible call.
A field whose type is another json:-tagged struct is parsed
recursively — nest as deep as you like, and a missing field anywhere
raises with that field’s name:
type Addr { city: String `json:"city"`; zip: Int `json:"zip"`; }
type Person { name: String `json:"name"`; home: Addr `json:"home"`; }
let p = Person::from_json(body) or raise;
println(p.home.city);
The same tags drive the reverse direction — Type::to_json(value)
serializes back to a JSON string (numbers and bools bare, strings escaped,
nested structs recursed), so from_json / to_json round-trip:
let body = Order::to_json(o); // -> {"id":7,"px":...}
let o2 = Order::from_json(body) or raise;
to_json is not fallible — serialization always succeeds.
The tag is general key:"value" metadata — json: is one consumer;
other keys are free for future tools.
Fields must be scalars — Int / Float / Bool / String — or nested
json:-tagged structs. Array fields are not supported, by design:
Hale sequences are locus-owned (there is no heap-owning value list to put
in a struct). To read a JSON array, walk it with the array
cursor and push each element into a @form(vec) cell on a
locus — from_json handles the flat/nested record shape, arrays stay an
explicit, locus-owned step.
Arrays
Walk a JSON array with the iterator pair:
let arr = "[10, 20, 30]";
let mut it = std::json::array_first(arr);
while !it.done {
let n = std::str::parse_int(it.element) or 0;
println(n);
it = std::json::array_next(it);
}
array_first returns an iterator with the first element and a
done flag; array_next advances it.
Writing
The Builder is a streaming assembler — it tracks open scopes
and inserts separators for you, so you can’t produce malformed
JSON by forgetting a comma:
let b = std::json::Builder { };
b.begin_object();
b.field("name", "Ada");
b.int_field("age", 36);
b.bool_field("active", true);
b.end_object();
let out = b.result(); // {"name":"Ada","age":36,"active":true}
Nest objects and arrays by pairing begin_* / end_*. String
values are escaped per the JSON spec automatically; if you need
to escape or unescape a string by hand, std::json::escape_string
and unescape_string are there.
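The comma bookkeeping that the Builder does for you can be sketched in Python with a stack of per-scope flags. This version handles flat objects only (no arrays, no nested values). The method names mirror the Hale calls above, but the implementation is an illustrative assumption.

```python
import json


class Builder:
    """Streaming JSON assembler for flat objects: a scope stack decides where commas go."""

    def __init__(self) -> None:
        self._parts: list[str] = []
        self._fresh: list[bool] = []  # per open scope: True until its first member

    def _separator(self) -> None:
        if self._fresh and not self._fresh[-1]:
            self._parts.append(",")
        if self._fresh:
            self._fresh[-1] = False

    def begin_object(self) -> None:
        self._parts.append("{")
        self._fresh.append(True)

    def end_object(self) -> None:
        self._fresh.pop()
        self._parts.append("}")

    def _key(self, name: str) -> None:
        self._separator()
        self._parts.append(json.dumps(name) + ":")

    def field(self, name: str, value: str) -> None:
        self._key(name)
        self._parts.append(json.dumps(value))  # escapes quotes, backslashes, control chars

    def int_field(self, name: str, value: int) -> None:
        self._key(name)
        self._parts.append(str(int(value)))

    def bool_field(self, name: str, value: bool) -> None:
        self._key(name)
        self._parts.append("true" if value else "false")

    def result(self) -> str:
        return "".join(self._parts)
```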
When the shape is deep
std::json at v1 is built for flat objects and top-level arrays
— the great majority of config files and API messages. For
deeply-nested documents you walk level by level with
find_field_raw, treating each nested object as its own flat
document. If you’re parsing a genuinely complex or
performance-critical format, the wire-format
techniques and the systems-tier
performance chapter cover building
your own parser over Bytes.
Next: serving and calling over the network — HTTP.
HTTP
Coming from Python / Node?
This is your Flask / Express moment, but instead of decorators or a routes table, you write a handler locus: a locus with a handle(req) -> Response method. std::http::Server runs the accept loop and calls your handler per request. Routing is a match on the path inside handle. (A fuller router with path params lives in the pond library catalog.)
A server
locus Api {
params { hits: Int = 0; }
fn handle(req: std::http::Request) -> std::http::Response {
if req.path == "/health" {
return std::http::Response {
status: 200, body: "ok\n", content_type: "text/plain"
};
}
self.hits = self.hits + 1;
return std::http::Response {
status: 200,
body: f"hello — hit #{self.hits}\n",
content_type: "text/plain"
};
}
}
fn main() {
let server = std::http::Server { port: 8080, handler: Api { } };
// Server runs its accept loop until the process is stopped.
}
hale build it, run it, and curl localhost:8080/health. The
handler’s params persist across requests — self.hits counts
them — because the Api locus is alive for the whole run.
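For orientation, here is the same handler in Python’s standard http.server, with a small client in place of curl. One design difference stands out. Python constructs a fresh handler object per request, so the hit counter has to live on the server object, whereas the Hale Api locus stays alive and keeps self.hits itself. make_server, start_background, and fetch are names made up for this sketch.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Api(BaseHTTPRequestHandler):
    # http.server creates a new handler object per request, so the hit
    # counter lives on the long-lived server object rather than on self.
    def do_GET(self) -> None:
        if self.path == "/health":
            self._reply(200, "ok\n")
            return
        self.server.hits += 1
        self._reply(200, f"hello, hit #{self.server.hits}\n")

    def _reply(self, status: int, body: str) -> None:
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request log line


def make_server(port: int) -> HTTPServer:
    server = HTTPServer(("127.0.0.1", port), Api)  # handles one request at a time
    server.hits = 0
    return server


def start_background(port: int = 0) -> HTTPServer:
    """Serve on a daemon thread; port 0 lets the OS choose a free port."""
    server = make_server(port)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server: HTTPServer, path: str) -> str:
    """A tiny client, playing the role of curl."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
```

To serve on port 8080 like the Hale example, call make_server(8080).serve_forever().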
The pieces
- std::http::Request carries method, path, version, body, and headers (looked up case-insensitively). You match / if on method and path to route.
- std::http::Response needs at least status and body; content_type defaults to text/plain, and you can add custom headers.
- std::http::Server takes a port and a handler, then owns the listen-accept-parse-dispatch loop. max_accepts: N bounds it to N requests (handy for tests); by default it runs until stopped.
A first taste of interfaces
How does Server know Api is a valid handler? Server’s
handler field has the type std::http::Handler, which is an
interface — a named set of required methods:
// (declared in the standard library)
interface Handler {
fn handle(req: Request) -> Response;
}
Any locus that has a matching handle method satisfies
Handler — automatically, with no implements clause to write.
This is structural satisfaction: the shape is the contract. You
declared Api with the right method, so it’s a Handler. (Go
programmers will recognize this; it’s interfaces without the
impl ceremony.)
Calling out
The standard library ships the server. For an HTTP client —
making outbound requests, with connection pooling and TLS — reach
for the http/client library in pond:
import "vendor/pond/http/client" as http;
// let resp = http::get("https://example.com") or raise;
That import line, the bindings that wire a server across
processes, and the lifecycle that lets a server shut down cleanly
on Ctrl-C are all next-level topics — but the handler you wrote
above doesn’t change when you get there. The server code is
already complete; the surrounding tier just gives it more ways to
be deployed and supervised.
Next: the transports below HTTP — UDP & TLS.
UDP & TLS
HTTP covers the request/response server. Below it sit
two more transports: UDP for connectionless datagrams, and
TLS for an encrypted client connection. Both are thin wrappers
over the platform sockets — each call returns or takes a file
descriptor (an Int).
For ordinary TCP request/response, prefer std::http or the std::io::tcp Listener / Stream loci. This chapter covers the raw-datagram and TLS-client surface.
UDP datagrams — std::io::udp
Bind a socket, then send and receive datagrams. bind and the I/O
calls are fallible(IoError):
let fd = std::io::udp::bind("0.0.0.0", 9000) or raise;
std::io::udp::send(fd, "127.0.0.1", 9001, "ping") or raise;
To receive and learn who sent it, use recv_with_source and read
the thread-local source cache immediately after:
let msg = std::io::udp::recv_with_source(fd, 1500) or raise; // Bytes
let host = std::io::udp::last_source_host();
let port = std::io::udp::last_source_port();
println(host, ":", to_string(port), " sent ",
to_string(len(msg)), " bytes");
std::io::udp::close(fd);
Datagram boundaries are preserved — one send is one recv.
Delivery is best-effort; layer acknowledgement or retry on top if
you need it. Multicast is a join_group away (set_multicast_ttl
/ set_multicast_loop tune it), and set_recv_timeout(fd, 100ms)
bounds a quiet recv.
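A loopback sketch in Python shows both properties. Two sendto calls arrive as two separate datagrams, and recvfrom returns the sender’s address together with the payload. That address is the information Hale exposes through last_source_host and last_source_port. The exchange function is illustrative; it binds to port 0 so the OS picks free ports.

```python
import socket


def exchange(messages: list[bytes]) -> tuple[list, tuple]:
    """Send each message as its own datagram over loopback and receive them back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(1.0)          # bound a quiet recv instead of hanging
        sender.bind(("127.0.0.1", 0))
        for m in messages:
            sender.sendto(m, receiver.getsockname())
        # recvfrom returns the payload and the sender's (host, port) together.
        received = [receiver.recvfrom(1500) for _ in messages]
        return received, sender.getsockname()
    finally:
        sender.close()
        receiver.close()
```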
UDP as a bus transport. The raw socket above is not the typed bus. To carry bus messages over UDP, use the udp://host:port substrate transport instead (see the bus); it has the same dispatch contract as unix://.
TLS client — std::io::tls
connect does the TCP connection and the TLS 1.2+ handshake (SNI +
system trust store) in one call, via the platform OpenSSL:
let h = std::io::tls::connect("example.com", 443) or raise;
std::io::tls::send_bytes(h, std::bytes::from_string(
"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"));
let resp = std::io::tls::recv_bytes(h, 4096); // Bytes
println(std::str::from_bytes(resp));
std::io::tls::close(h);
This is client-side only — there is no TLS server in the
stdlib. set_recv_timeout(h, d) bounds a read; with one set,
recv_into returns the -2 “timed out, retryable” sentinel so a
long-lived client can run keep-alive work instead of hanging.
Tuning sockets — std::io::sockopt
The UDP set_option_int / set_option_bool / get_option_int
calls take a level and name from std::io::sockopt’s named
constants, so you never hardcode a platform number:
std::io::udp::set_option_bool(
fd, std::io::sockopt::SOL_SOCKET(),
std::io::sockopt::SO_REUSEADDR(), true) or raise;
For TCP, std::io::tcp::set_nodelay(fd, true) is the common one
(disable Nagle for latency).
Next: hashing & encoding — Hashing, encoding & randomness.
Hashing, encoding & randomness
A grab-bag of the cryptographic and byte-wrangling helpers real
programs reach for: digests, message authentication, base64, and
random numbers. They live under std::crypto, std::text, and
std::rand / std::os.
Hashes & checksums — std::crypto
Digests take Bytes and return Bytes:
let data = std::bytes::from_string("hello");
let key = std::bytes::from_string("secret");
let digest = std::crypto::sha256(data); // 32 bytes
let tag = std::crypto::hmac_sha256(key, data); // 32 bytes
let sum = std::crypto::crc32(data); // Int (IEEE 802.3 / zlib)
let digest512 = std::crypto::sha512(data); // 64 bytes
let tag512 = std::crypto::hmac_sha512(key, data); // 64 bytes
sha1 (20 bytes) is there too for legacy interop; reach for
sha256 by default. The 64-bit-word SHA-512 siblings —
sha512 / hmac_sha512 (64-byte) — are the same non-fallible
shape, for venues that sign with HMAC-SHA512 (e.g. Kraken,
Gate.io). The hashes and crc32 are hand-rolled — no OpenSSL
dependency.
Raw hash bytes aren’t printable, so encode one to show or transport it:
println(std::text::base64::encode(digest));
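When checking Hale output against another implementation, Python’s hashlib, hmac, and zlib compute the same standard algorithms. The calls below reproduce the digests above for the same inputs.

```python
import base64
import hashlib
import hmac
import zlib

data = b"hello"
key = b"secret"

digest = hashlib.sha256(data).digest()                  # 32 bytes
tag = hmac.new(key, data, hashlib.sha256).digest()      # 32 bytes
crc = zlib.crc32(data)                                  # IEEE 802.3 / zlib CRC-32
digest512 = hashlib.sha512(data).digest()               # 64 bytes
tag512 = hmac.new(key, data, hashlib.sha512).digest()   # 64 bytes

print(digest.hex())  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(base64.b64encode(digest).decode("ascii"))         # printable form for transport
```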
Base64 — std::text::base64
let enc = std::text::base64::encode(data); // Bytes -> String
let dec = std::text::base64::decode(enc); // String -> Bytes
let url = std::text::base64::url_encode(data); // URL-safe, unpadded
url_encode is RFC 4648 §5 (the -_ alphabet, no = padding) —
the form JWTs, OAuth, and webhook signatures use. decode accepts
both alphabets.
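Python’s base64 module can produce matching reference values. The decode_any helper is a hypothetical sketch of accepting both alphabets. It also restores missing padding, which is an assumption here rather than something this guide states about std::text::base64::decode.

```python
import base64

data = b"hello"
enc = base64.b64encode(data).decode("ascii")                        # standard, padded
url = base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")  # RFC 4648 §5, unpadded


def decode_any(s: str) -> bytes:
    """Accept the standard or URL-safe alphabet, padded or not."""
    s = s.replace("-", "+").replace("_", "/")
    return base64.b64decode(s + "=" * (-len(s) % 4))
```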
Signing — ECDSA P-256 (ES256)
For JWT / venue auth, std::crypto ships ECDSA over NIST P-256
with SHA-256 (the ES256 JWS algorithm), OpenSSL-backed:
// key: PEM EC private key (SEC1 or PKCS#8); message: Bytes
let sig = std::crypto::ecdsa_p256_sign(key, message) or raise;
// pubkey: PEM SPKI; sig: raw r‖s, 64 bytes (the JWS/COSE form)
let ok = std::crypto::ecdsa_p256_verify(pubkey, message, sig);
ecdsa_p256_sign has two faces: a bare call returns an empty
Bytes on failure (check len(sig) == 0), and in an or context
it is fallible(CryptoError), so or raise / or fail err /
or handle(err) propagate a structured CryptoError { kind, detail } like any other error.
Random numbers — std::rand and std::os
std::rand is a fast, non-cryptographic PRNG — fine for
jitter, sampling, shuffling, game logic:
let roll = std::rand::next_int(6) + 1; // a die roll, [1, 6]
For anything security-sensitive (tokens, nonces, keys), use the CSPRNG instead:
let nonce = std::os::getrandom(16) or raise; // 16 random Bytes
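Python draws the same line. random is a fast PRNG for simulations and games, and secrets reads from the OS CSPRNG for anything security-sensitive.

```python
import random
import secrets

roll = random.randrange(6) + 1   # non-cryptographic PRNG: a die roll in [1, 6]
nonce = secrets.token_bytes(16)  # OS CSPRNG: 16 random bytes for tokens, nonces, keys
```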
Next: reading configuration — CLI & config.
CLI & config
Coming from Python / Node? No argparse, no yargs, no dotenv. Reading arguments and environment is a few direct calls under std::env; layering argv over env over defaults is a small std::cli::Resolver. Rich flag parsing (--name=value, subcommands) is library territory, not built into the language.
Arguments and environment
fn main() {
let n = std::env::args_count(); // includes the program name
let first = std::env::arg(1); // positional arg 1
let port = std::env::arg_or(2, "8080"); // with a default
let home = std::env::var("HOME"); // environment variable
let debug = std::env::var_exists("DEBUG");
}
- arg(0) is the program name; user arguments start at arg(1).
- arg_or(i, default) is the everyday form, with no bounds-checking dance.
- var(name) reads an environment variable; var_exists(name) tests for one.
Layered configuration
A common need: a setting should come from a command-line argument
if given, else an environment variable, else a built-in default.
std::cli::Resolver expresses that precedence directly:
fn main() {
let cfg = std::cli::Resolver { prefix: "MYAPP" };
// argv positional "port", else $MYAPP_PORT, else "8080"
let port = cfg.get("port", "8080");
let host = cfg.get("host", "127.0.0.1");
println("listening on ", host, ":", port);
}
The resolver checks the argument, then the prefixed environment
variable (MYAPP_PORT), then the supplied default. Empty values
fall through to the next layer rather than counting as “set.”
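Here is a small Python model of the precedence rule. It makes two hypothetical simplifications: argv is assumed to be already parsed into a name-to-value mapping, and the environment variable name is assumed to be the prefix plus the upper-cased setting name, as in MYAPP_PORT. How Hale maps a named setting onto a positional argument is not spelled out here, so the sketch sidesteps it.

```python
import os
from typing import Mapping, Optional


class Resolver:
    """Hypothetical Python model of std::cli::Resolver's precedence."""

    def __init__(self, prefix: str,
                 argv: Optional[Mapping[str, str]] = None,
                 env: Optional[Mapping[str, str]] = None):
        self.prefix = prefix
        self.argv = argv or {}                        # assumption: pre-parsed name -> value
        self.env = os.environ if env is None else env

    def get(self, name: str, default: str) -> str:
        # Layer 1: the command-line value; an empty string falls through.
        value = self.argv.get(name, "")
        if value:
            return value
        # Layer 2: the prefixed environment variable, e.g. MYAPP_PORT.
        value = self.env.get(f"{self.prefix}_{name.upper()}", "")
        if value:
            return value
        # Layer 3: the built-in default.
        return default
```

Treating an empty string as unset is what lets a MYAPP_HOST= left over in a shell fall through to the default instead of binding an empty host.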
Interactive terminal I/O
For a tool that draws to the terminal or reads keystrokes, a few
std:: primitives cover the OS surface without an FFI dependency.
std::term::is_tty(fd) answers “is this a terminal?” — the usual
guard for whether to emit color:
let color = std::term::is_tty(2); // fd 2 = stderr
std::term::size() returns a TermSize { cols, rows } record (and
{0, 0} when stdout isn’t a tty). std::term::RawMode is a guard
locus that puts the terminal in raw mode for its lifetime — no line
buffering, no echo — and restores it on scope exit, and on a panic
or unhandled error too via an atexit backstop:
fn main() {
let raw = std::term::RawMode { }; // birth: enter raw mode
// ... read keys, draw frames ...
} // dissolve: restore the terminal
For the bytes themselves, std::io::stdin::read_byte(timeout_ms)
polls one byte (0..255, -1 on timeout, -2 on EOF), and
std::io::stdout::write_bytes(s) does a raw, unbuffered write — it
fflushes first so it stays ordered with any println output:
loop {
let b = std::io::stdin::read_byte(100); // 100ms poll
if b == -1 { continue; } // timeout: redraw, tick, …
if b == -2 { break; } // EOF
std::io::stdout::write_bytes("got a key\r\n");
}
These are primitives, not a TUI — key decoding and styling live in a library on top of them.
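The read_byte return convention (a byte value 0..255, -1 for a timeout, -2 for end of input) maps naturally onto a poll-then-read on a file descriptor. The following Python model uses select and os.read and works on POSIX systems. It illustrates the convention only; it is not how Hale implements std::io::stdin::read_byte.

```python
import os
import select


def read_byte(fd: int, timeout_ms: int) -> int:
    """Model of the convention: 0..255 for a byte, -1 on timeout, -2 on EOF."""
    ready, _, _ = select.select([fd], [], [], timeout_ms / 1000)
    if not ready:
        return -1              # nothing arrived within the timeout
    chunk = os.read(fd, 1)
    if not chunk:
        return -2              # readable but zero bytes: end of input
    return chunk[0]            # the byte value itself
```

A pipe stands in for a terminal when checking it: an empty pipe times out, a written byte comes back as its value, and closing the write end produces EOF.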
Where this fits
This is the boundary between the outside world and your program.
The idiomatic shape, building on the app
locus: main resolves configuration, then
constructs the app locus with it.
locus App {
params { host: String = "127.0.0.1"; port: String = "8080"; }
fn run() { println("listening on ", self.host, ":", self.port); }
}
fn main() {
let cfg = std::cli::Resolver { prefix: "MYAPP" };
let app = App {
host: cfg.get("host", "127.0.0.1"),
port: cfg.get("port", "8080"),
};
app.run();
}
Configuration enters once, at the edge, and flows inward as typed locus state — never read again from a global deep inside the program. That keeps every setting owned by exactly one locus, the rule from The locus, gently.
Next: seeing what your program is doing — Logging.
Logging
Coming from Python / Node? Instead of a global logger object you configure at import time, Hale logging is built on the message bus: a Logger publishes typed log events, and a sink subscribes to them and decides what to do (print, write a file, ship to a collector). It’s your first look at the bus, the mechanism the whole next tier is built on.
The minimal setup
Two pieces: something that emits log events, and something that consumes them.
fn main() {
// The sink must exist before anything logs to it.
let sink = std::log::StdoutSink { };
let log = std::log::Logger { name: "app" };
log.info("starting up");
log.warn("disk almost full");
log.error("connection refused");
}
StdoutSink subscribes to all log events and prints them;
Logger emits them. The ordering matters — instantiate the sink
first, because a subscriber has to exist before a publisher
sends, or the early events have nowhere to go.
Levels
Loggers carry the usual levels: trace, debug, info,
warn, error. Call the matching method:
log.debug(f"cache size = {n}");
log.error(f"request {id} failed: {reason}");
Per-component loggers, one sink
Each Logger has a name, which becomes the event’s topic
(log.app, log.db, log.http). You can give every component
its own named logger and still have a single sink see everything:
fn main() {
let sink = std::log::StdoutSink { };
let app_log = std::log::Logger { name: "app" };
let db_log = std::log::Logger { name: "db" };
app_log.info("ready");
db_log.warn("slow query");
}
A custom sink subscribes to a subtree — log.db.** to capture
only database logs, log.** to capture all of them — without the
loggers knowing who’s listening. Publisher and subscriber never
reference each other; they only share the topic name.
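The subtree patterns can be modeled as matching on dot-separated segments. In this Python sketch, a trailing ** matches zero or more remaining segments, so log.db.** covers log.db itself (the topic a logger named "db" publishes on) as well as anything nested under it. That reading is an assumption inferred from the example above, and matches and subscribers_for are hypothetical helpers, not Hale APIs.

```python
def matches(pattern: str, subject: str) -> bool:
    """Match a dotted subject against a pattern that may end in '**'."""
    pat, sub = pattern.split("."), subject.split(".")
    if pat[-1] == "**":
        prefix = pat[:-1]
        # '**' covers zero or more trailing segments (assumed semantics)
        return sub[:len(prefix)] == prefix
    return pat == sub


def subscribers_for(subscriptions, subject):
    """Names of every sink whose pattern covers the subject.

    The publisher supplies only the subject; it never sees this list.
    """
    return [name for name, pattern in subscriptions if matches(pattern, subject)]
```

Because matching compares whole segments, log.db.** does not accidentally capture a sibling such as log.dbx.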
You just used the bus
That decoupling — emitters publish, sinks subscribe, neither
holds a reference to the other — is the bus, Hale’s typed
publish/subscribe channel. Logging is a small, friendly instance
of it: Logger publishes a LogEvent on a topic, StdoutSink
subscribes. The same mechanism carries any typed message between
any two loci in your program.
At this level you’ve used the bus without declaring one. The
services tier makes it first-class: you
declare your own topics, subscribe and publish them in a
locus’s bus { } block, and use them to wire concurrent
components together. Everything you just saw — emit, subscribe to
a subtree, no direct references — is exactly how it works at
scale.
That’s the everyday tier. With loci, collections, files, JSON, HTTP, config, and logging, you can build real applications — CLIs, web services, data tools. The next tier is for programs that run over time and coordinate: long-lived services, a typed bus you design, concurrency, and supervision.
Next: The lifecycle.
The lifecycle
Coming from Go? A long-running locus is like a goroutine with structure: instead of go func(){...}() and a context you thread around for cancellation, a locus has named lifecycle methods the runtime drives (birth → run → drain → dissolve), and shutdown cascades through the tree automatically. You write the phases; the runtime sequences them.
Until now, loci have been object-like: state plus methods you call. A locus can also run over time. When it does, it moves through a fixed sequence of lifecycle states, and the runtime guarantees the ordering.
The five phases
locus Server {
params { listen_fd: Int = -1; }
birth() { /* acquire: open sockets, files, buffers */ }
run() { /* steady-state work — the main loop */ }
drain() { /* stop taking new work; finish in-flight */ }
dissolve() { /* release what birth acquired */ }
}
- birth() runs once, at construction, after the locus’s state is initialized. Acquire resources here: open a socket, read a file, allocate a buffer. By the time it returns, the locus is live.
- run() is the steady-state body, typically a loop that serves requests, drains a queue, or ticks on a timer. It runs until it returns on its own or the locus is asked to shut down.
- drain() runs when shutdown begins: stop accepting new work, let in-flight work finish.
- dissolve() runs last: release what birth acquired. The locus’s memory is freed wholesale right after.
There’s also accept and release for parent/child
relationships, covered in Parents & children, and on_failure
for recovery, covered in When things fail.
You only write the phases you need; the compiler supplies no-op
defaults for the rest. A locus with just birth and run is
completely normal.
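To make the ordering concrete, here is a Python model of a runtime driving the phases. Locus, drive, and the Python Ticker are hypothetical illustrations of the two rules above (a fixed order, and no-op defaults for phases you don't write). They are not Hale's runtime, and they ignore the fact that real loci can run concurrently.

```python
class Locus:
    """Every phase defaults to a no-op, like the compiler-supplied defaults."""
    def birth(self): pass
    def run(self): pass
    def drain(self): pass
    def dissolve(self): pass


def drive(locus: Locus) -> Locus:
    # The runtime, not user code, calls the phases, always in this order.
    locus.birth()
    locus.run()
    locus.drain()
    locus.dissolve()
    return locus


class Ticker(Locus):
    """Writes only run(); the other three phases fall back to the no-ops."""

    def __init__(self, limit: int = 5):
        self.count, self.limit, self.log = 0, limit, []

    def run(self):
        while self.count < self.limit:
            self.log.append(f"tick {self.count}")
            self.count += 1
```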
One rule: no return inside birth/run/dissolve bodies. These are driven by the runtime, not called by you, so “return a value” has no meaning. Factor any early-exit logic into a helper free function the body calls.
A simple service
locus Ticker {
params { count: Int = 0; limit: Int = 5; }
run() {
while self.count < self.limit {
println("tick ", self.count);
std::time::sleep(500ms);
self.count = self.count + 1;
}
}
}
fn main() {
Ticker { limit: 3 }; // runs to completion, then tears down
}
When does a locus dissolve?
This is the one piece of bookkeeping worth internalizing,
because it’s how Hale frees resources without a defer or a
finally:
- Statement position (Ticker { }; with no binding): the locus runs its whole lifecycle right there and tears down at the end of the statement. Fire-and-forget.
- let-bound (let t = Ticker { };): it’s born and runs, but dissolve is deferred to the end of the enclosing function’s scope. The binding stays usable for method calls until then.
- Long-lived (the locus subscribes to the bus, or its run() hasn’t returned): it stays alive until its scope exits, regardless of binding. It has to, to keep receiving messages.
So let keeps a locus alive for the scope; statement position is
fire-and-forget. When several let-bound loci share a scope,
they dissolve in reverse order of creation (the later one, which
may depend on the earlier, goes first).
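Python programmers will recognize the reverse-order rule from contextlib.ExitStack, which unwinds registered callbacks last-in, first-out. The model below uses it to reproduce the order in which two let-bound loci sharing a scope would be born and dissolved. run_scope is a hypothetical helper, and the whole thing is an analogy, not Hale's mechanism.

```python
from contextlib import ExitStack


def run_scope(names, log):
    """Model of let-bound loci sharing one function scope."""
    with ExitStack() as scope:
        for name in names:
            log.append(f"birth {name}")
            # Register teardown at construction; ExitStack unwinds LIFO.
            scope.callback(log.append, f"dissolve {name}")
        log.append("end of scope")
    return log
```

The cache, created second and possibly depending on db, is torn down first, so its dependency is still live while it dissolves.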
Replacing a locus held in a field
If a locus holds another locus in a field — say a server that
keeps its current connection in self.conn — assigning a fresh
one replaces a live thing, so it’s a lifecycle event, not a
plain store:
self.conn = Connection { url: next }; // reconnect
Hale tears the old self.conn down first (drain → dissolve, so
its socket and any children are released), then builds the new
one into this locus’s arena and points the field at it. The old
and new never overlap, and the new instance lives until the
parent dissolves — no manual close, no leak. This is
break-before-make: if you need make-before-break (hold the old
connection open while the new one warms up), keep both in
separate fields and swap explicitly.
To reconfigure the same instance instead of replacing it, mutate
in place — self.conn.url = next; — which keeps the connection
and triggers no teardown.
Shutdown cascades
drain() is always depth-first cascading. Calling it on a
locus first drains all of its children (and theirs, recursively),
waits for them, then drains itself, then dissolves. You never
write a manual teardown walk.
This is what makes Ctrl-C trivial: SIGINT calls drain() on the
program’s root, the whole tree winds down in dependency order,
in-flight work finishes, resources release, the process exits
cleanly. “Press Ctrl-C and it shuts down properly” is the
default, not something you wire up.
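The cascade is a post-order walk of the locus tree. This Python sketch records the order in which phases would run; Node and drain are hypothetical. Two details are assumptions, since the text does not pin them down: siblings are visited in reverse creation order (mirroring the let-binding rule), and each child finishes its dissolve before its parent drains.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)   # in creation order


def drain(node, log):
    """Post-order cascade: each subtree winds down before its parent."""
    # assumption: siblings go in reverse creation order
    for child in reversed(node.children):
        drain(child, log)                # recurse: grandchildren first
    log.append(f"drain {node.name}")     # then stop this locus's own intake
    log.append(f"dissolve {node.name}")  # and release what its birth acquired
    return log
```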
The lifecycle is the skeleton of every long-running Hale program. Next, the thing those programs use to talk to each other: The bus.
The bus
Coming from Go? Topics are like channels, but typed by a declaration instead of by
chan T, and many-to-many instead of point-to-point. You don’t pass a channel into a goroutine; a locus declares which topics it subscribes to and publishes, and the runtime wires the delivery. No channel plumbing threaded through constructors.
You met the bus implicitly in logging: emitters publish, sinks subscribe, neither references the other. Here you declare and use it directly.
Topics are typed declarations
A topic names a channel and the type that flows on it:
type Order { id: String; amount: Decimal; }
topic OrderPlaced { payload: Order; }
topic OrderShipped { payload: Order; }
A topic is a top-level declaration, like type or locus. It’s
referenced by name — never a magic string — so the payload type
is checked at every publish and every handler, and renaming the
topic moves every use with it.
Subscribe and publish
A locus declares its bus interface in a bus { } block:
locus Warehouse {
bus {
subscribe OrderPlaced as on_order; // inbound
publish OrderShipped; // outbound
}
fn on_order(o: Order) {
// ... pick and pack ...
OrderShipped <- o; // the send
}
}
- subscribe TOPIC as HANDLER; wires inbound messages to a handler method. The handler must exist with the matching signature (fn on_order(o: Order)), and the compiler checks it.
- publish TOPIC; authorizes this locus to send on the topic. Without it, a send is a compile error.
- TOPIC <- value; is the send. It’s a statement, not an expression: it produces no value, like Erlang’s Pid ! Msg.
Subscribing is declarative — there’s no subscribe() call at
runtime. Registration happens when the locus is constructed, and
unsubscribe happens automatically at dissolve.
One ordering rule
A subscriber must be born before a publisher sends, or the
message has nowhere to land. In practice: instantiate your
subscribers first in main. (This is the same rule you saw with
the log sink.)
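A toy in-process bus in Python makes the ordering rule visible. Delivery goes to whoever is subscribed at the moment of the send, so a message published before its subscriber exists is simply lost. Bus, subscribe, and publish here are hypothetical. Hale's real bus is declared in bus { } blocks, typed per topic, and wired by the runtime.

```python
from collections import defaultdict


class Bus:
    """Hypothetical in-process model of topic pub/sub, not Hale's runtime."""

    def __init__(self):
        self.handlers = defaultdict(list)   # topic name -> subscribed handlers

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, payload) -> int:
        """Deliver to current subscribers; return how many received it."""
        current = list(self.handlers.get(topic, []))
        for handler in current:
            handler(payload)
        return len(current)                 # 0 means the message was lost
```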
Why this doesn’t break the tower
In the parent/child model, flow is strictly vertical — a locus only talks up to its parent and down to its children. The bus seems to let unrelated loci talk sideways. It doesn’t, really: publishers and subscribers don’t see each other, they see the topic, which lives at the runtime root — structurally above everyone. Every send goes up to the bus; every delivery comes down to a subscriber. It’s vertical flow through a shared root, which is why two loci on opposite branches of a deep tree can coordinate with no shared pointer and no registry lookup.
This is the productive shape for events: many-to-many flow without back-channels. A topic can have any number of publishers and subscribers.
You won’t always pay for it
If a topic is only ever used inside a single locus type — the same locus both publishes and subscribes, with no external binding — the compiler can prove every send routes back to a handler on the same instance, and rewrites the send into a direct method call. The bus is elided entirely. So you can use topics freely for a locus’s own internal event flow without paying dispatch cost; if the topic later grows a second subscriber or a deployment binding, the real bus path comes back automatically, and your code doesn’t change.
As of v0.9.0 the static-dispatch devirtualization is broader than that intra-locus-type case: any quiet, flat-payload, same-thread handler on a closed-world local subject lowers to a direct synchronous call — even when the publisher and subscriber are distinct locus types.
Routing keys: one topic, sharded by a field
By default every subscriber to a topic sees every message. When you have many subscribers that each care about one slice of the traffic — one connection, one symbol, one tenant — fanning every message to all of them and filtering in each handler is wasteful. A routing key moves that filter into the bus: a subscriber declares which key it wants, and the runtime only delivers matching messages.
Name a payload field as the key on the topic, then filter on it at the subscribe site:
type Tick { symbol_id: Int; price: Decimal; }
topic Quote { payload: Tick; keyed_by symbol_id; }
locus Feed {
params { symbol_id: Int = 0; }
bus {
subscribe Quote as on_quote where key == self.symbol_id;
}
fn on_quote(t: Tick) {
// only ticks whose symbol_id matches this Feed arrive
}
}
A publish carries its key in the payload, so the send is
unchanged — Quote <- Tick { symbol_id: 7, price: 100.0d };
reaches only the Feed instances that subscribed with
where key == 7.
- keyed_by FIELD on the topic picks the routing field. It must be a field of the payload, and its type must be one the bus can hash to a fixed-width key: Int, Bool, Time, Duration, a no-payload enum, or Decimal. (Need a compound key like (symbol, venue)? Pack it into one Decimal field yourself.)
- where key == EXPR on a subscribe filters that subscriber. EXPR can be a literal, a const, or self.<field>, the common case: one instance per shard.
- The key is captured by value when the locus is constructed. Reassigning self.symbol_id later does not re-route the subscription; to change shards, dissolve the locus and instantiate a fresh one.
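Routing by key amounts to one lookup from key to subscribers, done by the bus instead of in every handler. This Python model also shows why reassigning the field later has no effect: the key was copied into the routing table when the subscriber was constructed. KeyedTopic and Feed are hypothetical names.

```python
from collections import defaultdict


class KeyedTopic:
    """Model of a topic with keyed_by: routes on one payload field."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.by_key = defaultdict(list)      # key -> handlers

    def subscribe(self, key, handler):
        # The key is captured by value here, at construction time.
        self.by_key[key].append(handler)

    def publish(self, payload: dict):
        key = payload[self.key_field]
        for handler in self.by_key.get(key, []):
            handler(payload)


class Feed:
    def __init__(self, topic, symbol_id):
        self.symbol_id, self.seen = symbol_id, []
        topic.subscribe(self.symbol_id, self.seen.append)
```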
When nothing matches
A keyed publish whose key matches no subscriber is governed by the
topic’s on_unmatched: policy:
topic Quote { payload: Tick; keyed_by symbol_id; on_unmatched: fallback; }
- swallow (the default): the message is dropped silently. Run with LOTUS_BUS_LOG_UNMATCHED=1 to log drops while debugging.
- fail: the publish becomes fallible, and every send site must dispose of it. Quote <- t or raise; panics on an unmatched key; Quote <- t or discard; swallows it. Use this when an unrouted message is a bug, not an expected case.
- fallback: an unmatched message is delivered to a catch-all subscriber that opts in with where key == _. At least one such subscriber must exist program-wide, or the topic is rejected at compile time.
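The three policies differ only in what happens after the key lookup comes back empty. In this Python model, fail is represented by raising, which a caller must handle much as a Hale send site must add or raise or or discard. The missing-catch-all case that Hale rejects at compile time becomes a runtime check here. route and Unmatched are hypothetical names.

```python
class Unmatched(Exception):
    """Raised under the 'fail' policy when no subscriber has the key."""


def route(subscribers: dict, key, payload, policy="swallow", fallback=None):
    """Model of on_unmatched; subscribers maps key -> list of handlers."""
    handlers = subscribers.get(key)
    if handlers:
        for handler in handlers:
            handler(payload)
        return "delivered"
    if policy == "swallow":
        return "dropped"                  # the default: silent drop
    if policy == "fail":
        raise Unmatched(key)              # the send site must dispose of it
    if policy == "fallback":
        if fallback is None:
            # Hale rejects this at compile time; the model checks at runtime.
            raise ValueError("fallback policy needs a catch-all subscriber")
        fallback(payload)
        return "fallback"
    raise ValueError(f"unknown policy {policy!r}")
```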
Next: where loci actually run — Concurrency & placement.
Concurrency & placement
Coming from Go? Concurrency isn’t go f() scattered through the code. Loci run concurrently by default; where each one runs (a shared cooperative pool, like a scheduler’s worker, or its own dedicated OS thread) is declared in one place, the placement { } block on main. It’s a deployment decision, not something baked into the locus. And there’s no async/await: the lifecycle and the bus already give you what coloring functions would.
Two ways a locus can run
Hale’s concurrency is deliberately bimodal — two choices, no third:
- Cooperative: the locus shares an OS thread with other cooperative loci on the same pool. It yields between units of work (after a handler, on a bus dispatch, on time::sleep, on an explicit yield). Handler bodies run to completion without interruption, so within one cooperative locus there’s no data race to worry about. This is the default.
- Pinned: the locus owns its own OS thread and doesn’t yield to neighbors. Use it for latency-critical or CPU-bound work that shouldn’t share.
Long sleeps don’t freeze the pool
A cooperative pool runs one locus at a time, so a locus that sits
in a long time::sleep could, in principle, starve every other
locus sharing its pool — a 30-second keep-alive timer on the
main pool would block bus handlers for 30 seconds. It doesn’t.
std::time::sleep slices any sleep into short intervals (≤100ms)
and drains the pool’s pending bus work between slices, so
neighbors keep getting dispatched while one locus naps:
run() {
while true {
self.send_heartbeat();
std::time::sleep(30s); // sliced — co-resident handlers
// still fire every ≤100ms
}
}
The sleeping locus still wakes after the full duration; it just
doesn’t hold the thread hostage in the meantime. You write
sleep(30s) and the slicing is invisible — there’s nothing to
opt into. (A pinned locus owns its thread, so its sleeps affect
no one and aren’t sliced.)
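The slicing idea fits in a few lines. This Python model sleeps in bounded steps and calls a drain_pending callback between them. sliced_sleep is hypothetical, and the 100 ms slice comes from the text above. The sleep parameter is injectable so the behavior can be checked without actually waiting, and durations are whole milliseconds to avoid floating-point drift.

```python
import time


def sliced_sleep(total_ms, drain_pending, slice_ms=100, sleep=time.sleep):
    """Model of a cooperative sleep that never blocks longer than one slice."""
    remaining = total_ms
    slices = 0
    while remaining > 0:
        step = min(slice_ms, remaining)
        sleep(step / 1000)     # block for at most one slice
        remaining -= step
        slices += 1
        drain_pending()        # let co-resident handlers run between slices
    return slices
```

With the defaults, a 30-second nap becomes 300 slices, so a neighbor’s bus handler waits at most one slice before it gets the thread.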
Placement lives on main
You declare placement once, against the top-level loci, in
main:
main locus App {
params {
gateway: Gateway = Gateway { };
metrics: MetricsServer = MetricsServer { port: 9100 };
ui: Renderer = Renderer { };
}
placement {
gateway: pinned(core = 1); // own thread, pinned to core 1
metrics: cooperative(pool = io); // shares the "io" pool
ui: cooperative(pool = render);
// anything unlisted defaults to cooperative(pool = main)
}
}
- cooperative(pool = X) puts the locus on pool X’s thread. The runtime spawns one OS worker per pool name it sees.
- pinned / pinned(core = N) gives the locus its own thread, optionally pinned to a CPU core.
- Unmentioned top-level loci default to cooperative(pool = main), the program’s main thread.
Placement keys on the field name, not the locus type, so two instances of the same locus type can live on different threads — the parallelism case (one gateway per core, say).
Why on main and not on the locus? Because where something runs
is a property of the deployment, not the code. The same
Gateway locus is pinned in production and cooperative in a
test, with no edit to Gateway itself. Library authors say what
a locus is; the binary author says where it runs.
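Placement resolution itself is a lookup keyed by field name, with a fallback. The Python sketch below shows the consequences described above: unlisted fields land on cooperative(pool = main), two fields holding the same locus type can be placed differently, and the set of cooperative pool names determines how many workers would be spawned. resolve_placement and its tuple encoding are hypothetical.

```python
DEFAULT = ("cooperative", "main")


def resolve_placement(fields, placement):
    """Resolve fields to ("cooperative", pool) or ("pinned", core)."""
    # Keyed on the field name, so two fields of one locus type can differ.
    assigned = {name: placement.get(name, DEFAULT) for name in fields}
    # One OS worker per distinct cooperative pool name.
    pools = sorted({arg for mode, arg in assigned.values() if mode == "cooperative"})
    # Each pinned field gets a dedicated thread of its own.
    pinned = sorted(name for name, (mode, _) in assigned.items() if mode == "pinned")
    return assigned, pools, pinned
```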
Nested loci inherit their pool
Placement entries apply only to top-level main loci. A locus
instantiated inside another locus’s body runs on its parent’s
pool. To put a component on its own pool, hoist it to a top-level
sibling in main and give it a placement entry. (This is the
canonical fix for “my long-running child starved its parent” —
make it a sibling, not a nested child.)
This inheritance is also how you co-locate work on a pinned
thread. There’s no pinned(pool = X) for sharing a pinned
thread — pinned owns its thread exclusively. So when a pinned
locus needs helpers on its thread (counters, a metrics registry, a
signal store — anything it calls directly), you nest them: make
them params of the pinned locus, and they inherit its thread.
Param defaults make this ergonomic — a default can itself
instantiate the helper:
locus Gateway { // placed pinned in main
params {
reg: Registry = Registry { };
ticks: metrics::Counter = metrics::counter(self.reg, "ticks");
}
// run() calls self.ticks.inc() etc. — all on the pinned thread
}
Hoisting them to siblings instead would put them on a different thread, and the gateway calling them directly would then be a cross-pool method call — which the compiler rejects (see below). Nesting is the supported pattern for “many loci, one pinned thread.”
The bus crosses threads for you
When a cooperative locus on one pool publishes to a subscriber on
another pool — or to a pinned locus on its own thread — the
runtime handles the hand-off: it copies the payload across the
thread boundary and wakes the destination. The sender never
blocks. From your code’s point of view, Topic <- value; is the
same line whether the subscriber is on the same thread or a
different one. The substrate adapts; the source doesn’t.
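In conventional threading terms, the hand-off is a copy plus an enqueue on the receiver’s mailbox. The Python model below uses queue.SimpleQueue, whose put never blocks, and copy.deepcopy to stand in for copying the payload across the thread boundary. Mailbox and subscriber are hypothetical, and the unbounded queue is an assumption, since whether Hale’s mailboxes are bounded is not covered here.

```python
import copy
import queue
import threading


class Mailbox:
    """Model of a cross-thread hand-off: copy, enqueue, wake the receiver."""

    def __init__(self):
        self.q = queue.SimpleQueue()        # unbounded, so put() never blocks

    def send(self, payload):
        self.q.put(copy.deepcopy(payload))  # the receiver gets its own copy


def subscriber(mailbox, out, count):
    for _ in range(count):
        out.append(mailbox.q.get())         # this thread sleeps until woken
```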
High-concurrency I/O: where async_io
A single pinned thread handles one blocking connection at a time.
To serve many concurrent connections on one thread without a
thread-per-connection explosion, tag a cooperative pool with
where async_io:
placement {
workers: cooperative(pool = ws) where async_io;
}
The pool’s worker runs an event loop (epoll under the hood), and
blocking I/O calls inside loci on that pool — recv, accept,
send — park and resume instead of holding the thread. Your
locus code stays synchronous-shaped: stream.recv(4096) is the
same call either way; the substrate picks the parking lowering at
the syscall boundary. This is how you get async-style throughput
without async-style function coloring.
The compiler checks your placement
The compiler catches several placement and bus-wiring mistakes, because both the placement and the locus’s shape are known at compile time:
- A subscriber that blocks its own delivery is an error. A cooperative locus on a non-main pool receives bus cells fine as long as its pool thread is free to run the dispatch; an event-driven subscriber (handlers plus a sleep loop, or where async_io) works. But if such a subscriber’s run() makes a blocking call, it monopolizes the pool thread, the dispatch never runs, and its handlers never fire. That combination (a non-main cooperative subscriber with a blocking run()) is the error; the compiler points you at pinned (own thread + mailbox) or at keeping run() non-blocking. Placement alone is fine; it’s the blocking call that kills delivery.
- A blocking call on a cooperative pool is a warning. Even when the locus isn’t a subscriber, a blocking run() (a blocking recv/accept, a subprocess run) on a pool that isn’t where async_io holds the pool’s thread and stalls everything else scheduled there. The compiler warns and suggests pinned (own thread) or where async_io (parks). For blocking I/O gateways, pinned is the prescribed shape. This warning follows the call graph: a run() that blocks indirectly, through a helper fn or a self.method it calls, is flagged too, naming the offending call. (The dead-receiver error above stays direct-call-only, so it never widens onto an indirect path.)
- An orphan bus topic is a warning. In a complete program (one with a main locus), a topic or subject wired to only one end (published with nobody subscribed, or subscribed with nobody publishing) is flagged, as is a declared topic used by neither. It’s suppressed when the other end is plausibly external: a transport binding, a wildcard (log.**) covering the subject, a cross-seed (alias::Topic) reference, or the same locus being both ends. Library code (no main) isn’t checked; its peers live downstream.
- A bus cycle is flagged. If a handler for one topic publishes another in a loop (a → b → a), the cell can re-trigger its own publish. A cycle across loci spins the cooperative queue, which is a warning. A cycle within one locus is worse: intra-locus publishes are direct synchronous calls, so the loop recurses on the thread until the stack overflows, which is an error. (Only an unconditional self-republish errors; one guarded by an if is a terminating state machine and is left alone.)
- An unthrottled publish loop is a warning. A while true loop that publishes with no yield, time::sleep/tick, input-pacing recv, or break/return floods the bus: the producer has no backpressure, so cells pile up without bound. Pace the loop, drive it from an input, or yield to let the subscriber drain. (Bounded loops are never flagged; any flow-control point clears it.)
- A subject payload type-mismatch is an error. If two sites publish/subscribe the same literal subject string with different of type payloads, a subscriber would decode the wrong type at runtime, so the program is rejected. (Declared topics are already unified by their declaration, so this only affects ad-hoc literal subjects.)
It also enforces the single-threaded-method invariant: a locus’s
methods may only be called on the thread that owns its pool, so a
direct method call across pools (self.other.foo() where other
is placed on a different pool) is a compile error — it would run
other’s method on the wrong thread.
One escape is deliberately not traced: a call made through a
handler function pointer rather than a direct method reference —
the canonical case being a std::http::Server handler that reads a
locus living on another pool. The static call-graph walk can’t see
through the pointer, so it’s allowed. That’s load-bearing (it’s how
a /metrics endpoint on the io pool reads a registry nested on a
pinned gateway), but it’s on you to keep that access safe —
typically a read of stable, append-only state, not a mutation that
would race the owning thread.
Next: how loci nest and own each other — Parents & children.
Parents & children
Coming from Go? This is structured concurrency, closer to an errgroup or a supervised tree than to bare goroutines. A parent locus accepts child loci; the children live inside the parent’s scope, the parent sees their progress through a typed contract, and when the parent shuts down its children shut down first. No detached goroutine outliving the thing that spawned it.
A parent accepts children
A locus declares it can parent a child type by implementing
accept:
locus GameSession {
params { players: [Player]; tick: Int = 0; }
}
locus Room {
accept(g: GameSession) {
// runs before g's region is allocated — the gatekeeper.
// return normally to admit; route through on_failure to reject.
}
fn on_join(p: Player) {
// instantiating a child inside a parent method attaches it
GameSession { players: [p] };
}
}
When GameSession { ... } is evaluated inside Room’s body, the
runtime runs Room.accept(g) first, then allocates the child’s
region inside the parent’s, then births and runs it. The
parent’s self.children holds its accepted children (with
self.children.count and self.children.is_empty for quick
summaries).
Bubbling: the nearest accepting ancestor collects the child
accept isn’t limited to direct children. If you instantiate a
child where the enclosing locus doesn’t accept its type, the child
doesn’t become a detached throwaway — it bubbles up to the
nearest ancestor that does accept it.
locus World {
accept(s: Ship) { } // a top-level registry of ships
}
locus Fleet {
fn spawn() {
Ship { hull: 100 }; // Fleet doesn't accept Ship...
} // ...so this Ship bubbles to World
}
World collects every Ship spawned anywhere beneath it — through
a Fleet that never mentions ships — with no manual registration.
It’s the structural counterpart to the bus: the bus
carries ephemeral messages; this carries ephemeral ownership —
a live collection the ancestor holds and cleans up.
A few rules keep it predictable:
- Nearest wins. If several ancestors accept the type, the innermost one gets the child. A direct parent that accepts it is the nearest of all — so nothing about ordinary parent/child attachment changes; bubbling only fills the gap where a child had no owner.
- No owner is fine. A child whose type no ancestor accepts is just a transient local: bubbling is opt-in via accept, and the absence of an owner is never an error.
- Still vertical. Bubbling travels up the tower to an ancestor; it never reaches sideways. The child’s region still lives inside its owner’s, so the whole “flow is vertical only” cleanup story holds. The owner is just possibly a grandparent, not always the direct parent.
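Bubbling is a walk up the parent chain that stops at the first ancestor whose accept covers the child’s type. In this Python model, a spawn with no accepting ancestor returns None, matching the rule that having no owner is not an error. Locus, spawn, and the accepts tuple are hypothetical stand-ins for Hale’s accept declarations.

```python
class Locus:
    """Minimal stand-in: a parent link, a child list, accepted type names."""
    accepts: tuple = ()

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []


def spawn(child_type: str, where: Locus):
    """Attach a child to the nearest ancestor, starting at `where`, that accepts it."""
    node = where
    while node is not None:
        if child_type in node.accepts:
            node.children.append(child_type)
            return node              # the nearest accepting ancestor wins
        node = node.parent
    return None                      # no owner: a transient local, not an error


class World(Locus):
    accepts = ("Ship",)              # like accept(s: Ship) { }


class Fleet(Locus):
    pass                             # never mentions Ship
```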
When the owner lives on a different thread — a main locus
registry collecting entities that workers spawn on their own pools —
the child is created over on the owner’s thread, so the spawning
side can’t hold onto it. There a cross-pool spawn is
fire-and-forget: write it as a bare statement, not
let s = Ship { ... }. The compiler will tell you if you try to
keep the value.
The contract: what crosses the boundary
A child decides what its parent may see by declaring a
contract:
locus GameSession {
params { tick: Int = 0; state: SessionState; }
contract {
expose tick: Int; // parent may read this
expose state: SessionState;
consume clock: Time; // parent must provide this
}
}
locus Room {
contract { consume clock: Time; }
accept(g: GameSession) {
if g.tick > 1000 { /* ... */ } // reading an exposed field
}
}
expose is what the child lets the parent read; consume is
what the child needs the parent to provide. Anything not in the
contract is invisible across the boundary — the compiler rejects
reads of un-exposed fields. You don’t write hiding logic; the
structural boundary does it.
Flow is vertical only
The rule the whole tower rests on: a locus talks up to its
parent and down to its children — never sideways. Two sibling
sessions don’t reference each other; if they need to coordinate,
they route through their shared parent (the Room is exactly the
place that should know how sessions relate), or over the
bus. No sibling pointer, no cousin back-channel.
This is what makes cleanup sound: a child’s memory is a sub-region of its parent’s, no pointer ever crosses sideways, so when a locus dissolves its whole subtree frees wholesale — no garbage collector, no per-object bookkeeping.
Flow children vs residents
Here’s the piece that matters for any long-running parent — a server that accepts one child per connection. By default an accepted child lives until its parent dissolves. For a daemon whose parent never dissolves, that means per-connection children pile up forever. Two shapes fix it:
locus Conn {
params { conn_fd: Int = -1; }
run() {
let stream = std::io::tcp::Stream { conn_fd: self.conn_fd, owns_fd: false };
loop {
let chunk = stream.recv(4096);
if len(chunk) == 0 { break; } // client closed → leave the loop, so run() ends
// ... handle chunk
}
}
}
locus Server {
accept(c: Conn) { }
release(c: Conn) { } // ← declaring release marks Conn a *flow*
}
- Declaring release(c: Conn) on the parent marks Conn a flow: its run() is its lifetime. When run() returns (the recv loop ends on close), the runtime reclaims the child right then (drains it, calls the parent’s release for a final look, dissolves it, frees its region) while the server keeps running. The connection’s memory ends with the connection.
- A child no parent releases is a resident: its run() returning means “ready,” and it lives until the parent dissolves. That’s the right shape for a fixed cohort of long-lived workers spun up at boot.
- A locus can also end itself early with terminate;, the locus analogue of return. It exits the method and lets the runtime tear the locus down.
The same “run() returned” event means “reclaim me” for a flow
and “I’m ready” for a resident — disambiguated by whether the
parent declared release, never guessed. If you accept a child
per connection and memory climbs with connection count, you have
a resident that should be a flow.
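To see why a resident-per-connection daemon grows, here is a small Python model with hypothetical names (not the runtime’s code). The same “run() returned” event either reclaims the child or leaves it live, depending only on whether the parent declared release.

```python
class Child:
    def __init__(self, conn_id):
        self.conn_id = conn_id


class Parent:
    def __init__(self, declares_release):
        # Declaring release(c: Conn) is what makes accepted children flows.
        self.declares_release = declares_release
        self.live = set()
        self.released = []

    def accept(self, child):
        self.live.add(child)

    def child_run_returned(self, child):
        if self.declares_release:
            # Flow: run() returning is end of life, so reclaim it now.
            self.release(child)
            self.live.discard(child)
        # Resident: run() returning only means "ready"; it stays live
        # until this parent dissolves.

    def release(self, child):
        self.released.append(child.conn_id)   # the parent's final look


def serve(parent, connections):
    for i in range(connections):
        c = Child(i)
        parent.accept(c)
        parent.child_run_returned(c)          # client closed, recv loop ended
    return len(parent.live)


print(serve(Parent(declares_release=True), 1000))   # 0
print(serve(Parent(declares_release=False), 1000))  # 1000
```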
Next: what happens when a child breaks — When things fail.
When things fail
Coming from Go? This is the part that’s more Erlang than Go. Alongside the value-level fallible channel you already know, a long-running locus has a structural failure channel: when an invariant it promised to keep breaks, the failure flows up to its parent, which decides recovery (restart, quarantine, or escalate). Supervisors, let-it-crash, and typed recovery policy are built into the language.
Two channels, on purpose
Hale keeps two failure mechanisms strictly separate:
- The value channel: fallible(E) + or, from the basics. “This call didn’t produce a value; the caller decides what to do.” It routes up the call stack and is addressed inline.
- The structural channel: a locus’s declared invariant breaks, the runtime builds a typed event and routes it up the locus tower to the parent’s on_failure. “A promised property no longer holds; the supervisor decides.”
There’s no panic, no assert, no exceptions. Every legitimate
failure is one of these two, and they only meet at the program’s
root.
Declaring an invariant: closure
A closure is a property a locus promises to keep, checked by
the runtime at a declared moment:
locus Account {
params { debits: Decimal = 0.00d; credits: Decimal = 0.00d; }
closure balanced {
self.debits ~~ self.credits within 0.01d;
epoch tick;
}
}
~~ is “approximately equal, within tolerance.” The epoch
says when to check — tick (each event-loop iteration), birth,
dissolve, duration(1m), or inline (only when fired by
hand). If the assertion holds, nothing happens; closures are
silent on success. If it breaks, the runtime constructs a typed
ClosureViolation and routes it to the parent’s on_failure.
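A Python model of what the runtime does at a closure’s epoch, assuming the within bound is inclusive (the text doesn’t say). It evaluates the assertion, stays silent if it holds, and otherwise builds a typed violation and hands it to the parent rather than raising. The ClosureViolation fields here are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ClosureViolation:
    closure: str
    snapshot: dict


class Account:
    TOLERANCE = Decimal("0.01")

    def __init__(self, parent):
        self.parent = parent
        self.debits = Decimal("0.00")
        self.credits = Decimal("0.00")

    def check_balanced(self):
        """Runs at the closure's epoch (for 'epoch tick', once per tick)."""
        # debits ~~ credits within 0.01d, modelled as an inclusive bound
        if abs(self.debits - self.credits) <= self.TOLERANCE:
            return                                   # silent on success
        violation = ClosureViolation(
            "balanced", {"debits": self.debits, "credits": self.credits})
        self.parent.on_failure(self, violation)      # routed up, not raised


class Bank:
    def __init__(self):
        self.failures = []

    def on_failure(self, child, err):
        self.failures.append(err)


bank = Bank()
acct = Account(bank)
acct.debits, acct.credits = Decimal("10.00"), Decimal("10.005")
acct.check_balanced()        # |diff| = 0.005: holds, nothing happens
acct.credits = Decimal("9.50")
acct.check_balanced()        # |diff| = 0.50: violation routed to the parent
print(len(bank.failures))    # 1
```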
Handling failure: on_failure
The parent is the supervisor. It decides policy per child type:
locus Bank {
accept(a: Account) { }
on_failure(a: Account, err: Error) {
match err {
Error::ClosureViolation(v) -> quarantine(a) for 60s,
_ -> bubble(err),
}
}
}
The recovery primitives:
- absorb: just return; the failure is noted and contained.
- restart(child): dissolve it and re-create it fresh.
- restart_in_place(child): reset it, keeping its region.
- quarantine(child) for d: pause it, preserving state for inspection, optionally auto-restarting after d.
- bubble(err): pass it up to this locus’s parent.
- dissolve(child): force it down.
If a failure bubbles past the root with no one absorbing it, the process exits non-zero with a structured report. That’s the only way a Hale program “crashes” — and it’s a deliberate, typed event, not a surprise. This is Erlang’s let-it-crash, but the recovery policy is typed and written next to the locus it governs.
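The Bank policy above, modelled in Python with hypothetical classes. A closure violation is contained by quarantine, anything else bubbles, and a failure that bubbles past the root ends the process with a non-zero status. The stored 60 is only the quarantine duration; no timer is modelled.

```python
import sys
from dataclasses import dataclass


@dataclass
class ClosureViolation:
    closure: str


@dataclass
class IoFailure:
    kind: str


class Locus:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.quarantined = {}

    def bubble(self, err):
        if self.parent is None:
            # Nobody absorbed it: exit non-zero with a report.
            print(f"unhandled failure at root: {err!r}", file=sys.stderr)
            raise SystemExit(1)
        self.parent.on_failure(self, err)

    def on_failure(self, child, err):
        self.bubble(err)                          # default: escalate


class Bank(Locus):
    def on_failure(self, child, err):
        if isinstance(err, ClosureViolation):
            self.quarantined[child.name] = 60     # quarantine(a) for 60s
        else:
            self.bubble(err)                      # _ -> bubble(err)


root = Locus("App")
bank = Bank("Bank", parent=root)
acct = Locus("acct-1", parent=bank)
bank.on_failure(acct, ClosureViolation("balanced"))   # contained by policy
print(bank.quarantined)                               # {'acct-1': 60}
try:
    bank.on_failure(acct, IoFailure("broken_pipe"))   # bubbles past the root
except SystemExit as exit_:
    print("exit code", exit_.code)                    # exit code 1
```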
Crossing from value to structural
Sometimes a method catches a value-level error and decides it’s
fatal — the right move is to stop this locus and let the
supervisor take over. You bridge with an inline closure and the
violate statement:
locus DbConnection {
params { last_error: String = ""; conn_fd: Int = -1; }
closure fatal_io { captures: last_error; epoch inline; }
// an error-check fn: takes the error, returns the success type,
// and either substitutes a value or escalates.
fn handle_io(e: IoError) -> Row {
self.last_error = e.kind;
if e.kind == "broken_pipe" {
violate fatal_io; // diverges — escalate structurally
}
return Row { data: "" }; // transient — substitute and continue
}
fn on_query(q: Query) {
let r = send_query(self.conn_fd, q) or self.handle_io(err);
if !self.draining { QueryResult <- r; }
}
}
- closure fatal_io { ... epoch inline; } is a named structural failure with no assertion: it fires only when you say so. The captures: clause snapshots locus state into the violation payload.
- violate fatal_io; fires it. It’s divergent (the Never type, like fail and bubble), so the branches that violate need no return. The locus enters drain at the next yield; the parent’s on_failure gets the typed violation with the captured state.
- self.draining is a Bool every locus can read, true once the locus has decided to wind down. Use it to stop publishing after the decision.
That’s the canonical “catch an error and shut this locus down”
shape: one closure, one error-check method, one violate. You
don’t reach for a hand-rolled should_exit flag and a polling
loop — these primitives are the supported form.
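A Python model of the bridge, with one liberty: Hale has no exceptions, so the divergent violate is represented here by raising a Violation that carries the captured state. Query results arrive as (ok, value) pairs standing in for fallible returns. All names are hypothetical.

```python
class Violation(Exception):
    """Stands in for 'violate fatal_io;' plus its captured payload."""

    def __init__(self, closure, captured):
        super().__init__(closure)
        self.closure = closure
        self.captured = captured


class DbConnection:
    def __init__(self):
        self.last_error = ""
        self.draining = False
        self.published = []

    def handle_io(self, kind):
        """Error-check fn: substitute a Row, or escalate structurally."""
        self.last_error = kind
        if kind == "broken_pipe":
            self.draining = True
            raise Violation("fatal_io", {"last_error": self.last_error})
        return {"data": ""}                      # transient: substitute

    def on_query(self, result):
        ok, value = result
        row = value if ok else self.handle_io(value)
        if not self.draining:
            self.published.append(row)           # QueryResult <- r


db = DbConnection()
db.on_query((True, {"data": "42"}))      # success
db.on_query((False, "timeout"))          # transient: substituted
try:
    db.on_query((False, "broken_pipe"))  # fatal: escalated
except Violation as v:
    print(v.closure, v.captured)         # fatal_io {'last_error': 'broken_pipe'}
print(db.published)                      # [{'data': '42'}, {'data': ''}]
```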
Next: splitting a program across processes — Across binaries.
Across binaries
Coming from Go? Splitting a program into services usually means rewriting in-process calls as RPC or queue clients. In Hale the publisher and subscriber code doesn’t change: a topic that was an in-process queue becomes a Unix socket or a broker by adding one line to main’s bindings { } block. The deployment seam is the only place that knows.
A topic is in-process by default
When a topic isn’t mentioned in any bindings { } block, it’s
delivered by an in-process cooperative queue. Two loci in the
same binary just talk. Nothing to configure.
Binding a topic to a transport
To carry a topic between binaries, name it in the main
locus’s bindings { } block with a transport:
main locus App {
bindings {
MatchReady: unix("/tmp/matches.sock");
}
run() {
Matchmaker { target_size: 4 };
}
}
bindings { } is legal only on a main locus. The publisher’s
MatchReady <- info; and the subscriber’s subscribe MatchReady as ... are unchanged — they don’t know or care that delivery
now crosses a socket. The same locus source runs in a test
(in-memory), a single binary (in-memory), and a multi-binary
deployment (unix), chosen entirely at this seam.
The transports that ship
- In-process: the default; the absence of a binding.
- unix("/path"): an AF_UNIX framed-byte transport, owned by the runtime. The role (listen vs connect) is inferred from whether the binary publishes or subscribes the topic; specify role: listen | connect when one binary does both.
- udp://host:port: a datagram transport, including IPv4 multicast. Lossy by nature, which is right for tick streams and telemetry where stale data is worthless.
- A user adapter: any locus you write that satisfies the __StdBusAdapter interface (a single send(subject, bytes) method). This is how NATS, MQTT, a raw-TCP framing, or a custom JSON-over-WebSocket transport plug in, as ordinary loci in your code rather than language features:
bindings { BrokerEvt: MyNatsAdapter { url: "nats://prod:4222" }; }
The substrate stays neutral on protocol semantics — reliability, ordering, retries, backpressure all live in the adapter body, where they belong.
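A Python sketch of the deployment seam (an illustration, not Hale’s bus). The publishing code never changes, and a bindings table decides whether a topic goes to the default in-process queue or to an adapter exposing send(subject, bytes). RecordingAdapter is a hypothetical stand-in for something like a NATS adapter.

```python
class InProcessQueue:
    """The default transport: a cooperative in-process queue."""

    def __init__(self):
        self.pending = []

    def send(self, subject, payload):
        self.pending.append((subject, payload))


class RecordingAdapter:
    """Stand-in for a user adapter: anything with send(subject, bytes)."""

    def __init__(self):
        self.sent = []

    def send(self, subject, payload):
        # Reliability, ordering, and retries would live here, not in the bus.
        self.sent.append((subject, payload))


class Bus:
    def __init__(self, bindings=None):
        self.bindings = dict(bindings or {})   # the deployment seam
        self.default = InProcessQueue()

    def publish(self, topic, payload):
        # Identical publisher code whether or not the topic is bound.
        self.bindings.get(topic, self.default).send(topic, payload)


def matchmaker_step(bus):
    bus.publish("MatchReady", b"match-1")      # MatchReady <- info;


local = Bus()                                  # test / single binary
matchmaker_step(local)

adapter = RecordingAdapter()
deployed = Bus({"MatchReady": adapter})        # one binding line
matchmaker_step(deployed)

print(local.default.pending)   # [('MatchReady', b'match-1')]
print(adapter.sent)            # [('MatchReady', b'match-1')]
```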
Talking to other languages: codecs
By default the bus uses Hale’s internal wire format, which is
fine Hale-to-Hale but opaque to a consumer in another language.
When you need JSON over a socket or protobuf to a Python peer, a
binding names a codec — a locus that owns encode/decode:
bindings {
Tick: unix("/tmp/ticks.sock") codec(TickJsonCodec { });
}
The codec is structurally typed against the topic’s payload
(encode takes the payload type, decode returns it) and must
be pure — no hidden state — because it runs on transport
threads. Different bindings on the same topic can carry different
codecs; the publisher’s send site doesn’t know which.
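What a JSON codec’s contract looks like, sketched in Python as the peer on the other side of the socket might write it. The Tick fields (symbol, px) and the sorted key order are assumptions, and framing on the Unix socket is not shown. Both methods are pure, so one instance can be shared across threads.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Tick:
    symbol: str
    px: int


class TickJsonCodec:
    """Pure: no state is read or written, so it is thread-safe."""

    def encode(self, tick: Tick) -> bytes:
        # sort_keys makes the bytes deterministic for a given payload
        return json.dumps(asdict(tick), sort_keys=True).encode("utf-8")

    def decode(self, data: bytes) -> Tick:
        return Tick(**json.loads(data))


codec = TickJsonCodec()
wire = codec.encode(Tick("ACME", 101))
print(wire)                                      # b'{"px": 101, "symbol": "ACME"}'
print(codec.decode(wire) == Tick("ACME", 101))   # True
```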
The shape this gives you
A single source tree, decomposed into loci that coordinate over
topics. How those topics are delivered — same process, same
machine over a socket, across the network via a broker — is a
deployment decision living in bindings { }, separate from the
logic. You design the system once and deploy it many ways. The
systems tier adds one more
transport for the highest-frequency same-machine routes:
shared-memory zero-copy.
That’s the services tier: lifecycle, a typed bus, concurrency and placement, supervised parent/child trees, structural failure, and multi-binary deployment. You can build daemons, servers, and distributed systems with this. The final tier goes under the runtime — memory, layout, raw performance, and the C boundary — for when you need that control.
Next: Memory & lifetime.
Composition patterns
The shape catalog names the six building blocks — app locus, namespace lotus, service locus, spawned child, shape type, free fn. This chapter is the next layer up: five compositions of those blocks that recur in real Hale services, distilled from production use. Reach for one of these when a problem feels like it needs a new language feature — usually it doesn’t, it needs one of these shapes.
1. The three-locus gateway
The canonical answer to “I have N dynamic, keyed children with their own lifecycles” (and to the rejection of putting loci in a hashmap):
pinned reader ──▶ cooperative manager ──▶ keyed per-entity child
(owns the fd, (accept()s a child (subscribe ... where
publishes events) per new key) key == self.id)
- A pinned locus owns the blocking input (socket, ring) on its own thread and publishes decoded events onto the bus.
- A cooperative manager subscribes to “new entity” events and accept()s one child per key. Declare release(c: Child) so each child is reclaimed when its flow ends (otherwise it’s a resident and lives until the manager dissolves, which is unbounded on a daemon).
- Each child subscribes with a key filter (subscribe Update as on_update where key == self.id) so the bus routes only its own entity’s messages to it.
This gives you per-entity state and lifecycle without a map of loci — the bus is the routing table, keyed.
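The gateway’s routing, modelled in Python with hypothetical names. The manager keeps a set of keys it has seen (not a map of loci), and each child’s key filter is what delivers its messages. The accepted list only stands in for the runtime’s child bookkeeping; routing never consults it. The pinned reader’s thread isn’t modelled, only its publishes.

```python
class KeyedBus:
    """The bus as a keyed routing table: handlers filtered by key."""

    def __init__(self):
        self.subs = []                           # (topic, key or None, handler)

    def subscribe(self, topic, handler, key=None):
        self.subs.append((topic, key, handler))

    def publish(self, topic, key, msg):
        for t, k, handler in list(self.subs):    # snapshot: handlers may subscribe
            if t == topic and (k is None or k == key):
                handler(msg)


class EntityChild:
    def __init__(self, bus, entity_id):
        self.id = entity_id
        self.updates = []
        # subscribe Update as on_update where key == self.id
        bus.subscribe("Update", self.on_update, key=self.id)

    def on_update(self, msg):
        self.updates.append(msg)


class Manager:
    def __init__(self, bus):
        self.bus = bus
        self.seen = set()          # keys, not loci
        self.accepted = []         # stands in for the runtime's child list
        bus.subscribe("EntitySeen", self.on_seen)

    def on_seen(self, entity_id):
        if entity_id not in self.seen:           # one child per key
            self.seen.add(entity_id)
            self.accepted.append(EntityChild(self.bus, entity_id))


bus = KeyedBus()
manager = Manager(bus)
# The pinned reader's role: decode input and publish events.
for key, px in [("A", 1), ("B", 2), ("A", 3)]:
    bus.publish("EntitySeen", key, key)
    bus.publish("Update", key, px)
print({c.id: c.updates for c in manager.accepted})   # {'A': [1, 3], 'B': [2]}
```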
2. Demand-driven discovery
A special case of the gateway with zero hardcoded topology: the
manager doesn’t know its children up front. A subscription triggers
the accept():
// manager
bus { subscribe "entity.first_seen" as on_seen of type Seen; }
fn on_seen(s: Seen) {
// First message for this key → spawn its child now.
// Bare instantiation inside a parent method attaches the child:
// it triggers the enclosing accept(c) gatekeeper. `accept` is a
// lifecycle hook the runtime invokes, never a method you call.
Child { id: s.id };
}
The topology grows from the data. Combined with release, children
appear on first contact and vanish when their flow ends — the
process shape mirrors the live workload with no configuration. (If
the manager doesn’t itself accept this child type, the child
bubbles to the nearest accepting ancestor, as of v0.9.2.)
3. Hot-path counters & gauges (and the CQRS rejection)
You will want to write let n = self.metrics.incr("hits") on a hot
path. Hale rejects locus methods that return locus values
(GH #18.6 / the “CQRS” shape) — a method call that hands back a live
locus reference breaks the closed-world ownership the substrate
relies on. The rejection without a replacement strands you, so here
is the migration:
- Pre-allocated handles at boot. Declare the counter/gauge loci as params of the owner, instantiated once at birth. The hot path mutates a field in place (self.hits = self.hits + 1): no method returning a locus, no per-call allocation.
- Bus-routed single-writer store. For shared metrics, publish a MetricUpdate { name, delta } to a single collector locus that owns the store and applies updates in its handler. One writer, no contention, and the closed-world rewrite keeps the publish synchronous. This is the shape pond/metrics’ MetricsCollector uses.
Either way the hot path does an in-place field write or a publish — never a method that returns a locus.
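The single-writer store, sketched in Python with a queue.Queue standing in for the bus and threads standing in for publishers. One difference to note: delivery here is asynchronous, whereas the text says Hale’s closed-world rewrite keeps the publish synchronous. MetricsCollector mirrors the name above, but its body is hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class MetricUpdate:
    name: str
    delta: int


class MetricsCollector:
    """Sole owner of the store; only its own thread ever mutates it."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.store = {}
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def _run(self):
        while True:
            update = self.inbox.get()
            if update is None:                   # shutdown sentinel
                return
            self.store[update.name] = self.store.get(update.name, 0) + update.delta

    def stop(self):
        self.inbox.put(None)                     # queued after all updates
        self._thread.join()


def worker(inbox, n):
    for _ in range(n):
        inbox.put(MetricUpdate("hits", 1))       # publish; never touch the store


collector = MetricsCollector()
collector.start()
workers = [threading.Thread(target=worker, args=(collector.inbox, 1000))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
collector.stop()
print(collector.store)   # {'hits': 4000}
```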
4. The publish-policy gate
When you produce data faster than you want to publish it (telemetry,
book snapshots), gate the publish behind a tick() with a
time-or-volume trigger rather than publishing per-update:
fn on_update(u: Update) {
self.pending = self.pending + 1;
self.acc = self.acc + u.delta; // accumulate in place
if self.pending >= 100 { self.flush(); } // volume trigger
}
fn tick() { // time trigger (scheduled)
if self.pending > 0 { self.flush(); }
}
fn flush() {
"snapshot" <- Snapshot { total: self.acc };
self.pending = 0;
}
The accumulation is in-place; only the flush crosses the bus. This keeps the high-frequency path allocation-free and bounds publish volume independently of input volume.
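The same gate as a plain Python class, so the trigger logic can be exercised directly. tick() is called by hand here, where Hale schedules it. As in the Hale version, acc is a running total that flush() publishes but doesn’t reset.

```python
class PublishGate:
    """Accumulate in place; flush on volume (on_update) or time (tick)."""

    def __init__(self, publish, volume_limit=100):
        self.publish = publish
        self.volume_limit = volume_limit
        self.pending = 0
        self.acc = 0

    def on_update(self, delta):
        self.pending += 1
        self.acc += delta                  # accumulate in place
        if self.pending >= self.volume_limit:
            self.flush()                   # volume trigger

    def tick(self):
        if self.pending > 0:
            self.flush()                   # time trigger

    def flush(self):
        self.publish({"total": self.acc})
        self.pending = 0


snapshots = []
gate = PublishGate(snapshots.append)
for _ in range(250):
    gate.on_update(1)        # two volume flushes, at 100 and 200
gate.tick()                  # time flush for the last 50
print(snapshots)             # [{'total': 100}, {'total': 200}, {'total': 250}]
```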
5. View lifetime — copy out to persist
The zero-copy span/JSON APIs (StringView, BytesView,
std::json::*_span) hand you a view into a buffer you don’t own.
That view is valid only until the next operation that overwrites the
buffer — the next recv, the next ring read. Holding it across that
boundary reads freed/overwritten memory:
let name = std::json::find_string_field(msg, "name"); // view into recv buf
self.read_msg(); // ← overwrites the buffer
println(name); // ✗ dangling view
The rule: a view is valid until the next recv/overwrite; copy out to persist. Materialize it before the boundary:
let name = std::str::clone(std::json::find_string_field(msg, "name"));
self.read_msg();
println(name); // ✓ owns its own copy
Forgetting this is now panic-guarded (a stale-view access exits with a diagnostic rather than reading garbage), so you’ll see a clear “view used after its buffer was overwritten” message instead of a silent corruption — but the fix is always to clone out before the overwriting call.
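Python’s memoryview shows the same hazard: a slice of a reusable receive buffer is a window, not a copy, so the next overwrite changes what it reads, and bytes(view) is the copy-out. Unlike Hale, Python won’t flag the stale view; it silently shows the new contents. read_msg here is a hypothetical stand-in for a recv into a fixed buffer.

```python
recv_buf = bytearray(16)            # one reusable receive buffer


def read_msg(payload: bytes):
    """Stand-in for recv: overwrite the shared buffer in place."""
    recv_buf[:len(payload)] = payload


read_msg(b"name=alice......")
view = memoryview(recv_buf)[5:10]   # a view: no bytes copied
owned = bytes(view)                 # copy out to persist

read_msg(b"name=mallory....")       # the next recv reuses the buffer
print(bytes(view))                  # b'mallo'  (the view now shows new data)
print(owned)                        # b'alice'
```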
Memory & lifetime
Coming from Rust / C++? No garbage collector, and no borrow checker either. Memory is region-based: every locus owns an arena, allocations inside it are bump-pointer cheap, and the whole region frees in one shot when the locus dissolves. The locus tree is the ownership graph, so lifetimes are structural, not annotated. You never write free, and you never fight a borrow checker, because no pointer ever crosses sideways.
You’ve used loci throughout this guide without thinking about memory, because the model is automatic. Here’s what’s underneath.
A locus owns a region
Every locus has an arena — a region of memory. Everything the locus allocates (strings it builds, records it constructs, collection storage) comes from that arena. When the locus dissolves, the entire region is freed at once. There is no per-object deallocation, ever.
Regions nest exactly like loci do. A child’s region is a sub-region of its parent’s:
root
└── App's region
└── Server's region
├── Conn A's region
└── Conn B's region
When a locus dissolves, its whole subtree of regions frees wholesale. This is why shutdown cascades cleanly and why flow children reclaim per connection: freeing is structural, not traced.
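A Python model of the region tree. It simplifies one thing: each region gets its own bytearray rather than a carved-out slice of its parent’s. Allocation is a pointer bump, growth adds capacity instead of failing, and dissolving a region drops its whole subtree in one pass with no per-object frees.

```python
class Region:
    """A bump-pointer arena that owns child sub-regions."""

    def __init__(self, name, parent=None, size=1 << 16):
        self.name = name
        self.buf = bytearray(size)
        self.top = 0                       # bump pointer
        self.children = []
        self.live = True
        if parent is not None:
            parent.children.append(self)

    def alloc(self, n):
        if self.top + n > len(self.buf):
            # Over budget just adds capacity (Hale adds a chunk); no failure.
            self.buf.extend(bytearray(max(n, len(self.buf))))
        offset = self.top
        self.top += n                      # allocation is a pointer bump
        return offset

    def dissolve(self):
        for child in self.children:        # the whole subtree goes together
            child.dissolve()
        self.children.clear()
        self.buf = bytearray()             # one release, no per-object frees
        self.top = 0
        self.live = False


app = Region("App")
server = Region("Server", app)
conn_a = Region("Conn A", server)
conn_b = Region("Conn B", server)
print(conn_a.alloc(128), conn_a.alloc(16))               # 0 128
conn_b.alloc(64)
server.dissolve()                                        # Conn A and B go too
print(app.live, server.live, conn_a.live, conn_b.live)   # True False False False
```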
Why no GC and no borrow checker
Both exist to answer one question — when is it safe to free this? Hale answers it structurally instead:
- No pointer crosses sideways. Vertical-only flow means a value in one locus’s region is never referenced by a sibling. So when a region frees, nothing dangles into it.
- Messages are copies, not pointers. A payload crossing a locus boundary is copied into the receiver’s arena. Sender and receiver have independent lifetimes; the sender can dissolve while the receiver still holds its copy.
With those two invariants, wholesale-free-at-dissolve is sound with no tracing and no aliasing analysis. The discipline the borrow checker enforces with annotations, Hale enforces with structure — you got it for free by building a locus tree.
Bounded storage: capacity slots
The arena is for transient, locus-lifetime allocation. When a locus needs bounded, disciplined storage — a recycling pool, a growable buffer — it declares capacity slots:
locus Router {
capacity {
heap routes of Route; // growable, individually freed
pool sessions of Session; // fixed-shape, recyclable cells
}
}
- heap X of T: growable storage; cells are allocated and freed individually during the locus’s life, and the whole slot is reclaimed at dissolve.
- pool Y of T: a bounded population of fixed-shape, recyclable cells (acquire / release).
The forms you’ve been using — @form(vec),
@form(hashmap) — are built on exactly these slots; the form
annotation just synthesizes the method surface over them. And for a
list that belongs inside a value rather than on a locus, there’s
bounded[T; N] (see Collections) —
fixed-capacity, laid out inline, whole-struct copies carry it, and
the memory-bound analysis treats it as bounded by construction. Slots
hold values, never locus references: locus membership goes
through accept, not storage.
Projection classes: committing to resolution
When a parent has many children, you can commit up front to the resolution at which it observes them — which lets the compiler pick the allocator that makes that resolution cheap:
locus WorkerPool : projection chunked {
accept(w: Worker) { }
}
- rich: a handful of named children (≈4–10), each fully observed. Per-child arenas, low churn.
- chunked: moderate counts (≈10–30), observed in ranges. Per-child sub-regions with free-list reuse; the default when a locus accepts children.
- recognition: large populations (≈100–500), observed in aggregate (a count, a histogram). Pre-allocated fixed pools.
The projection class changes the allocator strategy, not your
code: the same parent and child methods read from a rich pool
or a recognition pool unchanged. It’s a commitment about
observation resolution; the compiler turns that into a layout.
Sizing is hints, lifetime is law
Declared sizes are hints — an arena that out-allocates its budget just adds another chunk; it doesn’t panic. The load-bearing property is lifetime: wholesale free at dissolve. That’s the contract every other guarantee leans on.
Next: keeping a long-running program’s memory flat — Performance.
Performance
Coming from Rust / C++? You’re used to controlling allocation and watching it. Hale’s arena model makes most code allocation-bounded by construction — a per-method scratch region absorbs intermediate allocations and frees them at method exit — but a few patterns can still grow a long-running process. This chapter is the shape of that growth and how to keep it flat.
The default is already bounded
Inside any locus method, a scratch sub-region opens on entry
and is destroyed on return. Transient allocations — string
concatenations, JSON parsing, format building — land in scratch
and are reclaimed when the method returns. Values you persist
(self.field = ...) are deep-copied into the locus’s own arena
first, so they outlive the scratch. The net effect: a hot
run() loop that allocates transiently doesn’t grow the locus’s
lifetime arena. You get this without doing anything.
So the question isn’t “how do I free?” — it’s “which patterns defeat the automatic bounding?”
The pattern that bites: accumulating in a loop
fn render(rows: Int) -> String {
let mut out = "";
let mut i = 0;
while i < rows {
out = out + render_row(i); // a fresh String each iteration
i = i + 1;
}
return out;
}
Each out + ... allocates a new string; scratch demand peaks at
the total size of every intermediate. For large inputs that
crosses a chunk boundary. The fix is an accumulator that grows
one buffer in place:
fn render(rows: Int) -> String {
let b = std::bytes::BytesBuilder { };
let mut i = 0;
while i < rows {
b.append(std::bytes::from_string(render_row(i)));
i = i + 1;
}
return std::str::from_bytes(b.finish());
}
BytesBuilder is the canonical accumulator — one extensible
buffer instead of N throwaway strings. Use it (or
std::json::Builder for JSON output) anywhere you build a result
incrementally.
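To put numbers on “scratch demand peaks at the total size of every intermediate,” here is a Python model that counts the bytes each naive intermediate would occupy, alongside a bytearray accumulator. The counter is arithmetic, not a measurement (CPython can sometimes extend a string in place), and render_row is a hypothetical fixed-width row.

```python
def render_row(i):
    return f"row {i:04d}\n"             # 9 characters for i < 10000


def render_concat(rows):
    out, materialized = "", 0
    for i in range(rows):
        out = out + render_row(i)       # a new string each iteration
        materialized += len(out)        # model: that whole intermediate exists
    return out, materialized


def render_builder(rows):
    b = bytearray()                     # one growable buffer
    for i in range(rows):
        b += render_row(i).encode()     # appended in place, amortized growth
    return b.decode()


text, materialized = render_concat(1000)
print(len(text), materialized)          # 9000 4504500
print(render_builder(1000) == text)     # True
```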
Resolve string keys to ints at boot
If a hot path looks something up by string key in another locus,
the string gets copied on every call. Resolve the key to an Int
index once at startup and pass the index on the hot path:
locus Service {
params { metrics: MetricsRegistry = MetricsRegistry { }; ticks_idx: Int = 0; }
birth() {
self.ticks_idx = self.metrics.register("ticks_total"); // clone once
}
fn dispatch(m: Msg) {
self.metrics.inc(self.ticks_idx); // zero per-call alloc
}
}
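The registry side of that pattern, as a hypothetical Python model. register() pays for the string once at boot and returns an index; the hot path only does integer indexing. The method names mirror the Hale snippet, but this internal layout is an assumption.

```python
class MetricsRegistry:
    def __init__(self):
        self.names = []
        self.values = []
        self._index = {}

    def register(self, name):
        """Boot time: pay for the string once and hand back an index."""
        if name not in self._index:
            self._index[name] = len(self.values)
            self.names.append(name)
            self.values.append(0)
        return self._index[name]

    def inc(self, idx):
        self.values[idx] += 1           # hot path: integer index, no string


registry = MetricsRegistry()
ticks_idx = registry.register("ticks_total")   # birth()
for _ in range(3):
    registry.inc(ticks_idx)                    # dispatch()
print(registry.values[ticks_idx])              # 3
```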
Reclaim per-connection state
The other place growth hides is a daemon that
accepts a child per connection.
If those children are residents, their regions live until the
(never-dissolving) parent does, and memory climbs with connection
count. Make them flows — declare release(c: Conn) on the
parent — so each child’s region is reclaimed when its connection
ends. If RSS tracks connection count, this is almost always why.
Catching it at compile time
The growth patterns above — a per-message handler that allocates
into self, a connection child left resident — have a static
shape, and hale check flags them before you ever measure RSS.
These are advisory warnings, not build failures:
- hale check app.hl flags (by default, no flag needed) an allocation that accumulates without bound: a struct / array / bytes value created in a per-message bus handler (or a runtime-bounded loop) that escapes into self, where it lives until the locus dissolves. A typical case is a whole-value replace such as self.latest = Thing{…}, which bump-allocates a fresh value on every message. The fix is usually in-place mutation (self.latest.field = v, self.arr[i] = v) instead of replacing the whole value, or one of the moves from this chapter: a capacity-bounded @form, routing the data over the bus, or a per-iteration child. A while i < N { … } counter with a constant bound is proven bounded and left alone. Run-to-exit programs (a main with no run loop and no bus handler) are exempt automatically; a script that allocates and exits owes nothing. Opt out for a given run with --no-warn-unbounded-alloc. Annotating a long-lived locus @bounded is now redundant, since the check already runs on every hale check, but it’s still accepted. Use @unbounded (on a fn or a lifecycle hook) to acknowledge an intentional accumulation and silence it.
- The same check flags an insert into a growing collection (v.push(x) / m.set(x) where v / m is a @form(vec) or @form(hashmap)) when it runs in an unbounded context. The backing buffer grows with population and frees only at dissolve, so a push per message accumulates. A @form(ring_buffer) / @form(lru_cache) is cap-bounded and never flagged; switching to one (or bounding the loop) is the fix. (Detection reads the receiver’s declared type, so it sees fn f(v: IntVec) and self.buf: IntVec but not an untyped let.)
- hale check app.hl --warn-resource-leak is the same idea for file descriptors: an open / connect / accept whose result is stored resident in an unbounded context, so fds pile up.
For the resource surface — thread / pool / subject / fd counts, not a leak — there’s a budget you can read or gate on:
hale check app.hl --dump-resource-budget
# OS threads (pinned loci): 1
# cooperative pools: 1 [io]
# bus subjects: 4
# fd acquisition sites: 2
Drop a ceiling file in CI and the build fails when a count climbs past it — “this PR added a pinned thread; bump the ceiling if you meant to.” Every key is optional:
# budget.toml
pinned_threads = 4
bus_subjects = 16
hale check app.hl --check-resource-budget budget.toml
The budget commands, like --warn-resource-leak, don’t run by default; they’re tools you reach for when a program’s thread or fd surface is something you want to hold the line on.
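A model of the comparison the ceiling check makes, assuming (as described above) that the build fails when any listed count exceeds its ceiling and that unlisted keys are unconstrained. The counts dict and the fd_sites key are hypothetical; in CI you’d run the hale check command itself.

```python
def over_budget(counts, ceilings):
    """Keys whose count climbed past their ceiling.

    Keys missing from `ceilings` are unconstrained, matching
    "every key is optional" in budget.toml.
    """
    return sorted(key for key, limit in ceilings.items()
                  if counts.get(key, 0) > limit)


counts = {"pinned_threads": 5, "bus_subjects": 4, "fd_sites": 2}
ceilings = {"pinned_threads": 4, "bus_subjects": 16}   # as in budget.toml
violations = over_budget(counts, ceilings)
print(violations)          # ['pinned_threads']
```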
Knobs for when it’s not your code
The substrate exposes diagnostics and glibc tuning via
environment variables — LOTUS_ARENA_RESIDENCY=1 to dump live
arena sizes from a heartbeat, LOTUS_ARENA_LOG_CHUNK_ATTACH=N to
trace which arena is growing, LOTUS_CHUNK_POOL_STATS=1 for
chunk-pool hit rates, and the MALLOC_* family for glibc’s
trim/arena behavior. The full table is in spec/memory.md and
the keeping memory bounded spec material. The workflow:
smaps-diff over a window → if it’s [heap], check 30s deltas →
bursty 64KB steps mean chunk-pool overflow (a loop accumulator)
→ fix with BytesBuilder.
Hot-path I/O primitives
For latency-sensitive sockets, the stdlib exposes the knobs you’d reach for in C, without an FFI shim:
- Disable Nagle: std::io::tcp::set_nodelay(fd, true) (and the std::io::tls sibling) so small writes hit the wire immediately instead of waiting ~40ms to coalesce. It’s the first thing a request/response or market-data socket wants.
- Wire-arrival timestamps: recv_stamped_into is recv_into plus a kernel RX timestamp captured in the same recvmsg; read it with last_recv_kernel_ns() right after. That’s true wire time, not the post-scheduling receipt clock, for measuring real I/O latency.
- Wrap-free parsing: std::io::MirrorRing double-maps a buffer so any window is one contiguous slice even across the wrap point, and a stream parser never special-cases the seam. It’s opt-in (it costs 2× address space); for the ordinary case a BytesBuilder accumulator is the right tool.
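For comparison, the Nagle knob at the socket-API level, in Python’s standard socket module. That set_nodelay sets TCP_NODELAY underneath is an assumption, although that option is the standard way to disable Nagle’s algorithm.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# TCP_NODELAY disables Nagle's algorithm: small writes are sent at once
# instead of being held back to coalesce with later data.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print(enabled)   # True
```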
And the run-time complement to the compile-time
unbounded-allocation check: std::diag::heap_alloc_count() and
std::diag::syscall_count(name) let a test assert a steady-state
region did what you think — read the counter before and after and
check the delta is zero (“this loop allocated nothing”, “exactly one
recv per poll”).
Build-time tuning
hale build already tunes to the machine you build on: native
builds compile for the host CPU at O3, so generated code
autovectorizes to whatever the host supports (AVX2, AVX-512, …).
Two knobs matter when that default isn’t what you want:
- --target-cpu baseline pins a portable x86-64-v3 target (AVX2 + BMI2 + FMA) instead of the host. Reach for this when you ship a binary to other machines: the default host-tuned build may use instructions an older CPU lacks. --target-cpu native (the default) is right for hale run and for binaries you execute on the build host (e.g. a service on hardware you control).
- LOTUS_LTO=1 is an opt-in full-LTO build that inlines the lotus runtime (the arena allocator, string helpers, shm ring) into your code across the compile boundary it otherwise can’t cross. It buys a few percent on allocation- and coordination-heavy programs, exactly the shape Hale is built for, and it keeps the host vectorization, so there’s no loop it slows down. It’s off by default because the link is ~3–4× slower and needs lld on PATH; turn it on for release/perf builds, not the edit-compile loop:
LOTUS_LTO=1 hale build myservice/
Where Hale earns its overhead
Hale is shaped to pay coordination cost well — bus dispatch,
region setup, lifecycle — and as of v0.9.0 that’s where it
leads. The lock-free bus plus static-dispatch devirtualization
turned coordination from a deficit into an advantage over Go:
bus_dispatch went from ~4× behind to 2.4× ahead, and
bus_dispatch_cross_pool from behind to 1.26× ahead. Reach
for Hale’s structure where the work is coordination-shaped, which
is most real systems.
The tight loop caught up too. Pure arithmetic used to be the
place the substrate showed through, but native codegen closed the
gap: fn_modular reached parity with clang -O3 C (~0.98 of
the C time). Coordination is the lead; tight-loop arithmetic is no
longer the price you pay for it.
Next: what @form actually compiles to — Forms under the
hood.
Forms under the hood
Coming from Rust / C++? A form is closer to a monomorphized template than to a generic collection object. @form(vec) doesn’t wrap a one-size-fits-all container: the compiler emits a tight, type-specialized implementation per cell type, sized and laid out for your element. You declared the access discipline at the everyday level; here’s what it lowers to and how to choose.
A form is a lowering, not a type
When you write:
@form(vec)
locus Names {
capacity { heap items of String; }
}
the compiler doesn’t reach for a library Vec. It synthesizes,
for this locus and this cell type, a contiguous growable
buffer and the methods over it — push, get, set, pop,
len, is_empty, and the sort family. The storage is the
heap capacity slot; the form decides the layout
(here: a {cap, len, buf} struct with doubling realloc) and the
method surface.
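A Python model of the {cap, len, buf} layout, to show the doubling policy: reallocation happens only when len reaches cap, so pushes are amortized O(1). The starting capacity of 4 is an arbitrary choice, and errors are raised where Hale’s get and pop are fallible.

```python
class VecModel:
    """{cap, len, buf}: a contiguous buffer that doubles when full."""

    def __init__(self, cap=4):            # cap must be at least 1
        self.cap = cap
        self.len = 0
        self.buf = [None] * cap
        self.reallocs = 0

    def push(self, x):
        if self.len == self.cap:
            self.cap *= 2                 # doubling realloc: copy into a new buffer
            self.buf = self.buf + [None] * (self.cap - len(self.buf))
            self.reallocs += 1
        self.buf[self.len] = x
        self.len += 1

    def get(self, i):
        if not 0 <= i < self.len:
            raise IndexError(i)           # Hale's get is fallible instead
        return self.buf[i]

    def pop(self):
        if self.len == 0:
            raise IndexError("pop from empty vec")
        self.len -= 1
        return self.buf[self.len]


v = VecModel()
for i in range(9):
    v.push(i)
print(v.cap, v.len, v.reallocs)   # 16 9 2
```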
The four forms and what they require:
| Form | Backing slot | Lowers to | Synthesized surface |
|---|---|---|---|
| @form(vec) | one heap | doubling contiguous buffer | push, get, set, pop, len, is_empty, sort* |
| @form(hashmap) | one pool + indexed_by | intrusive open-addressing table | set, get, has, remove, len, is_empty |
| @form(ring_buffer, cap=N) | one pool | fixed circular buffer | push -> Bool, pop, len, is_full |
| @form(lru_cache, cap=N) | one pool + indexed_by | fixed keyed table, LRU eviction | put, get, contains, len |
get / pop / remove are fallible (bounds / missing-key /
empty); push on vec is infallible, on ring_buffer returns
Bool (full is a normal condition, not an error). lru_cache is
the cap-bounded keyed form: put is infallible and silently
evicts the least-recently-used entry over cap (a get
counts as a use and saves an entry from eviction; contains does
not). Its get is fallible(KeyError) on a miss.
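The lru_cache rules (put evicts over cap, get is a use, contains is not) modelled in Python with collections.OrderedDict. Treating a put on an existing key as a use is this sketch’s assumption.

```python
from collections import OrderedDict


class LruCache:
    def __init__(self, cap):
        self.cap = cap
        self.data = OrderedDict()          # least recently used first

    def put(self, key, value):             # infallible; evicts over cap
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.cap:
            self.data.popitem(last=False)  # drop the least-recently-used

    def get(self, key):                    # a use: refreshes recency
        if key not in self.data:
            raise KeyError(key)            # Hale: fallible(KeyError)
        self.data.move_to_end(key)
        return self.data[key]

    def contains(self, key):               # not a use: recency unchanged
        return key in self.data

    def __len__(self):
        return len(self.data)


c = LruCache(2)
c.put("a", 1)
c.put("b", 2)
c.get("a")                            # "a" is now most recently used
c.put("c", 3)                         # evicts "b"
print(c.contains("a"), c.contains("b"))   # True False
```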
Both a vec and a hashmap also expose batched iteration
(shipped 2026-07-02) — for x in v.items { … } walks the vec, and
for e in m.entries { … } walks the map. The loop is an inline
buffer/slot walk, not per-element method calls. (Don’t mutate the
form inside the body: a grow would reallocate the vec or rehash the map under the cursor.)
By default a @form(hashmap) is single-pool: its densest layout
has no synchronization, and a cross-pool call into it is rejected.
Opt into concurrent access with the sync = … parameter —
@form(hashmap, sync = serialized) (per-map mutex),
sync = striped (concurrent readers), or sync = lockfree
(CAS-only steady state) — trading layout density for the sharing
discipline the workload needs.
The performance contract
Each form commits to a performance band, verified by microbenchmarks in the tree:
- Tight-loop primitive (push): within ~10% of idiomatic C. @form(vec).push hits this.
- Amortized workload: within ~2× of the C equivalent.
- Per-op fallible (get through the fallible ABI): no tight bound, only an advisory one, because the fallible return shape and the function-call boundary cost real cycles.
The point: a form isn’t a slow generic that “works for any type.”
It’s a specialized implementation monomorphized to your cell
type. The cost is that a @form(vec) of Player isn’t
interchangeable with some library’s Vec<Player> — there’s no
such shared generic. If you want a shared API across forms, you
declare an interface.
Choosing a form
- Growable, ordered, index access → @form(vec).
- Keyed lookup, key is a field of the value → @form(hashmap) (indexed_by names the key field).
- Bounded window, drop-or-backpressure on full → @form(ring_buffer, cap = N).
- Bounded keyed cache, evict least-recently-used on full → @form(lru_cache, cap = N) (indexed_by names the key field).
One form per locus — a locus is one container. Need two? That’s two loci, which is usually the cleaner decomposition anyway.
Orthogonal to projection class
A form governs how a locus stores cells of a value type. A projection class governs how a parent serves observations of its accepted child loci. They operate on different things and compose freely on the same locus:
@form(hashmap)
locus SessionStore : projection chunked {
capacity { pool sessions of Session indexed_by id; }
accept(w: Worker) { }
}
@form(hashmap) lays out the sessions value store;
projection chunked sizes the allocator for the accepted
Worker children. Different slots, no interference.
Cells are data
A form cell can be a primitive or a type record — never a
locus. Storing a locus in a map would mean get(key) hands a
live entity to a stranger, the same antipattern the language
rejects for methods returning
loci. For keyed entities, make
them accepted children and key a parallel index by name. Cells
are values; entities are children.
Next: the fastest same-machine transport — Zero-copy & the high-frequency bus.
Zero-copy & the high-frequency bus
Coming from Rust / C++? This is the shared-memory ring buffer you’d otherwise build by hand with mmap and atomics. For same-machine routes north of ~100k msg/s (market data, tick streams), the per-message copy at the locus boundary shows up in the latency budget. A shm_ring binding writes the payload straight into a POSIX shared-memory slot the subscriber reads from. There’s no kernel memcpy at the boundary, and it’s still the same subscribe / publish code.
The default copies; sometimes you can’t afford it
Every ordinary bus delivery copies the payload into the subscriber’s arena — that’s what keeps lifetimes independent and the memory model sound (see Memory & lifetime). For the vast majority of topics that copy is free in the noise. For the hottest same-host routes it isn’t, and you opt into a zero-copy path explicitly.
A shm_ring binding
In main’s bindings { } block:
main locus App {
bindings {
L2Updates: shm_ring("/l2-updates",
slot_count: 1024,
on_overflow: fail)
where intra_machine, zero_copy;
}
}
Publisher and subscriber mmap the same /dev/shm object and
coordinate through the ring’s slot indices. The publisher writes
its payload directly into a slot; the subscriber reads from the
same memory. No copy crosses the boundary.
The subscribe L2Updates as on_update; handler is the same line
of source it would be over a Unix socket — the substrate picks
the zero-copy lowering from the binding, not from the locus code.
Per-record vs. batch: the handler’s param picks the mode
By default the substrate calls your handler once per record:
fn on_update(u: Update) { // per-record
self.total = self.total + u.px;
}
On a high-rate cross-process feed that per-record call — plus the
per-call handler scratch — is exactly the overhead that loses to a
bare consumer loop in C or Go. Hale’s fix is the drain handler:
change the parameter type to Drain<T> and the substrate calls the
handler once per available batch, handing you a handle you
consume with a tight inline loop.
locus Agg {
params { total: Int = 0; }
bus { subscribe Quotes as on_quotes; } // SAME subscribe line
fn on_quotes(feed: Drain<Tick>) { // param type → batch mode
for t in feed { // zero-copy inline loop
self.total = self.total + t.px; // no per-record call
}
}
}
There is no new keyword — the subscribe clause is unchanged; the
parameter type alone selects the dispatch mode. Inside for t in feed, each t is read straight through the ring slot (so t.px
reads the mapped shared memory in place, never a copy), and the
consumer cursor advances once per batch instead of once per record.
Drain<T> is only spellable as a batch handler’s parameter and as
the thing you iterate; it is not a general value type. Batch
handlers on a foreign (layout:) ring aren’t supported yet — use a
per-record handler there.
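A per-record dispatcher touches the shared cursors once per record; a drain touches them once per batch. Here is a minimal single-process C sketch of that difference. It assumes a simple ring of fixed-size slots with a producer `published` counter and a consumer `consumed` counter. These names and this layout are hypothetical, not Hale's runtime ring. `drain_sum` does one acquire load, a tight in-place loop, and one release store.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

#define SLOTS 8u /* power of two so (index & (SLOTS - 1)) wraps */

typedef struct { int64_t px; } Tick;

/* Hypothetical ring: producer bumps `published`, consumer bumps `consumed`. */
typedef struct {
    _Atomic uint64_t published;
    _Atomic uint64_t consumed;
    Tick slots[SLOTS];
} TickRing;

/* Producer side: write the slot in place, then publish with release ordering.
   Returns 0 if the ring is full (no overflow policy in this sketch). */
static int ring_push(TickRing *r, int64_t px) {
    uint64_t head = atomic_load_explicit(&r->published, memory_order_relaxed);
    uint64_t tail = atomic_load_explicit(&r->consumed, memory_order_acquire);
    if (head - tail == SLOTS) return 0;
    r->slots[head & (SLOTS - 1)].px = px;
    atomic_store_explicit(&r->published, head + 1, memory_order_release);
    return 1;
}

/* Batch drain: one acquire load, a tight loop over the slots in place,
   one release store. Returns how many records were consumed. */
static size_t drain_sum(TickRing *r, int64_t *total) {
    uint64_t tail = atomic_load_explicit(&r->consumed, memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&r->published, memory_order_acquire);
    for (uint64_t i = tail; i != head; i++) {
        *total += r->slots[i & (SLOTS - 1)].px; /* read the slot, no copy out */
    }
    atomic_store_explicit(&r->consumed, head, memory_order_release);
    return (size_t)(head - tail);
}
```

A per-record dispatcher would repeat the load, a handler call, and the store for every tick. The batch loop pays each of those once per batch.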
The where clause is a checked contract
where intra_machine, zero_copy is two things at once: your
assertion about the route, and a contract the compiler validates.
- Scope: intra_process, intra_machine, or cross_machine (pick one). zero_copy with cross_machine is rejected: the network always serializes.
- Behavior: zero_copy is rejected on transports that can’t honor it (unix(...) memcpies through the socket buffer; user adapters serialize through send(subject, bytes)).
Zero-copy needs a flat payload
A payload you can drop into a shared slot must be flat-shapeable:
every leaf is a fixed-layout primitive (Int, Float, Bool,
Decimal, Time, Duration), a fixed-size array of those, or a
struct whose fields are all flat-shapeable. String, Bytes,
and unbounded arrays carry heap pointers that don’t translate to
a shared slot, so the compiler rejects them on a zero-copy topic.
Use a fixed-size byte array ([Byte; 256]) for bounded text on
these routes.
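C makes the flat-shape rule easy to see. A flat struct's bytes are the whole value, so copying it into a slot and back reproduces it in any process that maps the slot. A pointer field copies only an address, and that address means nothing in the other process. This minimal sketch uses a hypothetical `Quote` payload with a fixed 16-byte symbol buffer where a String would otherwise go. The text above suggests [Byte; 256]; this sketch uses 16 bytes only to keep the test small.

```c
#include <stdint.h>
#include <string.h>

/* Flat: every leaf is a fixed-size primitive or a fixed-size array. */
typedef struct {
    int64_t px;
    int64_t qty;
    uint8_t symbol[16]; /* bounded text, NUL-padded, instead of a heap String */
} Quote;

/* Not flat: `name` is an address into the producer's heap. Copying the struct
   copies the pointer, not the characters it points to. */
typedef struct {
    int64_t px;
    const char *name;
} BadQuote;

/* A slot is just raw bytes in the shared mapping. */
typedef struct { uint8_t bytes[sizeof(Quote)]; } Slot;

static void quote_to_slot(Slot *s, const Quote *q) { memcpy(s->bytes, q, sizeof *q); }
static void slot_to_quote(Quote *q, const Slot *s) { memcpy(q, s->bytes, sizeof *q); }

/* Bounded text: copy at most sizeof(symbol) - 1 bytes and NUL-pad the rest. */
static void set_symbol(Quote *q, const char *text) {
    memset(q->symbol, 0, sizeof q->symbol);
    size_t n = strlen(text);
    if (n > sizeof q->symbol - 1) n = sizeof q->symbol - 1;
    memcpy(q->symbol, text, n);
}
```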
Overflow is your decision
A shm_ring binding must declare on_overflow:, because slot
exhaustion needs a policy the substrate can’t guess:
- block: the publisher spins until a slot frees. Right for control-plane data that must not be lost.
- drop: overwrite the next slot; slow consumers miss messages. Right for stale-is-worthless feeds.
- fail: panic with a clear diagnostic. Process-level visibility into back-pressure.
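The three policies differ only in what the publisher does when the slot it needs is still unread. This is a single-threaded C sketch that assumes a fixed-stride ring with hypothetical `head`/`tail` counters. No second thread can free a slot here, so `block` reports that it would have to wait instead of spinning. `fail` returns an error code at the point where the runtime would panic.

```c
#include <stdint.h>

#define RING_SLOTS 4u

typedef enum { ON_OVERFLOW_BLOCK, ON_OVERFLOW_DROP, ON_OVERFLOW_FAIL } Overflow;
typedef enum { PUB_OK, PUB_WOULD_BLOCK, PUB_FAILED } PubResult;

typedef struct {
    uint64_t head;   /* next slot the publisher writes */
    uint64_t tail;   /* next slot the subscriber reads */
    uint64_t missed; /* records overwritten under ON_OVERFLOW_DROP */
    int64_t slots[RING_SLOTS];
    Overflow policy;
} Ring;

static PubResult ring_publish(Ring *r, int64_t value) {
    if (r->head - r->tail == RING_SLOTS) { /* every slot is still unread */
        switch (r->policy) {
        case ON_OVERFLOW_BLOCK:
            return PUB_WOULD_BLOCK; /* a real publisher would spin here */
        case ON_OVERFLOW_FAIL:
            return PUB_FAILED;      /* the runtime would panic here */
        case ON_OVERFLOW_DROP:
            r->tail++;              /* overwrite the oldest unread record */
            r->missed++;
            break;
        }
    }
    r->slots[r->head % RING_SLOTS] = value;
    r->head++;
    return PUB_OK;
}

/* Subscriber side: returns 0 when there is nothing to read. */
static int ring_read(Ring *r, int64_t *out) {
    if (r->tail == r->head) return 0;
    *out = r->slots[r->tail % RING_SLOTS];
    r->tail++;
    return 1;
}
```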
Reading someone else’s ring
A shm_ring binding speaks Hale’s own ring format. But sometimes
the ring already exists — written by another program in another
language, with its own binary layout. Instead of hand-writing FFI
or forking the runtime, you declare that layout and point a
binding at it:
ring_layout ForeignRing {
magic 0x52494E47464D5431; // expected header magic at offset 0
version 1 at 8 : u32; // header field `version`, must equal 1
buffer_size at 12 : u32; // ring capacity, read from the header
data_at 128; // first record starts here
cursor published { // the producer's published byte cursor
at 64; repr atomic_u64; load acquire; unit bytes;
}
framing byte_records { // records are [u32 length][payload]
len_prefix u32; align 8; pad_sentinel 0xFFFFFFFF;
}
overflow lap_detect;
}
main locus App {
bindings {
Ticks: shm_ring("/foreign.ticks", on_overflow: drop,
layout: ForeignRing) where zero_copy;
}
}
A subscriber on Ticks now reads that foreign ring directly: the
runtime attaches it read-only, checks the magic and version, and
walks the length-prefixed records, handing each payload to your
on_tick handler with no copy. Your handler code is identical to
any other shm_ring subscriber — the layout only changes how the
substrate finds and frames the bytes.
A binding with no layout: keeps Hale’s native ring, so nothing
you wrote before changes.
The same binding works the other way too. If a locus in your
program publishes the topic, it becomes the ring’s producer: Hale
creates the segment, writes the header the layout describes, and
frames each Ticks <- Tick { ... } as a length-prefixed record
another program (or another language) can read. Give the binding a
buffer_size: to size the ring:
Ticks: shm_ring("/foreign.ticks", on_overflow: drop,
layout: ForeignRing, buffer_size: 65536) where zero_copy;
So the same declared layout lets Hale sit on either side of a foreign ring (consume what another process writes, or produce what another process reads) with the locus body unchanged. Two caveats at this version: a subscriber sees only records published after it attaches (no replay of history), and if it falls more than a full buffer behind, it resyncs rather than reading a torn record.
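This minimal C sketch shows the work a reader of the ForeignRing layout does. It runs on an in-memory buffer rather than an mmap'd /dev/shm segment. It also fills in details the declaration leaves open, and all of them are assumptions:

- Header integers are host-order (little-endian on x86-64).
- The published cursor is a monotonic byte count, reduced modulo buffer_size.
- A length of 0xFFFFFFFF tells the reader to skip to the start of the buffer.
- Records are padded to 8 bytes.

Lap detection is omitted. `ring_attach`, `ring_produce`, and `ring_poll` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_MAGIC    0x52494E47464D5431ull /* "RINGFMT1" */
#define OFF_VERSION   8
#define OFF_BUFSIZE   12
#define OFF_PUBLISHED 64
#define DATA_AT       128
#define PAD_SENTINEL  0xFFFFFFFFu

static uint32_t rd_u32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static void wr_u32(uint8_t *p, uint32_t v) { memcpy(p, &v, 4); }
static uint64_t align8(uint64_t n) { return (n + 7) & ~(uint64_t)7; }

/* The producer's published byte cursor (repr atomic_u64 at offset 64). */
static _Atomic uint64_t *published(uint8_t *base) {
    return (_Atomic uint64_t *)(base + OFF_PUBLISHED);
}

/* Attach: check magic and version; return the data capacity (0 on mismatch). */
static uint32_t ring_attach(const uint8_t *base) {
    uint64_t magic;
    memcpy(&magic, base, 8);
    if (magic != RING_MAGIC || rd_u32(base + OFF_VERSION) != 1) return 0;
    return rd_u32(base + OFF_BUFSIZE);
}

/* Test-side producer: append one [u32 len][payload] record, writing the pad
   sentinel and wrapping if it won't fit before the end. Assumes cap is a
   multiple of 8 and the record fits in cap. */
static void ring_produce(uint8_t *base, uint32_t cap, const void *payload, uint32_t len) {
    uint64_t cur = atomic_load_explicit(published(base), memory_order_relaxed);
    uint64_t need = align8(4 + (uint64_t)len);
    uint64_t pos = cur % cap;
    if (pos + need > cap) {
        wr_u32(base + DATA_AT + pos, PAD_SENTINEL);
        cur += cap - pos;
        pos = 0;
    }
    wr_u32(base + DATA_AT + pos, len);
    memcpy(base + DATA_AT + pos + 4, payload, len);
    atomic_store_explicit(published(base), cur + need, memory_order_release);
}

/* Consumer: walk every record between *cursor and the published cursor and
   hand each payload to `on_record` in place. Returns records delivered. */
static int ring_poll(uint8_t *base, uint32_t cap, uint64_t *cursor,
                     void (*on_record)(const uint8_t *payload, uint32_t len, void *ctx),
                     void *ctx) {
    uint64_t end = atomic_load_explicit(published(base), memory_order_acquire);
    int n = 0;
    while (*cursor < end) {
        uint64_t pos = *cursor % cap;
        uint32_t len = rd_u32(base + DATA_AT + pos);
        if (len == PAD_SENTINEL) { *cursor += cap - pos; continue; }
        on_record(base + DATA_AT + pos + 4, len, ctx);
        *cursor += align8(4 + (uint64_t)len);
        n++;
    }
    return n;
}
```

`ring_poll` starts from whatever cursor the caller holds. Starting it at the current published value gives the no-replay behavior described above.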
Mixed record types: a raw BytesView payload
The examples above bind a fixed payload struct — every record on the
ring is the same shape. Real feeds are often heterogeneous: a header
plus one of several record types, selected by a discriminator, with
varying length. Bind such a topic to a BytesView payload and the
subscriber receives a bounded view over each record to decode itself:
topic Recs { payload: BytesView; }
locus Reader {
bus { subscribe Recs as on_rec; }
fn on_rec(v: BytesView) {
let kind = std::bytes::read_u8(v, 0) or 0;
match kind {
1 => { /* decode an L1 record with std::bytes::read_* */ }
2 => { /* decode an L2 record */ }
_ => { }
}
}
}
No fixed size is assumed (a differently-sized valid record isn’t
dropped), and you decode with the std::bytes::read_* pack readers and
a discriminator branch. This is the path for reading real external
mixed-record rings; the typed-struct binding stays the fast path for a
homogeneous ring.
Producing such a ring is symmetric — build a record with a
BytesBuilder and send the bytes:
fn emit_l2(level: L2) {
let b = std::bytes::BytesBuilder { initial_cap: 64 };
b.append_u8(2); // discriminator
b.append_u32_le(level.price);
b.append_u32_le(level.qty);
Recs <- b.view(); // framed at its own length
}
Publishing bytes (Recs <- b.view()) frames the record as
[len_prefix len][bytes], where len is the value’s actual byte length, so each record carries its own size.
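emit_l2 produces a 9-byte payload: a 1-byte discriminator and two little-endian u32s. The framing adds the length prefix in front. This is a minimal C sketch of both steps. It assumes a u32 little-endian length prefix (the len_prefix u32 from the ForeignRing example). The helper names are hypothetical, and the decoder bounds-checks the way the std::bytes::read_* calls in on_rec do.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian u32 store/load, independent of host byte order. */
static void put_u32_le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}
static uint32_t get_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Build [u32 len][u8 kind=2][u32 price][u32 qty] into out; returns bytes used. */
static size_t frame_l2(uint8_t *out, uint32_t price, uint32_t qty) {
    uint8_t *payload = out + 4;
    payload[0] = 2;                 /* discriminator */
    put_u32_le(payload + 1, price);
    put_u32_le(payload + 5, qty);
    put_u32_le(out, 9);             /* prefix = the payload's actual length */
    return 4 + 9;
}

/* Decode one framed record; returns 0 if it is truncated or not an L2. */
static int decode_l2(const uint8_t *rec, size_t avail, uint32_t *price, uint32_t *qty) {
    if (avail < 4) return 0;
    uint32_t len = get_u32_le(rec);
    if (len > avail - 4 || len < 9 || rec[4] != 2) return 0;
    *price = get_u32_le(rec + 5);
    *qty = get_u32_le(rec + 9);
    return 1;
}
```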
Writing in place (zero-copy)
That builds the record in a temporary buffer, then copies it into the ring. To skip the copy on a hot producer path, write the fields directly into the reserved slot:
fn emit_l2(level: L2) {
Recs.write(24) { w => // reserve up to 24 bytes
std::bytes::write_u8(w, 0, 2) or raise;
std::bytes::write_u32_le(w, 1, level.price) or raise;
std::bytes::write_u32_le(w, 5, level.qty) or raise;
9 // bytes written -> the record length
};
}
Topic.write(max) { w => ... } reserves up to max bytes, hands the
body a writable view w over the slot, and commits the byte count the
body’s tail yields. The std::bytes::write_* family mirrors the readers
(bounds-checked, fallible(IndexError)). The reserve and commit are
scoped to the block, so the view can’t escape and the commit can’t be
forgotten.
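Reserve/commit is a common ring-buffer idiom. This minimal C sketch runs over a flat byte region and assumes a u32 length prefix. `reserve` and `commit` are hypothetical functions, not Hale runtime symbols. The writer gets a pointer into the slot itself and fills at most `max` bytes. Only the commit writes the length and makes the record visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t buf[256];
    size_t published;  /* bytes visible to readers */
    size_t reserved;   /* max bytes of the open reservation, 0 if none */
} Region;

/* Reserve room for [u32 len][up to max bytes]; returns the payload pointer
   the caller writes into, or NULL if it does not fit or one is already open. */
static uint8_t *reserve(Region *r, size_t max) {
    if (r->reserved || r->published + 4 + max > sizeof r->buf) return NULL;
    r->reserved = max;
    return r->buf + r->published + 4;
}

/* Commit the bytes actually written: store the prefix, then publish. */
static int commit(Region *r, uint32_t written) {
    if (!r->reserved || written > r->reserved) return 0;
    memcpy(r->buf + r->published, &written, 4); /* host-order prefix in this sketch */
    r->published += 4 + written;
    r->reserved = 0;
    return 1;
}
```

Two guarantees the Hale block gives you are missing from this C version. Nothing here forces the commit to happen. Writes through the raw pointer are also not bounds-checked the way std::bytes::write_* is.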
Naming the fields (repr: tags)
Hand-writing read_u32_le(b, 12) per field is error-prone — the offsets
are implicit and drift as the record changes. Tag a struct’s fields with
their wire representation and the offsets are computed for you, with typed
accessors generated from the layout:
type L2 {
kind: Int `repr:"u8"`; // 1 byte @ 0
price: Int `repr:"u32_le"`; // 4 bytes @ 1
qty: Int `repr:"u32_le"`; // 4 bytes @ 5
}
Now the consumer reads fields by name and the producer writes them by name — both compose with everything above:
fn on_rec(v: BytesView) {
let p = L2::price(v) or raise; // read u32_le @ 1
...
}
fn emit(level: L2) {
Recs.write(9) { w =>
L2::set_kind(w, 2) or raise;
L2::set_price(w, level.price) or raise;
L2::set_qty(w, level.qty) or raise;
9
};
}
Type::field(v) and Type::set_field(w, x) desugar to the matching
std::bytes::read_* / write_* call at the field’s computed offset — so
they’re exactly as cheap (and as bounds-checked) as writing the primitive
by hand. Offsets run in declaration order over the tagged fields; pin one
for a padded foreign format with repr:"u32_le,at=4". The tag itself is
general key:"value" metadata — repr: is the binary-pack consumer;
other keys (e.g. json:) are free for later tools.
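The offset rule described above is simple enough to state as code. This minimal C sketch assumes something the text doesn't say: a field after a pinned one continues from the end of the pinned field. The `Field` table and `layout_offsets` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t size;   /* bytes from the repr: u8 = 1, u32_le = 4, ... */
    int32_t at;      /* pinned offset from ",at=N", or -1 if not pinned */
    uint32_t offset; /* computed */
} Field;

/* Assign offsets in declaration order; returns the record's total size. */
static uint32_t layout_offsets(Field *fields, size_t n) {
    uint32_t next = 0;
    for (size_t i = 0; i < n; i++) {
        if (fields[i].at >= 0) next = (uint32_t)fields[i].at; /* pinned */
        fields[i].offset = next;
        next += fields[i].size;
    }
    return next;
}
```

For the L2 type this yields kind at 0, price at 1, and qty at 5, matching the comments on its fields. Pinning price with at=4 moves it to 4 and qty to 8.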
Per-record headers and wire timestamps
Real external feeds often prefix each record with a small fixed
header — a sequence number, a producer-side wire-arrival timestamp —
before the variable payload. Declare it in the ring_layout with
record_header_bytes (and pad_field for any alignment padding),
and the subscriber reads those header fields for the record it’s
currently handling through std::shm:
fn on_rec(v: BytesView) {
let seq = std::shm::last_record_seq(); // header sequence no.
let wire_ns = std::shm::last_record_kernel_ns(); // producer wire time
// ... decode v as before ...
}
These read like the errno-style timestamp getters on a socket recv:
call them inside the handler, and they describe the record being
delivered. Each returns 0 when the layout declares no such field.
The layout’s recheck post_copy guard re-validates the header after
the copy, so a record torn by a producer lapping the ring is never
surfaced with a half-written header. (A native fixed-stride ring uses
framing slots instead of length-prefixed byte_records — same
layout: machinery, a different framing kind.)
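A post-copy recheck follows the same pattern as a seqlock read: copy optimistically, then confirm the slot wasn't reused underneath you. This is a minimal C sketch of that general technique, not Hale's guard. It assumes a hypothetical slot that carries its record sequence number in the header, with a marker value while a write is in progress. If the sequence before or after the copy isn't the one expected, the copy is discarded. A production seqlock also has to deal with the formal data race on the payload bytes, which this single-threaded sketch ignores.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SEQ_WRITING UINT64_MAX /* header value while the payload is being written */

typedef struct {
    _Atomic uint64_t seq;   /* per-record header: which record occupies the slot */
    uint8_t payload[32];
} Slot;

/* Producer: mark the slot in progress, write the payload, then stamp the seq. */
static void slot_write(Slot *s, uint64_t seq, const uint8_t data[32]) {
    atomic_store_explicit(&s->seq, SEQ_WRITING, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* marker before payload */
    memcpy(s->payload, data, sizeof s->payload);
    atomic_store_explicit(&s->seq, seq, memory_order_release);
}

/* Consumer: copy record `want` out of `s`. Returns 1 on a clean copy, 0 if
   the header names a different record (or a write) before or after the copy. */
static int copy_checked(Slot *s, uint64_t want, uint8_t out[32]) {
    if (atomic_load_explicit(&s->seq, memory_order_acquire) != want) return 0;
    memcpy(out, s->payload, sizeof s->payload);
    atomic_thread_fence(memory_order_acquire); /* keep the recheck after the copy */
    return atomic_load_explicit(&s->seq, memory_order_relaxed) == want;
}
```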
The same shape, one tier down
Notice this is the same move as everything else at this level: an operational requirement (zero-copy delivery) declared at the deployment seam, validated by the compiler, consumed by codegen to pick a lowering — while the locus body stays the synchronous, portable code you wrote three tiers ago. You reach under the hood without rewriting the program.
Next: calling into native libraries — Binding C.
Binding C
Coming from Rust / C++? This is extern "C" with a thin, hand-written wrapper: no bindgen, no build-script codegen. You declare the C symbols you need with @ffi("c"), ship a small glue.c file, and name the link flags in hale.toml. The compiler emits LLVM declares and the linker resolves them. No compiler change is needed to bind a new library.
Declaring a foreign function
An @ffi("c") annotation on a bodiless top-level function
declares an external C symbol:
@ffi("c") fn doubler_double(x: Int) -> Int;
fn main() {
println(doubler_double(21)); // 42
}
The LLVM symbol name is the function name verbatim — no mangling
— so the linker matches it directly against your C. Convention:
prefix FFI names with the library identifier
(raylib_init_window, sqlite3_open) to keep the global C
namespace tidy.
Type marshalling
Only a portable subset crosses the boundary; the mapping is fixed:
| Hale | C |
|---|---|
| Int | int64_t |
| Float | double |
| Bool | int32_t (0 / 1) |
| Duration / Time | int64_t (nanoseconds) |
| String | const char * (NUL-terminated) |
| Bytes | pointer to [int64 len][payload] |
| user type | pointer to a layout-matching struct |
| () | void (return only) |
Decimal and fixed-size arrays are not portable across FFI —
the compiler rejects them at the boundary. Function declarations
also can’t be generic or fallible(E); a C function reports
errors with a sentinel, and your Hale wrapper translates that
sentinel into the fallible channel.
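The table maps onto ordinary C prototypes. This is a hypothetical glue.c sketch. The demo_* names are invented, and `HaleBytes` is one way to spell the documented [int64 len][payload] layout in C, using a flexible array member. The last function shows the sentinel convention: it returns -1 where a Hale wrapper would raise through fallible(E).

```c
#include <stdint.h>
#include <string.h>

/* Bytes crosses as a pointer to [int64 len][payload]. */
typedef struct {
    int64_t len;
    uint8_t data[];
} HaleBytes;

/* Int -> int64_t, Float -> double, Bool -> int32_t (0 / 1). */
double demo_scale(int64_t n, double factor, int32_t negate) {
    double v = (double)n * factor;
    return negate ? -v : v;
}

/* String -> const char * (NUL-terminated). Borrowed: do not keep the pointer. */
int64_t demo_strlen(const char *s) { return (int64_t)strlen(s); }

/* Bytes -> pointer to the length-prefixed block. */
int64_t demo_checksum(const HaleBytes *b) {
    int64_t sum = 0;
    for (int64_t i = 0; i < b->len; i++) sum += b->data[i];
    return sum;
}

/* Errors come back as a sentinel: -1 means "not found". */
int64_t demo_index_of(const char *s, int64_t ch) {
    const char *hit = strchr(s, (int)ch);
    return hit ? (int64_t)(hit - s) : -1;
}
```

On the Hale side these would be declared in the usual form, for example @ffi("c") fn demo_strlen(s: String) -> Int;.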
The glue and the build
Write the C side as an ordinary translation unit:
/* glue.c */
#include <stdint.h>
int64_t doubler_double(int64_t x) { return x * 2; }
Build, naming the C source (and any libraries to link):
hale build mydir/ --csrc glue.c
hale build mydir/ --csrc raylib_glue.c --link raylib
For a reusable binding library, declare the surface in
hale.toml so consumers don’t pass flags by hand:
[ffi]
csrc = ["glue.c"]
link = ["raylib"]
A downstream project then just imports the binding and builds
normally; the FFI flags thread through automatically.
Lifetime rules across the boundary
The boundary is read-only for arena-owned memory, and the rule is
simple: the caller owns every pointer; the callee must not
retain it past the call. If the C side needs to keep data, it
mallocs and copies. If it returns heap data back to Hale, it
allocates into the caller’s arena via
lotus_arena_alloc(lotus_caller_arena_or_global(), size, align)
so the value lives by Hale’s rules. Exceptions / longjmp must
not cross the boundary.
This is the whole FFI story — declare, glue, link. The full
contract (struct-return sret convention, the exact view layout
for BytesView) is in spec/ffi.md. Binding libraries
conventionally live in pond; the
agents/binding-packages.md brief covers the recommended file
layout.
On the wasm target, @ffi("c") has a sibling: @ffi("js") declares a function the JavaScript loader provides instead of a linked C symbol, and @export sends Hale functions out to the host. Same declare-and-bind shape, different boundary; see WebAssembly & the browser.
Next: WebAssembly & the browser, then state that outlives one process in Cross-process & hot-load.
WebAssembly & the browser
Coming from the web stack? Hale compiles to a self-contained .wasm plus a small .mjs loader: no Emscripten, no bundler. The same locus/bus/std::* program you run natively can run in the browser; you choose the target at build time. The browser APIs you can’t reimplement (fetch, WebSocket, WebGL, the DOM) come in as thin host functions, and Hale functions you want the page to call go out as exports.
Building for wasm
hale build client/main.hl --target wasm32
This emits client/main.wasm (self-contained — a tiny bundled
libc, no external runtime) and client/main.mjs (a loader that
instantiates the module and wires the host functions). The program
declares the target so the typechecker can gate the parts of the
standard library that need syscalls:
target wasm { }
Under target wasm, the portable stdlib works as usual
(std::str, std::bytes, std::json, std::math, …), but the
POSIX-backed namespaces (std::io::tcp, std::process,
std::http, …) are rejected at typecheck — the browser sandbox has
no syscalls. Reach the outside world through host functions instead.
The in-process typed bus — topic / bus { publish … } /
bus { subscribe … } across loci — runs under wasm exactly as it
does natively: a Subject <- payload is delivered to every matching
subscriber’s handler in the same module, payload-copied through the
synthesized wire codec. Only the cross-process / network transports
(shm_ring, unix, CONNECT-role bindings) are unavailable in the
sandbox — those need syscalls. So the idiomatic locus + topic + bus
shape is fully available client-side.
The @form collections — @form(vec), @form(hashmap), and
@form(ring_buffer) — run under wasm too; their runtime primitives use
the target-pointer-width size_t ABI, so a push / set / get / len
behaves identically to native.
Calling the host: @ffi("js")
@ffi("js") is the wasm sibling of @ffi("c"):
it declares a function the JavaScript loader provides.
target wasm { }
@ffi("js") fn console_log(msg: String);
@ffi("js") fn draw_line(x1: Float, y1: Float, z1: Float,
x2: Float, y2: Float, z2: Float);
Marshalling: Float and Int both arrive as a plain JS number —
an @ffi("js") Int crosses as f64, not a BigInt, so your host
handler gets a number with no Number(x) step, and an Int-returning
import takes a plain number back. (The one caveat is f64’s range:
Ints beyond 2^53 lose precision across this boundary — send those as
a String/Bytes payload. And this applies to @ffi("js") only;
@ffi("c") keeps i64.) String/Bytes arrive as a pointer the loader
reads out of wasm memory. The loader ships a built-in console_log and
the libm set (so std::math just works); your page supplies the rest
through run(glue):
import { run } from "./main.mjs";
const inst = await run((h) => ({
draw_line: (x1,y1,z1,x2,y2,z2) => { /* push to a WebGL buffer */ },
}));
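The 2^53 limit comes from IEEE-754 doubles, not from Hale. This small C check shows what happens to an Int that crosses as f64. `roundtrip_via_f64` is a hypothetical helper that stands in for the i64 to JS number to i64 trip.

```c
#include <stdint.h>

/* An @ffi("js") Int travels as a double; this models the trip there and back. */
static int64_t roundtrip_via_f64(int64_t x) { return (int64_t)(double)x; }

/* True when x lies within JavaScript's safe-integer range. */
static int is_safe_for_js(int64_t x) {
    const int64_t max_safe = ((int64_t)1 << 53) - 1; /* Number.MAX_SAFE_INTEGER */
    return x >= -max_safe && x <= max_safe;
}
```

2^53 + 1 has no exact double representation. It rounds to 2^53 on the way out, so the value Hale gets back differs from the one it sent.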
Letting the host call you: @export + the app locus
To run a game loop or react to network messages, the host needs
to call into Hale. The browser-client shape is an @export locus — the persistent “app” of your program:
@export locus Client {
params { sx: Float = 0.0; sy: Float = 0.0; ready: Bool = false; }
birth() { }
fn on_message() { /* parse an inbound frame, update fields */ }
fn frame() { /* render from the fields */ }
}
Each fn method becomes a wasm export the page calls by name
(inst.exports.frame()). State lives in the locus’s fields and
persists across calls — on_message() writes self.sx,
frame() reads it, just like a native locus. On the native target
@export is a no-op. (There is also a lower-level @export fn for
free functions — same export, but stateless; see below.) Methods
may not be fallible (the host has no error channel), and the locus
must not define run() — the host drives it.
The run-model: entry inversion
A native program blocks in main. A browser program can’t — it
must return to the event loop so the page stays responsive. So a
program built with @export runs inverted: there is no main,
and the host drives the exports (typically frame() once per
requestAnimationFrame).
The compiler synthesizes an exported _hale_start() that sets
up a persistent program arena and instantiates your @export locus (running birth). The loader calls it once at startup;
after that the page drives the methods:
const inst = await run(glue); // _hale_start ran here (Client is alive)
function tick() {
inst.exports.frame();
requestAnimationFrame(tick);
}
requestAnimationFrame(tick);
A program made of @export declarations needs no fn main at all.
Quick wasm from a bare fn main: --wrap-main
A wasm program needs an @export entry — but a script, a tutorial
snippet, or anything pasted into the browser playground is just a
fn main. The --wrap-main build flag bridges that gap:
hale build snippet.hl --target wasm32 --wrap-main
When the program has a top-level fn main() and no @export entry,
--wrap-main synthesizes — on the parsed AST, before typecheck — the
equivalent of:
target wasm { }
@export locus __Main { birth() { <main's body> } }
so main’s body runs once at _hale_start, exactly as it would run
once natively. Because it works on the AST, not the source text:
- diagnostics keep the user’s line/col: the synthesized locus borrows main’s spans and the body is moved intact, so a type error on the user’s line 3 is reported on line 3 (a textual wrap would shift every following line);
- it’s string/comment-safe: the real lexer found the body, so a { or } inside a string literal or comment can’t mis-wrap it;
- the target wasm gate is injected too, so the syscall-backed stdlib (std::io::tcp, std::process, …) is rejected with a precise diagnostic, on untouched source.
It is wasm-only and opt-in: it requires --target wasm32 (there is
no native entry-inversion to wrap, so it errors on a native build), and
it’s never implied — a normal wasm program may legitimately keep a bare
fn main exported as main. If the program already declares an
@export entry, --wrap-main leaves it untouched (prefer-explicit).
This is the one flag the browser playground passes so it can hand the
compiler raw user source and surface errors on the exact line.
Inbound messages
The page hands network bytes to Hale through the inbox: write them into wasm memory, publish the length, then call a method.
// JS: hand a WebSocket frame to Hale, then notify it
const bytes = new TextEncoder().encode(ev.data);
const ptr = inst.exports.lotus_wasm_alloc(bytes.length);
new Uint8Array(inst.exports.memory.buffer).set(bytes, ptr);
inst.exports.lotus_wasm_set_inbox(bytes.length);
inst.exports.on_message();
@ffi("c") fn lotus_wasm_inbox() -> Bytes; // the bytes JS wrote
// inside the Client locus:
fn on_message() {
let msg = lotus_wasm_inbox();
if len(msg) > 0 {
let s = std::str::from_bytes(msg);
// ... std::json parse, then store into self.* ...
self.ready = true;
}
}
This is the full pattern for a browser client: the page owns the
transport (fetch / WebSocket) and the GL context; the @export locus parses the protocol with std::json, holds the game state
in its fields, runs the camera, and emits geometry — the same code
shape it would have natively.
Lower-level: @export fn + the state cell
If you don’t want a locus, you can export free functions
(@export fn frame()). These are stateless — each call’s
allocations are released on return — so cross-call state must be
parked in the runtime state cell, packed into Bytes:
@ffi("c") fn lotus_wasm_state_set(b: Bytes);
@ffi("c") fn lotus_wasm_state_get() -> Bytes;
The @export locus model is preferred for anything with state; the
state cell exists for the free-fn path and for hand-rolled layouts.
See spec/ffi.md § WASM host interface for the
exact marshalling and diagnostic rules.
Cross-process & hot-load
Coming from Rust / C++? This is typed, versioned state shipped between processes, but without a separate .proto and a codegen step. A perspective is a serializable parameter bundle; producer and consumer share its schema because they compile from the same source. No protobuf regen, no schema drift, no handshake.
A perspective is a shippable view
Most of a locus’s state is private to its region. A
perspective is the exception: a typed bundle a locus can
publish across a process boundary, with a compile-time guarantee
that the other side agrees on its shape.
perspective Kernel {
params {
scale_row: [Decimal; 8];
sigma_factor: Decimal;
regime_id: Int;
}
stable_when {
return self.num_validated >= 3;
}
serialize_as KernelV1;
}
- params is the payload: the schema is this type.
- stable_when is a predicate the runtime checks before the perspective is allowed to ship. “Is this data ready?” lives in the data’s own declaration, not in a publisher flag.
- serialize_as names the wire format stably, so you can rename the identifier without breaking serialization.
A perspective is not a locus — no lifecycle, no bus block, no
methods beyond stable_when. It’s a validated, serializable
bundle the substrate knows how to ship.
The fitter / applier pattern
The canonical use: one process computes parameters slowly and
carefully; another applies them at high frequency. Both compile
from the same Kernel declaration, so the type is the protocol.
// fitter — publishes refined Kernels
locus Fitter {
bus { publish KernelUpdates; }
run() {
let mut k = compute_kernel(observations);
while !k.is_stable() { k = refine_kernel(k, more()); }
KernelUpdates <- k;
}
}
// applier — swaps in the latest, atomically
locus Applier {
params { current: Kernel = default_kernel(); }
bus { subscribe KernelUpdates as on_update; }
fn on_update(k: Kernel) { self.current = k; } // atomic swap; no torn read
}
The runtime guarantees the consumer-side swap is atomic — readers see the old perspective or the new one, never a half-written mix. This is also the hot-load mechanism: reconfigure a long-running service by publishing a new perspective, with full type-checking against the locally-compiled schema, no restart.
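A whole-bundle atomic swap is usually built by publishing a pointer to a fully written copy. This minimal C11 sketch shows that technique, not Hale's runtime. The `Kernel` struct is hypothetical and mirrors the perspective's params, with doubles standing in for Decimal. The writer builds the new value off to the side and swaps one pointer. A reader that loads the pointer therefore sees the old bundle or the new one, never a mix. Freeing the old bundle safely while readers may still hold it needs an epoch or RCU scheme, which this sketch leaves to the caller.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    double scale_row[8];
    double sigma_factor;
    int64_t regime_id;
} Kernel;

static _Atomic(Kernel *) g_current;

/* Writer: `fresh` must be fully written before this call. Returns the previous
   bundle, which the caller frees once no reader can still hold it. */
static Kernel *kernel_swap(Kernel *fresh) {
    return atomic_exchange_explicit(&g_current, fresh, memory_order_acq_rel);
}

/* Reader: one acquire load gives a consistent snapshot. */
static const Kernel *kernel_snapshot(void) {
    return atomic_load_explicit(&g_current, memory_order_acquire);
}
```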
Capability profiles and substrates
The same locus + bus + perspective triple runs on more than one substrate. The native C-runtime is one; the browser runtime (hale-js) is another. A build target declares the capabilities a substrate offers:
target browser_js {
arenas.epoch_view,
time.monotonic, time.wallclock,
random.csprng,
gfx.canvas2d,
}
A program that reaches for a capability its target doesn’t offer
fails at the translation boundary with a clear CAP-MISSING
diagnostic — at build, not at runtime. Substrate differences are
named and checked, not papered over. The locus you wrote doesn’t
change between substrates; the capability profile and the
transport binding do.
This is the long arc of the whole guide paying off: the same shape you met as a small program in the basics runs across processes, machines, and substrates because nothing in the shape depended on where it ran.
Next: Operations & debugging, then the most specialized tier feature, Modes.
Operations & debugging
Most of the time a Hale program either works or fails loudly. The two exceptions, the ones that send you here, are a message that quietly doesn’t arrive and resident memory that quietly grows. Both are silent by design (the steady-state behavior is correct), so the runtime ships opt-in diagnostics you switch on with an environment variable or a build flag. This chapter is the operator’s map: what each knob shows, and two worked triage walkthroughs.
Nothing here changes behavior — every switch is observe-only. The
canonical reference for each variable is spec/runtime.md; this is
the pedagogical version.
Bus: “my publish isn’t arriving”
A publish that compiles is not a publish that’s delivered — the
subject might match no subscriber, the payload might fail to
deserialize, or the subscriber might be on a pool that never runs.
The bus drops these silently because for an on_unmatched: swallow
topic in steady state that is the right behavior. To see the
drops, set one variable:
LOTUS_BUS_LOG_DROP=1 ./myapp
LOTUS_BUS_LOG_DROP is the broad net — reach for it first. It
prints one stderr line at every silent-drop site, naming the call
site, subject, and size/index info: no-matching-subscriber,
serialize-returned-≤0, deserialize-returned-≤0, and the
“matched-but-no-post-target” case (mailbox / pool / queue all null).
It implies the two narrower variables, which you can use on their
own once you know which class you’re chasing:
| Variable | Surfaces |
|---|---|
| LOTUS_BUS_LOG_DROP=1 | everything below, plus serialize-fail and no-post-target |
| LOTUS_BUS_LOG_UNMATCHED=1 | a keyed publish (where key == …) that matched no subscriber; prints subject, key, and the per-topic subscriber counts |
| LOTUS_BUS_LOG_DESERIALIZE_DROP=1 | the udp:// reader thread dropping a frame (no deserializer registered, or a size-mismatched read) |
The shape that produces no line at all. If LOTUS_BUS_LOG_DROP
is silent but the handler still never fires, the message was
delivered to the queue and the problem is downstream: the
subscriber’s pool isn’t draining. The classic cause is a run() on
a cooperative pool that blocks (a long time::sleep, a blocking
syscall) and starves the handler — hale check warns on blocking
syscalls in a cooperative run(), and std::process::dump_pool_residency()
shows pending counts per pool so you can see work piling up unserved.
Memory: “my RSS is growing”
Hale frees a locus’s whole region on dissolve, so a leak is usually one of two things: an allocation that escapes to a long-lived arena (it never dissolves), or a queue/buffer whose high-water mark keeps climbing. Two layers of instrumentation pin it down — one at runtime, one at compile time.
Runtime residency. Set LOTUS_ARENA_RESIDENCY=1 to register
every top-level arena (each locus’s region, the global, the bus
payload arena) with a construction backtrace. Then call
std::process::dump_arena_residency() to emit one line per live
arena — bytes, chunks, parent, label — sorted by bytes descending,
each with the backtrace of where it was created:
// In a long-running daemon, sample from a heartbeat tick so locus
// arenas are caught *while alive* — the atexit dump fires only
// after every locus has torn down.
fn on_tick() {
std::process::dump_arena_residency(); // → stderr, needs LOTUS_ARENA_RESIDENCY=1
println("rss=", std::process::rss_bytes() / 1048576, " MB");
}
std::process::rss_bytes() is the cheap top-line number — poll it
to confirm growth before you go digging. dump_pool_residency() is
the per-pool view (pending/in-flight work), useful when the growth
is a queue rather than an arena.
Compile-time proofs. Before the program even runs, the default memory-bound check and three build flags report on allocation shape:
| Flag | Reports |
|---|---|
| (default on every check/build) | flags an allocation that escapes into an unbounded context and accumulates until its locus dissolves (advisory warnings; --no-warn-unbounded-alloc opts out) |
| --dump-alloc-summary | every allocation site, escape-tagged (local / returned / stored-to-self / sent), with the bounded-vs-unbounded verdict; plus each locus’s storage shape (capacity slots, @form, projection cap) and the self.<field> / self.<slot> an allocation targets |
| --dump-resource-budget | per-locus resource counts (allocations, held fds) against declared ceilings |
| --locality-report | per-locus working-set size against cache-tier budgets |
The memory-bound warnings run by default on every hale check
and hale build (since 2026-07-02 — the flip followed a full-corpus
audit of all 402 warnings). Run-to-exit programs are exempt
automatically: a binary whose main starts no run loop and
subscribes no handler owes no memory-bound proof, so scripts and
one-shot tools stay silent.
For a long-lived service, the surface is:
- @unbounded fn: the greppable in-source carve-out for an acknowledged accumulation (an operator-sized cache, an idempotency log). Silences that body’s sites. Also valid on a lifecycle hook (@unbounded run { … }).
locus Aggregator {
// ... handlers checked for unbounded accumulation ...
@unbounded fn on_snapshot(s: Snapshot) {
// acknowledged: this cache is operator-sized on purpose.
}
}
- --no-warn-unbounded-alloc: opts a whole run out.
- @bounded locus L { … } is now redundant with the default and still accepted.
The warnings are advisory — they print but don’t fail the build. A warning here is the compile-time complement to the residency dump: it tells you which site can grow before you’ve watched it grow.
Bus backpressure: bounding a flood
A producer that outruns its consumer used to grow the dispatch queue
without limit. It no longer does — the queue and each pinned-locus
mailbox are capped at LOTUS_BUS_QUEUE_CAP cells (default 8192 ≈
4.5 MB):
LOTUS_BUS_QUEUE_CAP=1024 ./myapp # tighter bound, more frequent drains
Past the cap the producer back-pressures rather than buffering: a single-threaded cooperative producer inline-drains the queue (runs the oldest handlers) to make space; a cross-thread producer to a pinned mailbox blocks on a condvar until the consumer drains a slot. Every message is still delivered — only the timing and memory profile change. Lower the cap to tighten the memory bound; raise it to reduce drain bursts. (See GH #125 for the full mechanism.)
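Cooperative back-pressure can be pictured as a bounded queue that, when full, runs the oldest pending handler before it enqueues. This minimal single-threaded C sketch shows that idea with a hypothetical cap of 4 and a plain function pointer as the handler. It is not the runtime's queue, but it shows why nothing is lost: the producer pays for the drain instead of growing the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 4u

typedef void (*Handler)(int64_t msg);

typedef struct {
    int64_t cells[QUEUE_CAP];
    size_t head, count;
    Handler handler;
    size_t inline_drains; /* how often the producer had to run a handler */
} BoundedQueue;

/* Run the oldest pending handler; returns 0 if the queue was empty. */
static int queue_run_one(BoundedQueue *q) {
    if (q->count == 0) return 0;
    int64_t msg = q->cells[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    q->handler(msg);
    return 1;
}

/* Publish: at the cap, drain one cell inline instead of growing. */
static void queue_publish(BoundedQueue *q, int64_t msg) {
    if (q->count == QUEUE_CAP) {
        queue_run_one(q);
        q->inline_drains++;
    }
    q->cells[(q->head + q->count) % QUEUE_CAP] = msg;
    q->count++;
}
```

In this sketch, a smaller cap raises `inline_drains` for the same flood. That is the trade-off LOTUS_BUS_QUEUE_CAP exposes: a tighter memory bound in exchange for more frequent drains.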
Shelling out to other programs
Ops glue often means running another tool. std::process::run
does a synchronous fork + exec + wait and captures the result. The
argument vector is newline-separated (no shell, no word
splitting — each line is one argv entry):
let out = std::process::run("git\nstatus\n--short") or raise;
println("exit ", to_string(out.code));
println(out.stdout);
if len(out.stderr) > 0 { println("stderr: ", out.stderr); }
The returned ProcessOutput carries code: Int (the exit code,
or -1 if killed by a signal), signal: Int (the killing signal,
0 if it exited normally), and stdout / stderr as captured
Strings. run is fallible(IoError) — a missing binary or a
fork failure raises rather than returning a bogus output.
For a long-running child you drive incrementally, the lower-level
spawn / wait / kill / write_stdin / read_stdout /
read_stderr surface over a Child handle is in
spec/stdlib.md.
Other process self-introspection: std::process::pid(),
std::process::exit(code), and std::process::rss_bytes() (peak
RSS — see Memory above).
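std::process::run follows the classic POSIX pattern. This minimal C sketch implements the same contract on a POSIX host:

1. Split the newline-separated argv.
2. Fork, and in the child run execvp with stdout on a pipe.
3. In the parent, read the pipe until EOF.
4. Call waitpid and decode the status into code (-1 if signaled) and signal (0 if it exited normally).

`run_capture` and `RunResult` are hypothetical names. Two behaviors differ from Hale's run. stderr is left inherited to keep the sketch short, because capturing both pipes needs poll to avoid a deadlock. A missing binary shows up as exit code 127 here, where Hale's run raises IoError.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct { int code; int signal; char out[4096]; size_t out_len; } RunResult;

/* Returns 0 on success, -1 if setup (strdup, pipe, fork, waitpid) failed. */
static int run_capture(const char *spec, RunResult *res) {
    char *copy = strdup(spec);
    if (copy == NULL) return -1;
    char *argv[64];
    int argc = 0;
    for (char *tok = strtok(copy, "\n"); tok && argc < 63; tok = strtok(NULL, "\n"))
        argv[argc++] = tok;               /* one line = one argv entry, no shell */
    argv[argc] = NULL;

    int fds[2];
    if (argc == 0 || pipe(fds) != 0) { free(copy); return -1; }
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); free(copy); return -1; }
    if (pid == 0) {                       /* child: stdout -> pipe, then exec */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]); close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                       /* exec failed */
    }
    close(fds[1]);
    res->out_len = 0;
    char scratch[256];
    for (;;) {
        size_t room = sizeof res->out - 1 - res->out_len;
        ssize_t n = room ? read(fds[0], res->out + res->out_len, room)
                         : read(fds[0], scratch, sizeof scratch); /* discard excess */
        if (n <= 0) break;                /* EOF (or a read error) */
        if (room) res->out_len += (size_t)n;
    }
    res->out[res->out_len] = '\0';
    close(fds[0]);
    free(copy);

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (WIFEXITED(status)) { res->code = WEXITSTATUS(status); res->signal = 0; }
    else { res->code = -1; res->signal = WIFSIGNALED(status) ? WTERMSIG(status) : 0; }
    return 0;
}
```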
Worked triage
“My subscriber’s handler never runs.”
- LOTUS_BUS_LOG_DROP=1 ./app. A line at the publish? → the subject or key doesn’t match, or the payload won’t deserialize. Fix the subject/key or the payload type.
- No line, but still no delivery? → the message reached the queue; the consumer isn’t draining. Check the subscriber’s pool: a cooperative run() that blocks starves handlers. hale check flags blocking syscalls; dump_pool_residency() shows the pending pileup.
- Subscriber is an inline child or on where async_io? → confirm it’s instantiated as an owned param or top-level, not unowned in a method body (which dissolves at scope exit before it can fire; hale check errors on this).
“My RSS climbs over hours.”
- Sample rss_bytes() from a heartbeat and confirm it’s monotonic, not sawtooth (sawtooth is healthy churn).
- Set LOTUS_ARENA_RESIDENCY=1 and call dump_arena_residency() from the same heartbeat to find the arena whose bytes field grows. The label and backtrace name the locus and birth site.
- A root-kind arena growing is the leak; a sub arena recycles. If it’s the bus payload arena, the high-water mark is queue depth: lower LOTUS_BUS_QUEUE_CAP. If it’s a locus arena, you’re accumulating into a field. Prefer in-place mutation (self.f.x = v) over whole-value replace (self.f = T{…}), which bump-allocates fresh each time. --dump-alloc-summary names the site at compile time.
Debugging with the native toolchain
Hale binaries carry DWARF line tables by default (zero runtime cost). That means real debugging:
hale build myservice
gdb ./myservice
(gdb) break myservice.hl:42
(gdb) run
(gdb) backtrace # real .hl file:line frames, inline stacks
addr2line -e ./myservice 0x4a2f10 resolves crash-dump addresses
to source lines, and ASAN reports carry file:line through both the
Hale code and the runtime. Profile with
perf record --call-graph dwarf (frame pointers are deliberately
not forced — they cost ~22% on runtime fast paths). Opt out of
debug info with LOTUS_NO_DEBUGINFO=1.
Modes
Coming from Rust / C++? Think of modes as asking the compiler to emit a different execution strategy for the same computation over the same state — vectorized throughput, cache-tiled per-class work, or a single scalar decision — without you maintaining three copies. It’s the most specialized feature in the language; most loci never declare one.
Three named projections of one kernel
A locus can declare up to three modes, each a named projection of the same underlying computation, operating on the same locus state:
locus Pricer {
params { /* shared state */ }
mode bulk(...) -> ... { /* vectorized over many inputs */ }
mode harmonic(...) -> ... { /* per-class / cache-tiled */ }
mode resolution(...) -> ... { /* one decision, scalar */ }
}
You invoke a mode like a method — self.bulk(...),
self.resolution(...) — and declare only the subset you actually
operate in. They map to genuinely different hardware execution
regimes:
- bulk: vectorized throughput, the same operation across many elements at once.
- harmonic: a cache-tiled, per-class projection, with work organized so each class’s data stays resident.
- resolution: a single scalar decision, the one-input-one-answer path.
The compiler emits a strategy tuned to each regime, rather than running one general implementation everywhere.
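As an analogy only, the Python below hand-writes the three regimes for a hypothetical pricing kernel. In Hale you would declare the modes and let the compiler choose each strategy. MARKUP, price_one, price_bulk, and price_harmonic are illustrative names. All three are evaluation orders of one computation, so they must return identical answers, which the final assertions check.

```python
from collections import defaultdict

# Hypothetical shared locus state: one markup factor per customer class.
MARKUP = {"retail": 1.02, "wholesale": 1.01}


def price_one(cls, base):
    # resolution: one input, one answer, scalar path.
    return base * MARKUP[cls]


def price_bulk(cls, bases):
    # bulk: the same operation over many elements (a plain loop here;
    # the regime a native compiler would vectorize).
    m = MARKUP[cls]
    return [b * m for b in bases]


def price_harmonic(items):
    # harmonic: group work by class so each class's state is loaded once
    # and stays hot while that class's items are processed.
    by_class = defaultdict(list)
    for idx, (cls, base) in enumerate(items):
        by_class[cls].append((idx, base))
    out = [0.0] * len(items)
    for cls, group in by_class.items():
        m = MARKUP[cls]
        for idx, base in group:
            out[idx] = base * m
    return out


items = [("retail", 100.0), ("wholesale", 100.0), ("retail", 50.0)]
scalar = [price_one(c, b) for c, b in items]
assert price_harmonic(items) == scalar
assert price_bulk("retail", [100.0, 50.0]) == [scalar[0], scalar[2]]
```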
They share the arena
All three modes read and write the same locus state through the
same arena — there’s no duplicate allocation and
no copy between them. Because they can touch the same fields, the
compiler verifies the modes don’t write-conflict: a
resolution-mode write to state that bulk mode also writes
during overlapping evaluation is a compile-time error. You get
three execution strategies over one piece of state, with the
aliasing hazard checked for you.
Why three, and no fourth
The count isn’t arbitrary minimalism. Vectorized, cache-tiled, and scalar are three distinct cost regimes on real hardware: high-throughput SIMD, locality-bound per-class work, and a latency-bound single decision. There’s no fourth regime the hardware rewards, so there’s no fourth mode. It’s the same commit-hard discipline behind the three projection classes for memory.
When you’ll reach for this
Rarely, and only at this tier — when a locus has a kernel
computation that genuinely runs in more than one of those
regimes (a numeric model evaluated both in batch and
per-decision, say) and you want each path lowered well from one
declaration. For ordinary application and service code, you’ll
never declare a mode; the lifecycle methods and fn members
cover everything.
That’s the systems tier — and the bottom of the descent. You started with variables and functions; you’ve now seen the memory model, the allocation disciplines, zero-copy transport, the C boundary, cross-process state, and hardware execution regimes. Every one of them is the same locus you met in the basics, observed at greater and greater resolution.
To see why one shape holds across all four tiers — and across human, LLM, and machine — read The design. For exact rules, the reference points into the canonical spec.
Reference
This guide is the tour. The canonical contract — what the
compiler actually enforces — lives in the spec/ directory at
the repository root. When the guide and the spec disagree, the
spec wins; when you need the exact rule, an edge case, or a
diagnostic’s meaning, go there.
The spec, by topic
| You want | Read |
|---|---|
| The formal grammar | spec/grammar.ebnf |
| Lexical structure, literals, operators | spec/tokens.md |
| Operator precedence & associativity | spec/precedence.md |
| Operational semantics (lifecycle, bus, recovery, fallible) | spec/semantics.md |
| The type system | spec/types.md |
| Memory: regions, capacity slots, projection classes | spec/memory.md |
| The form library (vec / hashmap / ring_buffer) | spec/forms.md |
| The always-loaded runtime | spec/runtime.md |
| The standard library surface | spec/stdlib.md |
| Idiomatic patterns & the six shapes | spec/styleguide.md |
| The FFI contract: C (@ffi("c")) and the WASM host interface (@ffi("js") / @export) | spec/ffi.md |
| Dependencies & vendoring | spec/packages.md |
| Project layout & imports | spec/projects.md |
| How tests are written and run | spec/testing.md |
| Why every design choice was made | spec/design-rationale.md |
Two more anchors
- AGENTS.md: the load-bearing prompt for agents writing .hl. It condenses the six idiomatic patterns, the “what’s not in the language” reflexes, and the formal design model into one file. Excellent for a human, too.
- Working programs: crates/hale-codegen/tests/fixtures/examples/ holds ~70 small per-feature programs, numbered. Reading a few near your target shape is the fastest way to see real, compiling Hale.
Toolchain commands
| Command | Does |
|---|---|
| hale run <file/dir> | compile + run (fast feedback) |
| hale build <file/dir> | compile to a native binary |
| hale check | parse, typecheck, and static checks (no codegen) |
| hale test | run *_test.hl |
| hale fetch | clone & pin git dependencies |
| hale fmt | canonical formatter |
Libraries (pond)
The standard library covers the substrate — I/O, time, strings, JSON, HTTP, crypto, the bus. Everything else — web stacks, databases, observability — lives in pond, the contributed library catalog: https://github.com/hale-lang/pond.
Many lotus grow in a pond. Each library is a directory of .hl
loci you vendor into your project.
Using one
Declare it in hale.toml, fetch it, import it:
[deps]
pond = { git = "https://github.com/hale-lang/pond", tag = "v0.1.0" }
hale fetch
import "vendor/pond/router" as router;
hale fetch clones each dependency into vendor/<name>/ and
pins the resolved commit in hale.lock. Pond’s “no transitive
dependencies in v1” rule means every package your program pulls
in is visible in your lockfile — if a library uses another, you
vendor both explicitly.
The catalog
Persistence & data
| Library | Provides |
|---|---|
db | Driver-agnostic database surface: the DbDriver interface + Args bind-parameter list for parameterized ($1, $2, …) queries. Pick a backend (pq, sqlite) at the DbDriver slot. |
pq | PostgreSQL driver — PgConn plus PgPool, a fixed-size fd connection pool that itself satisfies db::DbDriver. |
sqlite | SQLite connection + fallible query surface. |
migrations | Schema migration runner (up/down); builds to a migrate binary. |
jobs | SQLite-backed job queue (Queue) + a pinned-worker pool. |
Web
| Library | Provides |
|---|---|
http | HTTP client (http/client) over std::io — request/response building atop the socket primitives, for libraries that need an HTTP client without the full std::http server surface. |
router | HTTP router over std::http — method + path-param routes, middleware chain. |
sessions | Stateless, HMAC-signed cookie sessions (session=<base64(payload)>.<base64(hmac)>). |
websocket | Synchronous, owner-driven RFC 6455 WebSocket client (suggested alias ws); a passive wrapper your own run() loop drives. |
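The sessions cookie format above can be sketched with Python’s standard hmac, hashlib, and base64 modules. This is not pond’s code, and sign_session and verify_session are hypothetical names. Two details are assumptions about the format: it uses standard (not URL-safe) base64, and it signs the raw payload bytes. The parts that matter are the HMAC-SHA256 tag and the constant-time comparison on verify. The crypto library in the catalog provides both primitives on the Hale side.

```python
import base64
import hashlib
import hmac


def sign_session(payload: bytes, key: bytes) -> str:
    # Builds session=<base64(payload)>.<base64(hmac)> with HMAC-SHA256.
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return "session=" + base64.b64encode(payload).decode() + "." + base64.b64encode(tag).decode()


def verify_session(cookie: str, key: bytes):
    # Returns the payload if the tag is valid, otherwise None.
    prefix = "session="
    if not cookie.startswith(prefix) or "." not in cookie:
        return None
    body64, tag64 = cookie[len(prefix):].rsplit(".", 1)
    try:
        payload = base64.b64decode(body64, validate=True)
        tag = base64.b64decode(tag64, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time compare: timing does not reveal how many bytes matched.
    return payload if hmac.compare_digest(tag, expected) else None


key = b"0123456789abcdef0123456789abcdef"  # placeholder key; use a CSPRNG in practice
cookie = sign_session(b'{"user":42}', key)
print(verify_session(cookie, key))         # b'{"user":42}'
```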
Observability & supervision
| Library | Provides |
|---|---|
logfmt | Alternative std::log sinks wearing the std::text::Sink shape — file with rotation, structured output. |
metrics | Counter / gauge / histogram primitives + a Prometheus text-format renderer and /metrics endpoint. |
tracing | Span tree mirroring the locus tower — one Tracer per app; spans nest with locus instantiation. |
supervisor | Erlang/OTP supervision-tree strategies grafted onto Hale’s on_failure + restart / restart_in_place / bubble. |
Primitives & composition
| Library | Provides |
|---|---|
crypto | SHA-256, HMAC-SHA256, hex encode/decode, constant-time compare, CSPRNG. |
subprocess | Spawn + manage child processes (suggested alias sub) — wraps the std::process spawn / wait / pipe primitives. |
tower | Run several independent locus trees (“towers”) under one process, each with its own root and lifecycle. |
Terminal & UI
| Library | Provides |
|---|---|
term | Tier-0 terminal infrastructure — capability/is_tty probes, SGR styling, raw-mode guard, cursor + screen control over std::term. |
tui | An Elm-shaped TUI runtime: write a locus with model/update/view, the runtime drives the frame loop, input, and rendering. |
AI & numeric
| Library | Provides |
|---|---|
agent | LLM-agent toolkit — agent/{llm, tools, conversation, embeddings, sandbox}: a client surface, a tool-registry, conversation state, and a sandboxed execution path. |
ml | Neural-network primitives (ml/neural). |
math | Numeric helpers — math/{matrix, stats}. |
heron (the tree-sitter grammar that drives editor tooling) also lives in pond, but it’s developer tooling, not a vendored runtime library you import. The _util directory holds internal helper libs consumed by other pond libs, not imported directly by apps.
Pond is where the ecosystem grows: if a protocol, parser, or shape is too useful to rewrite per project but doesn’t belong in the language, it lands here.
Verification
Most languages ask you to write correct concurrent code and hope you did. Hale takes a different bet: make incorrect designs fail to compile, and model-check the runtime everything executes on. This page is the honest account of what that buys you — and what it deliberately doesn’t.
The substrate is model-checked
Hale’s runtime, lotus, is C: pthreads and C11 atomics. Every primitive in it with a cross-thread surface is transcribed into a model and checked exhaustively, under every legal interleaving, with GenMC — as a standing CI gate. A race, use-after-free, or assertion failure in any model fails the build.
| Primitive | What’s verified |
|---|---|
| Lock-free hashmap | the enter / drain / grow protocol |
| Mailbox monitor | the pinned-locus mutex hand-off |
| Bus queue | the cooperative-pool conditional lock |
| Arena subregion lock | the parent’s child-slot freelist |
Each model carries a negative control: delete the synchronization
and GenMC reports the exact bug the real code prevents — proof the
check has teeth. (The per-thread chunk pool needs no model: it is
__thread, with no cross-thread surface.) Sanitizers catch races on
the paths your tests happen to hit; model checking catches the ones no
test reliably triggers — grow-during-drain, compact-then-grow. For a
language whose whole concurrency story is the bus, trusting the
substrate is the foundation everything else rests on.
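The negative-control idea can be shown at toy scale. GenMC checks real C11 code, including under memory models weaker than the one used here. This Python sketch only enumerates every sequentially consistent interleaving of a two-thread counter increment. final_counts, LOCKED, and UNLOCKED are hypothetical names. With the lock, every schedule ends at 2. Delete the lock and the explorer finds the lost update that ends at 1.

```python
def final_counts(program, nthreads=2):
    """Explore every interleaving of nthreads copies of program (sequential consistency).

    Ops: "acquire", "release", "load" (local = counter), "store" (counter = local + 1).
    Returns the set of counter values reachable once all threads finish.
    """
    results, seen = set(), set()
    stack = [((0,) * nthreads, (0,) * nthreads, 0, None)]  # pcs, locals, counter, lock owner
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        pcs, locs, counter, owner = node
        if all(pc == len(program) for pc in pcs):
            results.add(counter)
            continue
        for t in range(nthreads):              # try stepping each thread next
            if pcs[t] == len(program):
                continue
            op = program[pcs[t]]
            new_locs, new_counter, new_owner = list(locs), counter, owner
            if op == "acquire":
                if owner is not None:
                    continue                   # blocked: this thread cannot step now
                new_owner = t
            elif op == "release":
                new_owner = None
            elif op == "load":
                new_locs[t] = counter
            elif op == "store":
                new_counter = locs[t] + 1
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            stack.append((new_pcs, tuple(new_locs), new_counter, new_owner))
    return results


LOCKED = ["acquire", "load", "store", "release"]
UNLOCKED = ["load", "store"]            # negative control: synchronization deleted

print(sorted(final_counts(LOCKED)))     # [2]
print(sorted(final_counts(UNLOCKED)))   # [1, 2]: the lost update
```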
Your programs are data-race-free by design
Above the substrate, the language is shaped so application code can’t introduce a data race in the first place:
- A typed bus instead of shared state. Loci talk by publishing typed values to topics; the payload is copied into the receiver’s region. There is no shared mutable cell to race on.
- The single-threaded-method invariant. Calling a locus’s method from the wrong pool’s thread is a compile error.
- Vertical-only failure. No lateral references between siblings; a failure travels up to a parent’s on_failure, never sideways.
Checked at build time
These run during hale check / hale build, on top of ordinary
type-checking.
Bus-graph properties. The bus topology is a typed graph, and the compiler walks it. This analysis is on by default, and its error-level findings fail the build:
- orphan topics (wired to only one end) — warning
- cross-locus cycles that can spin — warning
- intra-locus re-entrant self-publish (unbounded recursion) — error
- backpressure — an unthrottled publish in an unbounded loop — warning
- subject type-mismatch — two sites disagreeing on a payload type — error
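Two of these checks, orphan topics and subject type mismatches, reduce to a pass over publish/subscribe sites. The Python below is a toy version, not the compiler’s walk. The site tuples and bus_graph_findings are hypothetical. The cycle, self-publish, and backpressure checks need handler and loop structure and are left out. Lobby and Admin are invented loci added next to the Matchmaker from the introduction.

```python
from collections import defaultdict


def bus_graph_findings(sites):
    """Toy orphan-topic and payload-type-mismatch checks.

    sites: (locus, role, topic, payload_type) tuples, role is "pub" or "sub".
    Returns sorted (severity, message) findings.
    """
    pubs, subs, types = defaultdict(set), defaultdict(set), defaultdict(set)
    for locus, role, topic, payload in sites:
        (pubs if role == "pub" else subs)[topic].add(locus)
        types[topic].add(payload)
    findings = []
    for topic in sorted(types):
        if not pubs[topic] or not subs[topic]:
            findings.append(("warning", f"orphan topic {topic}"))
        if len(types[topic]) > 1:
            findings.append(("error", f"type mismatch on {topic}: {sorted(types[topic])}"))
    return sorted(findings)


sites = [
    ("Lobby", "pub", "JoinQueue", "Player"),
    ("Matchmaker", "sub", "JoinQueue", "Player"),
    ("Matchmaker", "pub", "MatchReady", "MatchInfo"),   # nobody subscribes
    ("Admin", "pub", "JoinQueue", "String"),            # disagrees on the payload type
]
for severity, msg in bus_graph_findings(sites):
    print(severity, msg)
```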
Design rules, enforced as errors:
- No locus-return: a method may not hand back a managed locus (a Law-of-Demeter / CQRS / dependency-inversion violation caught in one rule).
- Codec purity: a bus codec’s encode / decode must be pure; they may run off-thread.
- ring_layout conformance: a foreign shared-memory ring layout is checked for internal and cross-field consistency before a torn read is possible.
Concurrency & placement, keeping a program’s placement coherent with how the runtime dispatches:
- Dead bus receiver: a cooperative locus that subscribes to the bus and blocks in run(), so the blocking call monopolizes the pool thread and its handlers never fire (error).
- Blocking call on a cooperative pool: a blocking run() (recv / accept / process::run) on a pool that isn’t where async_io; it holds the pool’s thread and stalls co-scheduled loci (warning).
- Nested long-running child: a non-main locus holding a params field of a locus type whose run() never returns; the fix is hoisting it to a main sibling with its own placement (error).
- Unowned subscriber locus: a bus-subscribing locus instantiated non-owned in another locus’s method body, so it dissolves at scope exit before its subscription can fire (error).
Memory-bound proofs (on by default). Every hale check /
hale build runs the whole-program survey: the compiler’s
escape/loop dataflow flags allocations that escape a per-message
handler or unbounded loop and accumulate until the locus
dissolves — with loop-ranking that proves a while v < N
counter bounded. Run-to-exit programs (a main with no run loop
and no bus handler) warn nothing — a script owes no bound proof.
@unbounded fn is the in-source carve-out for an acknowledged
site; --no-warn-unbounded-alloc opts a run out. Advisory today; a
hard error contract is the intended end state once the remaining
documented false-positive classes get their annotations.
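Loop-ranking is the classic termination argument. Here is a minimal sketch for the while v < N shape named above. It assumes v starts at v0 and grows by a constant positive step each iteration. ranking_bound and count_iterations are hypothetical names, and this is not Hale’s dataflow analysis. Once the iteration count is bounded, an allocation inside the loop can happen at most that many times, so it no longer counts as unbounded accumulation.

```python
def ranking_bound(v0: int, n: int, step: int) -> int:
    """Iteration bound for `while v < n { v = v + step }` from the ranking function n - v.

    n - v is an integer that shrinks by step >= 1 on every iteration, and the loop
    exits as soon as it is <= 0, so at most ceil((n - v0) / step) iterations run.
    """
    if step < 1:
        raise ValueError("v does not increase: n - v is not a ranking function")
    return max(0, -(-(n - v0) // step))   # integer ceiling division


def count_iterations(v0: int, n: int, step: int) -> int:
    # Brute-force check: actually run the loop.
    v, iters = v0, 0
    while v < n:
        v += step
        iters += 1
    return iters


for v0, n, step in [(0, 10, 1), (0, 10, 3), (12, 10, 1), (-5, 7, 4)]:
    assert count_iterations(v0, n, step) == ranking_bound(v0, n, step)
```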
Resource budgets (opt-in). Static counts of file descriptors, OS
threads, cooperative pools, and bus subjects, with a
--check-resource-budget budget.toml ceiling gate for CI and fd-leak
detection.
What Hale does not claim
Hale is not a whole-program functional-correctness prover — that is the world of CakeML and F*. The guarantee here is narrower and deliberately so: the coordination (the bus graph), the substrate (the concurrent primitives), and bounded resource use are verified, because those are the properties that must hold no matter what executes the design — native, wasm, or a future target. Verification that survives a change of substrate is the kind worth building on.
The authoritative, exhaustive catalog of every compile-time check is
spec/verification.md. The verification roadmap that drove this work — now delivered — is GitHub issue #18.
The design
Why one shape held across all four tiers.
This guide descended four levels — a small scripting language, a high-level application language, a concurrent-services language, a systems language. At each level you reached for the same primitive, the locus, and saw more of it. That wasn’t a teaching trick layered on top of the language. It’s the language’s actual structure, and it’s worth seeing whole now that you’ve felt it.
It was towering loci all along
Hale is built bottom-up from one idea: a locus is a system —
a thing that decomposes into sub-systems and serves a role in
some super-system. Everything structural is a locus. A type is
a locus that hasn’t grown flow yet; an app is a locus; a service,
a connection, a collection, a parser — loci, all the way down.
The tiers of this guide are the same tower observed at different depths:
- The basics met a locus as the shell around main.
- Everyday programs saw it as an object with state and methods.
- Concurrent services saw it as a lifecycle, a bus participant, a supervised parent.
- Systems control saw it as a memory region with a layout and an execution strategy.
None of those views contradict; each is a higher-resolution perspective on the thing below. That’s why the function you wrote in chapter one still works in the last chapter — you were descending into one structure, not switching languages.
The commitments that make it hold
A locus carries a small set of structural commitments, and every guarantee in the language falls out of them:
- Bounded attachment. A locus bounds how many things attach to it. (The capacity model you met in the systems tier.)
- Vertical-only flow. A locus talks up to its parent and down to its children — never sideways. Siblings coordinate through a shared parent or the bus.
- Failure flows up. A broken invariant routes to the parent’s policy, recursively, to the root.
- The root is the horizon. Recursion stops at the current observable boundary — the program’s root, a process edge, a substrate.
From vertical-only flow you get memory safety with no GC and no borrow checker: no pointer crosses sideways, so a region frees wholesale at dissolve. From failure-flows-up you get supervised, let-it-crash recovery with typed policy. From bounded attachment you get the cost model the runtime can plan against. The constraints aren’t restrictions bolted on — they’re the source of the guarantees.
Why one shape spans human, LLM, and machine
There’s a structural reason the matchmaker from the introduction decomposes the same way on paper, in Hale, and inside an LLM’s plan. When K things attach to one coordination point, the working state to hold them together costs about K log₂ K bits. The practical ceiling on K that results, roughly 4 to 10, shows up everywhere coordination happens: human working memory, spans of control, mixture-of-experts active counts, multi-agent LLM saturation. It is the same bound on every substrate.
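Plugging numbers into the K log₂ K estimate shows how quickly the cost grows. At K = 4 it is 8 bits, at K = 10 about 33 bits, and at K = 32 already 160 bits. Going from 4 to 10 attachments multiplies K by 2.5 but the working state by about 4. This is only the arithmetic of the formula as stated; coordination_bits is an illustrative name. The sketch says nothing about where the empirical 4 to 10 range comes from.

```python
import math


def coordination_bits(k: int) -> float:
    # Working state to hold k attachments together: about k * log2(k) bits.
    return k * math.log2(k)


for k in (4, 7, 10, 16, 32):
    print(k, round(coordination_bits(k), 1))
```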
A Hale program is the literal shape of that bound: loci are vertices, topics are hyperedges, capacity declarations bound each vertex’s K. So translation across the human → LLM → machine boundary stays cheap — each layer uses the same vertices and edges, and no representation has to be rebuilt in a foreign idiom. It’s the same reason the locus survives the move from the native runtime to the browser to any future substrate: substrate variance doesn’t reach into the shape.
Going deeper
- AGENTS.md: the formal model in one page (nodes, hyperedges, and invariants, with the locus ↔ Σ mapping). Written for agents authoring .hl, but it’s the tightest statement of the design for a human too.
- spec/design-rationale.md: every numbered design decision (F.1…F.36), the alternatives considered, and why each commitment is shaped the way it is.
- hale-lang/papers: the structural mathematics and the cross-substrate evidence for the k̄ ∈ [4, 10] bound.
You now have the whole arc: a small language at the top, a systems substrate at the bottom, one shape connecting them. Build something — and if the decomposition into loci feels natural, that fit is the thesis working.