Performance
ecsia ships a reproducible benchmark harness and publishes its actual output — including the cases where a single-purpose library beats us. The tables below are generated by pnpm bench:report, which writes both bench/RESULTS.json (the machine-readable artifact, with a full environment block) and the markdown this page includes. Re-running the command regenerates both, so the page can never disagree with the artifact.
Numbers will vary
These are wall-clock measurements (real elapsed time) on one machine at one moment. Your CPU, Node version, thermal state, and background load will move them — sometimes a lot. Treat the shapes (which loop is faster, where the worker curve crosses 1×) as the durable story; treat the absolute milliseconds as a snapshot. The environment block above each table records exactly what produced it.
Methodology
What each loop does. The single-thread iteration bench is the classic ECS hot loop — the small stretch of code that runs over every entity, every frame, and therefore dominates the cost. Here it adds each entity's Velocity to its Position, one frame per timed op. The inner loop is allocation-free — it creates no objects while running — so the measurement is storage and iteration cost, not garbage collection.
- ecsia .each: the ergonomic accessor path, with per-row proxy objects and write-log awareness.
- ecsia eachChunk: the opt-in column cursor. ecsia stores each component field in its own contiguous array (a column), a layout called Structure-of-Arrays (SoA). eachChunk hands you those raw typed-array columns plus a row span, and the loop indexes Float32Array directly, bypassing the per-row accessor and the reactivity write-log push.
- ecsia bindColumns: the bind-once fast path. It uses the same raw columns as eachChunk, but resolves them once up front instead of on every call, so the engine can compile the loop with them baked in. See Bind your loop once below.
- miniplex: array-of-objects iteration.
- bitECS: a raw SoA loop over a flat query result. The reference single-thread baseline.
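To make the layout difference in the list above concrete, here is a minimal, library-free sketch of the same integrate step written against an array of objects and against Structure-of-Arrays columns. Every name in it (Particle, integrateAoS, integrateSoA) is made up for illustration; none of it is the ecsia, miniplex, or bitECS API.

```typescript
// Hypothetical, library-free illustration: none of these names are ecsia, miniplex, or bitECS APIs.
const N = 4
const DT = 0.5

// Array-of-objects (AoS): one heap object per entity, fields read through property access.
interface Particle { x: number; y: number; dx: number; dy: number }
const particles: Particle[] = []
for (let i = 0; i < N; i++) particles.push({ x: i, y: 0, dx: 1, dy: 2 })

function integrateAoS(items: Particle[], dt: number): void {
  for (const p of items) {
    p.x += p.dx * dt
    p.y += p.dy * dt
  }
}

// Structure-of-Arrays (SoA): one contiguous Float32Array per field, indexed by row.
const px = new Float32Array(N)
const py = new Float32Array(N)
const dx = new Float32Array(N).fill(1)
const dy = new Float32Array(N).fill(2)
for (let i = 0; i < N; i++) px[i] = i

function integrateSoA(
  x: Float32Array, y: Float32Array, vx: Float32Array, vy: Float32Array,
  count: number, dt: number,
): void {
  for (let i = 0; i < count; i++) {
    x[i] = x[i]! + vx[i]! * dt
    y[i] = y[i]! + vy[i]! * dt
  }
}

integrateAoS(particles, DT)
integrateSoA(px, py, dx, dy, N, DT)
```

Both functions compute the same result. The SoA version touches only four flat arrays and allocates nothing per row, which is the shape the eachChunk, bindColumns, and bitECS rows measure.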
Before timing, the harness cross-validates one full step of eachChunk and of bindColumns against .each at the bench's own N (which crosses the 1024-row column-growth boundary). A fast-but-wrong loop fails the run instead of silently reporting a misleading number.
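The cross-validation idea can be sketched as follows. This is an assumption about the general shape of such a check, not the harness's actual code: Step, crossValidate, and both loops are hypothetical names, and the candidate here is a simple unrolled loop standing in for a faster path.

```typescript
// Hypothetical harness shape; `Step`, `crossValidate`, and both loops are illustrative names.
type Step = (px: Float32Array, dx: Float32Array, count: number, dt: number) => void

// Reference: the plain loop whose output is taken as ground truth.
const referenceStep: Step = (px, dx, count, dt) => {
  for (let i = 0; i < count; i++) px[i] = px[i]! + dx[i]! * dt
}

// Candidate: a two-way unrolled loop that must produce identical values.
const unrolledStep: Step = (px, dx, count, dt) => {
  let i = 0
  for (; i + 1 < count; i += 2) {
    px[i] = px[i]! + dx[i]! * dt
    px[i + 1] = px[i + 1]! + dx[i + 1]! * dt
  }
  if (i < count) px[i] = px[i]! + dx[i]! * dt // odd tail row
}

// Run one step of each loop on identical inputs and fail loudly on the first difference.
function crossValidate(reference: Step, candidate: Step, n: number): void {
  const expected = new Float32Array(n)
  const actual = new Float32Array(n)
  const velocity = new Float32Array(n)
  for (let i = 0; i < n; i++) {
    const start = (i % 97) * 0.25
    expected[i] = start
    actual[i] = start
    velocity[i] = 1 + (i % 7)
  }
  reference(expected, velocity, n, 1 / 60)
  candidate(actual, velocity, n, 1 / 60)
  for (let i = 0; i < n; i++) {
    if (expected[i] !== actual[i]) throw new Error(`loops diverge at row ${i}`)
  }
}

// 1025 rows: one past the 1024-row boundary mentioned above, so the tail is exercised.
crossValidate(referenceStep, unrolledStep, 1025)
```

Throwing on the first mismatch is what turns a fast-but-wrong loop into a failed run rather than a published number.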
Why per-entity work matters for the speedup bench. The worker-pool bench is not a toy doing trivial arithmetic per entity. Each entity runs an iterated damped-oscillator integrator (a small spring-physics simulation): hundreds of sub-steps of expensive math calls (sin/cos/exp) per frame. That is deliberate. Parallel speedup only appears once each wave — a batch of systems that can safely run at the same time because none writes data another touches — amortizes the coordination cost, meaning the per-frame work is large enough that the dispatch overhead (the fixed cost of handing work to worker threads) and the synchronization between waves stop mattering. A trivial body would be pure overhead and would never beat single-thread — which is exactly why benchmarks that show "linear scaling" on trivial work are misleading. We make the work heavy enough to be honest about where the crossover actually is.
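For a sense of what "heavy per-entity work" can look like, here is a hypothetical kernel in the same spirit. The bench's real integrator is not reproduced on this page, so the function name, parameters, and the choice of an exact underdamped-oscillator step are all assumptions made for illustration.

```typescript
// Hypothetical kernel: the bench's real integrator is not shown in this page.
interface Oscillator { x: number; v: number }

// Advance one underdamped oscillator (x'' + 2*gamma*x' + omega0^2*x = 0) by `subSteps`
// exact steps of size h. exp/cos/sin are recomputed every sub-step on purpose: the
// point is to make per-entity work expensive, not to be the fastest integrator.
function integrateHeavy(
  o: Oscillator, gamma: number, omega0: number, h: number, subSteps: number,
): void {
  const omega = Math.sqrt(omega0 * omega0 - gamma * gamma) // requires omega0 > gamma
  for (let s = 0; s < subSteps; s++) {
    const decay = Math.exp(-gamma * h)
    const c = Math.cos(omega * h)
    const sn = Math.sin(omega * h)
    const a = o.x
    const b = (o.v + gamma * o.x) / omega
    o.x = decay * (a * c + b * sn)
    o.v = decay * ((omega * b - gamma * a) * c - (gamma * b + omega * a) * sn)
  }
}

const FRAME_DT = 1 / 60

const fine: Oscillator = { x: 1, v: 0 }
integrateHeavy(fine, 0.1, 2, FRAME_DT / 512, 512) // 512 sub-steps per frame

const coarse: Oscillator = { x: 1, v: 0 }
integrateHeavy(coarse, 0.1, 2, FRAME_DT, 1) // same elapsed time in a single step
```

Because each sub-step is the exact solution over its interval, the two calls agree up to floating-point rounding. The 512-sub-step version simply does roughly 512 times the math, which is the property a speedup bench needs so that dispatch overhead stops dominating.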
Single-process discipline. Every bucket runs sequentially in one process. The iteration bench uses tinybench with a fixed time budget; the worker-pool bench is the only thing that spawns OS threads, and it does so one configuration at a time. No two measurements compete for cores.
Environment disclosure. bench/RESULTS.json and the table header carry the Node version, CPU model and logical core count, the date, and the commit SHA. A number without its machine is not a number.
Results
Environment. AMD Ryzen 9 7950X3D 16-Core Processor (32 logical cores) · Node v24.11.0 · 2026-06-05 · commit 90c943b
Single-thread iteration
Each loop adds every entity's velocity to its position, over 50,000 entities per op. ns per entity is mean op time divided by entity count (nanoseconds per entity — lower is faster); ratio vs bitECS is bitECS ops/s ÷ this row's ops/s. The ecsia bindColumns row binds its loop to the storage once, up front; if storage grows after that binding the loop runs slower from then on (roughly 1.7 ns per entity instead of ~1.0), so pre-size the world to peak capacity — spawn or reserve before binding.
| loop | ops/s | ms/op | ns per entity | ratio vs bitECS |
|---|---|---|---|---|
| ecsia .each | 1,980 | 0.5059 | 10.12 | 9.49x |
| ecsia eachChunk | 13,661 | 0.0736 | 1.47 | 1.38x |
| ecsia bindColumns | 20,457 | 0.0492 | 0.98 | 0.92x |
| miniplex | 1,457 | 0.6967 | 13.93 | 12.90x |
| bitECS | 18,792 | 0.0535 | 1.07 | 1.00x (baseline) |
Tracked-write cost — the same .each loop with a .changed() filter attached and drained each frame (the change-tracking overhead you opt into for reacting to changes):
| loop | ops/s | ms/op | ns per entity | ratio vs bitECS |
|---|---|---|---|---|
| ecsia .each + .changed() | 152 | 6.6543 | 133.09 | 123.32x |
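The derived columns follow directly from the definitions given above the first table. A small sketch of the arithmetic, using the ecsia .each row as input (the function names are illustrative):

```typescript
// Inputs copied from the tables above; function names are illustrative.
const ENTITIES = 50_000

// ns per entity = mean op time in nanoseconds divided by entity count.
function nsPerEntity(msPerOp: number, entities: number): number {
  return (msPerOp * 1e6) / entities
}

// ratio vs baseline = baseline ops/s divided by this row's ops/s.
function ratioVsBaseline(baselineOpsPerSec: number, rowOpsPerSec: number): number {
  return baselineOpsPerSec / rowOpsPerSec
}

const eachNs = nsPerEntity(0.5059, ENTITIES)
const eachRatio = ratioVsBaseline(18_792, 1_980)

console.log(`${eachNs.toFixed(2)} ns, ${eachRatio.toFixed(2)}x`) // prints "10.12 ns, 9.49x"
```

Recomputing from the rounded table values will not always match to the last digit: for the tracked-write row, 18,792 ÷ 152 gives about 123.6 rather than the published 123.32, presumably because the report divides unrounded ops/s figures.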
Worker-pool speedup
Real node:worker_threads + Atomics. 8 independent Body groups × 1,024 entities (8,192 total), 512 sub-steps of expensive math (sin/cos/exp) per entity per frame, 60 frames. Speedup is single-thread wall-clock time ÷ this row's. byte-identical confirms the threaded run's sum-of-fields checksum equals the single-thread run's.
Single-thread baseline: 12057.5 ms.
| workers | wall ms | speedup vs 1 thread | byte-identical |
|---|---|---|---|
| 1 | 12364.3 | 0.98x | yes |
| 2 | 6375.7 | 1.89x | yes |
| 4 | 3396.2 | 3.55x | yes |
| 8 | 1991.6 | 6.05x | yes |
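The speedup column and the byte-identical check can be sketched like this. The timings are copied from the table; sumOfFields is a hypothetical stand-in for the harness's checksum, assumed here to add every field of every column in a fixed order.

```typescript
// Timings copied from the worker-pool table above.
const BASELINE_MS = 12057.5
const WALL_MS: Array<[number, number]> = [[1, 12364.3], [2, 6375.7], [4, 3396.2], [8, 1991.6]]

// Speedup is baseline time divided by the row's time; below 1 means slower than one thread.
function speedup(baselineMs: number, wallMs: number): number {
  return baselineMs / wallMs
}

// Hypothetical checksum: sum every field of every column in a fixed order.
function sumOfFields(columns: Float32Array[]): number {
  let sum = 0
  for (const column of columns) {
    for (let i = 0; i < column.length; i++) sum += column[i]!
  }
  return sum
}

const speedups = WALL_MS.map(([workers, ms]) => `${workers}: ${speedup(BASELINE_MS, ms).toFixed(2)}x`)
console.log(speedups.join(', ')) // prints "1: 0.98x, 2: 1.89x, 4: 3.55x, 8: 6.05x"

// Toy data standing in for the final state of a single-thread and a threaded run.
const single = [new Float32Array([1.5, 2.25]), new Float32Array([0.125, 4])]
const threaded = [new Float32Array([1.5, 2.25]), new Float32Array([0.125, 4])]
const identical = sumOfFields(single) === sumOfFields(threaded)
```

The comparison uses strict equality rather than a tolerance, which is what lets the column say "identical" instead of "close".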
Honest analysis
bitECS wins the default-path comparison. Its flat SoA loop out-iterates both .each and eachChunk, and we do not pretend otherwise. If your entire workload is one tight integrate loop on a single thread and you never reach for bindColumns, bitECS is the fastest tool here.
ecsia bindColumns edges ahead of bitECS on this bench. Once the loop is bound, it is the same raw-typed-array shape bitECS uses with one less indirection: ecsia walks its rows densely where bitECS indexes through an entity list. The edge comes with homework: keep the loop closure persistent and pre-size before binding. See Bind your loop once.
ecsia .each beats miniplex. The ergonomic accessor path, proxies and write-log awareness included, still out-iterates miniplex's array-of-objects walk. You do not pay for ecsia's ergonomics by dropping below the closest ergonomic competitor.
ecsia eachChunk lands within ~1.4× of bitECS on a modern V8. The column cursor re-resolves its columns every call. That re-resolution is what keeps it safe under storage growth with zero setup, and it is what bindColumns trades away for the last ~30%.
The tracked-write row is the cost you opt into. Attaching a .changed() filter and draining it each frame is markedly more expensive than the bare integrate loop: that is the write-log doing real work so reactivity, deltas, and change observers are available. You pay it only when you ask for it; the plain .each and eachChunk rows are what you get when you don't.
The worker curve is the capability nobody else ships. No mainstream JS ECS ships a real worker_threads + Atomics auto-parallel scheduler. The speedup column crosses 1× between 1 and 2 workers: at one worker you pay dispatch and wave-sync overhead for nothing (it is slower than the single-thread executor; dispatch overhead is real and we show it), and the parallel win only materializes once a second worker shares the load. The byte-identical column confirms every threaded run produces byte-for-byte the same result as the single-thread run; the speedup is genuine parallelism, not a relaxed-correctness shortcut.
This holds at any column size. In a threaded world, each column lives in a SharedArrayBuffer (memory several threads can read and write at once) with address space reserved up front for INITIAL_ROWS 64 × GROWTH_RESERVE_FACTOR 16 = 1024 rows. When a column grows past that reservation, it re-backs: it moves to a new, larger SharedArrayBuffer. The pool then re-wraps every worker's view of that column at the wave fence (the synchronization point between waves) before the next dispatch; when nothing grew, this costs one generation check per wave. The earlier 0.1.0 pre-release per-column growth cap is retired: growing past 1024 rows is covered directly by packages/scheduler/test/worker-growth-boundary.test.ts (1024 in-place grow + 1025/1040 re-backing) and the above-reservation case in the heavy-pool smoke test.
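The re-back and re-wrap cycle described above can be modelled in a few lines. This is a single-threaded sketch under stated assumptions, not ecsia's scheduler code: SharedColumn, reback, and WorkerColumnView are hypothetical names, the generation counter is assumed to live in shared memory, and a real pool would also have to hand the new buffer to each worker thread.

```typescript
// Hypothetical shapes: `SharedColumn`, `reback`, and `WorkerColumnView` are illustrative,
// not ecsia's internals. A real pool would send the new buffer to each worker thread.
interface SharedColumn {
  buffer: SharedArrayBuffer
  generation: Int32Array // one shared counter, bumped on every re-back
}

function createColumn(rows: number): SharedColumn {
  return {
    buffer: new SharedArrayBuffer(rows * Float32Array.BYTES_PER_ELEMENT),
    generation: new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT)),
  }
}

// Re-back: move the column to a larger SharedArrayBuffer and bump the generation.
function reback(column: SharedColumn, rows: number): void {
  const next = new SharedArrayBuffer(rows * Float32Array.BYTES_PER_ELEMENT)
  new Float32Array(next).set(new Float32Array(column.buffer))
  column.buffer = next
  Atomics.add(column.generation, 0, 1)
}

class WorkerColumnView {
  private seenGeneration = -1
  private view: Float32Array = new Float32Array(0)
  rewraps = 0

  // Called at the wave fence: one generation check, and a re-wrap only if storage moved.
  sync(column: SharedColumn): Float32Array {
    const current = Atomics.load(column.generation, 0)
    if (current !== this.seenGeneration) {
      this.view = new Float32Array(column.buffer)
      this.seenGeneration = current
      this.rewraps++
    }
    return this.view
  }
}

const column = createColumn(1024)
const workerView = new WorkerColumnView()
workerView.sync(column)[0] = 42 // first wave wraps the buffer
workerView.sync(column) // nothing grew: generation check only
reback(column, 2048) // growth past the reservation
const afterGrowth = workerView.sync(column) // fence re-wraps the moved column
```

In this model the steady-state cost per wave is a single Atomics.load and an integer comparison, and the typed-array view is rebuilt only after a re-back.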
Bind your loop once: bindColumns
eachChunk looks its columns up again on every call. That re-lookup is the safe default — a column's array can be replaced when storage grows — but it stops V8 from compiling your loop with the arrays baked in as constants, and that compilation is worth about 30% on the iteration bench. bindColumns gets it back without giving up the safety: you hand the query the columns you want and a factory function; ecsia resolves the columns once, calls your factory with them, and keeps the loop your factory returns. Each run() then runs your loop directly — ecsia re-binds it only when storage actually moved.
import { createWorld, defineComponent, write } from 'ecsia'
const Position = defineComponent({ x: 'f32', y: 'f32' }, { name: 'position' })
const Velocity = defineComponent({ dx: 'f32', dy: 'f32' }, { name: 'velocity' })
const world = createWorld({ components: [Position, Velocity], maxEntities: 1 << 16 })
const q = world.query(write(Position), write(Velocity))
const dt = 1 / 60
const run = q.bindColumns(
  [Position, 'x'], [Position, 'y'], [Velocity, 'dx'], [Velocity, 'dy'],
  ([px, py, dx, dy], meta) => () => {
    const count = meta.count // the live entity count: read it inside the loop
    for (let i = 0; i < count; i++) {
      px[i] = px[i]! + dx[i]! * dt
      py[i] = py[i]! + dy[i]! * dt
    }
  },
)
run() // call once per frame
Two requirements make this fast, and both are part of the contract rather than style:
- Your loop must persist. The speed comes from V8 treating the captured arrays as constants, and it only does that for a closure created once and then reused. The API shape makes that the natural thing: ecsia calls your factory, keeps the returned loop, and re-invokes the factory only when a bound column's storage was replaced or a new group of entities starts matching the query. Entities spawning and despawning never re-invoke it, because the loop reads the live count from meta.count.
- The returned loop takes no arguments. Passing the count in as a parameter measured about 2× slower; reading meta.count inside the loop is free.
The trade-offs are the same as eachChunk: writes through the bound arrays bypass the write log, so .changed() filters and observers will not see them, and structural changes during run() follow the same collect-first, mutate-after rule as every other loop.
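The bind-once contract can be modelled with a small stand-in store. This is a sketch of the bookkeeping only, under the assumption that re-binding is driven by a storage generation counter. TinyStore and its methods are hypothetical, not ecsia's implementation, and the sketch does not reproduce the V8 specialization effect itself.

```typescript
// Hypothetical stand-in for illustration only; `TinyStore` is not ecsia's implementation.
interface Meta { count: number }
type Loop = () => void
type Factory = (columns: Float32Array[], meta: Meta) => Loop

class TinyStore {
  columns: Float32Array[]
  generation = 0 // bumped whenever column storage is replaced
  readonly meta: Meta = { count: 0 }

  constructor(fields: number, capacity: number) {
    this.columns = []
    for (let f = 0; f < fields; f++) this.columns.push(new Float32Array(capacity))
  }

  // Growing replaces every column array, which invalidates any bound loop.
  grow(capacity: number): void {
    this.columns = this.columns.map((old) => {
      const next = new Float32Array(capacity)
      next.set(old)
      return next
    })
    this.generation++
  }

  // Resolve columns once, keep the loop the factory returns, re-bind only after growth.
  bind(factory: Factory): Loop {
    let boundGeneration = this.generation
    let loop = factory(this.columns, this.meta)
    return () => {
      if (boundGeneration !== this.generation) {
        loop = factory(this.columns, this.meta)
        boundGeneration = this.generation
      }
      loop()
    }
  }
}

const store = new TinyStore(2, 4)
store.columns[1]!.fill(1) // velocity = 1 for every row
store.meta.count = 3

let factoryCalls = 0
const step = store.bind((columns, meta) => {
  factoryCalls++
  const px = columns[0]!
  const dx = columns[1]!
  return () => {
    const count = meta.count // read the live count inside the loop
    for (let i = 0; i < count; i++) px[i] = px[i]! + dx[i]!
  }
})

step()
step() // same closure reused: factoryCalls is still 1
store.meta.count = 4 // a spawn changes the count, not the binding
step()
store.grow(8) // storage replaced: the next call re-binds
step()
```

After these four calls the factory has run twice: once at bind time and once after grow. The count change in between did not trigger a re-bind, which mirrors the rule that spawning and despawning never re-invoke the factory.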
Pre-size before you bind
The first time a bound column grows after you have bound, V8 permanently stops specializing that loop — it keeps working, just slower (about 1.7 ns per entity in our profiling, instead of the steady-state number in the table). Growth before the bind costs nothing. So spawn, or reserve, up to your peak entity count first — the world's maxEntities is a natural guide — and bind once the world is at size.
Reproduce
pnpm build # the harness imports the BUILT package dist (tsx-free)
pnpm bench:report # regenerates bench/RESULTS.json + website/guide/_perf-tables.md
bench:report runs the bounded config the published tables use: iterate at N=50,000 (3 reps, 300 ms budget per task) and the worker pool at 1,024 entities/group across [1, 2, 4, 8] workers (clamped to your logical core count), 60 frames, fixed seed.
Heavier, longer-running variants drive the same builders at larger sizes for deeper console detail:
pnpm bench:macro # cross-library macro-benches (iterate + relations + parallel), full sizes
pnpm bench:macro:pool # the worker-pool speedup sweep on its own, with the printed table
Regression guard
CI does not run bench:report. Wall-clock time on shared CI runners is noise — a slow neighbor on the runner would flap the suite — so we deliberately do not assert milliseconds in CI. What CI does guard is correctness and behavior, with counter-based assertions: the worker-pool smoke test (a quick check that it compiles and runs, not a measurement) asserts every threaded configuration is byte-identical to single-thread (the byte-identical column above), and the bench builders are cross-checked so neither eachChunk nor bindColumns can silently diverge from .each. Performance regressions are caught by re-running bench:report on a fixed machine and comparing RESULTS.json, not by a CI timer.
See also
- Parallelism — how the auto-parallel scheduler decides what can run concurrently.
- The status banner — the broader "0.x, unpublished, experimental" caveat this page lives under.