
Performance

This chapter covers complexity, memory, and practical scaling guidance. The core is designed to handle tens of thousands of riders per tick on a single thread, with per-tick cost dominated by the dispatch strategy you choose.

Complexity overview

Let E = elevators, R = riders, S = stops. Per sim.step():

| Phase | Cost | Notes |
|---|---|---|
| Advance transient | O(R) worst-case, O(transitioning riders) typical | Only touches riders in Boarding/Exiting. |
| Dispatch (SCAN/LOOK) | O(E · S) | Constant work per elevator per stop in the group. |
| Dispatch (NearestCar) | O(E · S) | Uses decide_all to coordinate. |
| Dispatch (ETD) | O(E · S · R_waiting) | Estimates per-rider delays; heaviest built-in. |
| Reposition | O(E · S) | Only runs if configured. |
| Movement | O(E) | Pure arithmetic per elevator. |
| Doors | O(E) | Door FSM per elevator. |
| Loading | O(boarding + exiting at each open door) | Uses the rider index for per-stop queues. |
| Metrics | O(events this tick) | Linear in the event count. |

Population queries (residents_at, waiting_at, abandoned_at) are O(1) via the rider index, so UIs and hooks can poll them every tick without penalty.
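The O(1) bound holds because per-stop populations are maintained incrementally as riders change state, rather than computed by scanning all riders on each query. A minimal self-contained sketch of that idea — the struct and methods below mirror the crate's names (`RiderIndex`, `waiting_at`) but are an illustrative mock, not the real implementation:

```rust
use std::collections::HashMap;

// Illustrative mock of a per-stop rider index; the real crate's
// RiderIndex internals may differ.
#[derive(Default)]
struct RiderIndex {
    // stop id -> rider ids currently queued at that stop
    waiting: HashMap<u32, Vec<u64>>,
}

impl RiderIndex {
    // Updated at the moment a rider starts waiting, so queries
    // never have to scan the whole rider population.
    fn push_waiting(&mut self, stop: u32, rider: u64) {
        self.waiting.entry(stop).or_default().push(rider);
    }

    // O(1) average: a single hash lookup, independent of total rider count.
    fn waiting_at(&self, stop: u32) -> usize {
        self.waiting.get(&stop).map_or(0, Vec::len)
    }
}

fn main() {
    let mut index = RiderIndex::default();
    index.push_waiting(3, 101);
    index.push_waiting(3, 102);
    println!("{}", index.waiting_at(3)); // 2
    println!("{}", index.waiting_at(7)); // 0
}
```

This is why a UI polling every stop every tick stays cheap: the cost is one lookup per stop, not one pass over the riders.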

Memory

Rough per-entity memory (native x86_64, with default components):

| Entity | Bytes (approx.) |
|---|---|
| Stop | ~120 |
| Elevator | ~200 |
| Rider (with Route and Patience) | ~160 |

Add your own extension components on top. A 10k-rider simulation with a dozen stops and a handful of elevators fits comfortably under 5 MB of live state.

The event buffer grows until drain_events() is called — see Metrics and Events → Buffer size and memory.
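The standard pattern is to drain once per tick so the buffer never accumulates. A self-contained sketch of that pattern — `Sim` here is a stand-in struct, not the real simulator, though `drain_events()` mirrors the crate's method name:

```rust
// Illustrative mock: drain_events() mirrors the crate's API name,
// but this Sim is a stand-in, not the real simulator.
#[derive(Debug)]
enum Event {
    RiderBoarded { rider: u64 },
}

struct Sim {
    events: Vec<Event>,
}

impl Sim {
    fn step(&mut self) {
        // The real tick appends an event for everything that happened.
        self.events.push(Event::RiderBoarded { rider: 1 });
    }

    // Hands the buffer to the caller and leaves it empty, so memory
    // stays flat if this is called every tick.
    fn drain_events(&mut self) -> Vec<Event> {
        std::mem::take(&mut self.events)
    }
}

fn main() {
    let mut sim = Sim { events: Vec::new() };
    for _ in 0..100 {
        sim.step();
        let events = sim.drain_events(); // buffer emptied each tick
        assert_eq!(events.len(), 1);
    }
    assert!(sim.events.is_empty());
}
```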

Benchmarks

The crate ships a benchmark suite (Criterion) in crates/elevator-core/benches/:

| Bench | Measures |
|---|---|
| sim_bench | End-to-end step() across representative scenarios |
| scaling_bench | Throughput vs. rider count |
| dispatch_bench | Per-strategy cost at a fixed rider load |
| multi_line_bench | Multi-group, multi-line buildings |
| query_bench | Population and lookup queries |

Run with:

```sh
cargo bench -p elevator-core
```

Results go to target/criterion/ with HTML reports. A nightly GitHub Actions job (.github/workflows/bench-nightly.yml) reruns the full suite daily, caches a baseline, and opens an issue when Criterion flags a significant regression. There is no PR gate — bench noise on shared runners tends to swamp a strict per-PR check.

Current baselines

Measured on a 32-core Linux x86_64 workstation (Rust stable, release profile, Criterion defaults: 3 s warmup, 5 s measurement). Numbers are the Criterion median unless noted. Shared-runner numbers will be noisier; treat these as orders of magnitude, not tight SLAs.

Primitives

| Item | Time |
|---|---|
| tick_movement (single call) | ~1.3 ns |
| sim_bench / dispatch / 3e_10s | ~4.0 µs |
| sim_bench / dispatch / 10e_50s | ~12 µs |

Full tick throughput (scaling_bench)

| Scenario | Time per run | Per tick |
|---|---|---|
| 50 elevators, 200 stops, 2 000 riders, 100 ticks | ~14 ms | ~143 µs/tick |
| 500 elevators, 5 000 stops, 50 000 riders, 10 ticks | ~520 ms | ~52 ms/tick |
| 10 000-rider spawn pressure test | ~4.9 ms | |

The 2 000-rider row is the one most consumers should care about: a medium office tower with 2 000 concurrent riders runs the full 8-phase tick in well under a millisecond on a single core.

Dispatch strategy comparison (dispatch_bench)

Per step() cost at three scales, holding everything else constant:

| Scale | SCAN | LOOK | NearestCar | ETD |
|---|---|---|---|---|
| 5e, 10s | 61 µs | 67 µs | 63 µs | 66 µs |
| 20e, 50s | 436 µs | 395 µs | 423 µs | 413 µs |
| 50e, 200s | 2.18 ms | 2.00 ms | 2.04 ms | 1.96 ms |

The four built-in strategies land within ~15 % of each other at every scale. ETD is competitive despite its richer cost model because the other phases dominate wall-clock time. Pick the strategy that fits your dispatch behavior needs; they’re all fast enough.

Query surface (query_bench)

O(n) over entity population, as the API docs promise:

| Query | 100 | 1 000 | 10 000 |
|---|---|---|---|
| `query<Rider>` | 13 µs | 60 µs | 744 µs |
| `query_tuple<&Rider, &Patience>` | 12 µs | 52 µs | 859 µs |
| `query_elevators` (10/50/200) | 4 µs | 5 µs | 13 µs |

Population queries on RiderIndex (residents_at / waiting_at / abandoned_at) are O(1) and don’t appear here — they run in tens of nanoseconds.

Multi-group topology (multi_line_bench)

| Scenario | Time |
|---|---|
| multi_3g_2l_5e_20s / step() | ~920 µs |
| cross_group_routing / 10 groups | ~330 µs |
| topology_queries / reachable_stops_from | ~177 µs |
| topology_queries / shortest_route | ~161 µs |
| dynamic_topology / add_line | ~2.5 µs |
| dynamic_topology / topology_rebuild | ~21 µs |

Runtime topology mutations (add_line, remove_line, add_stop_to_line) are single-digit microseconds because the graph is rebuilt lazily on next query, not eagerly on every mutation.
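The lazy-rebuild pattern can be sketched in a few lines: mutations only invalidate a cache, and the derived structure is recomputed on the next query. The types and method names below (`Topology`, `add_line`, `reachable_stops`) are hypothetical stand-ins, not the crate's real internals:

```rust
// Illustrative sketch of lazy rebuilds. Mutations are O(1); the derived
// structure is rebuilt at most once per batch of mutations, on next query.
struct Topology {
    lines: Vec<Vec<u32>>,    // each line is an ordered list of stop ids
    graph: Option<Vec<u32>>, // cached derived structure (stand-in)
}

impl Topology {
    // No graph work on mutation — just invalidate the cache.
    fn add_line(&mut self, stops: Vec<u32>) {
        self.lines.push(stops);
        self.graph = None;
    }

    // Rebuilds lazily; repeated queries between mutations hit the cache.
    fn reachable_stops(&mut self) -> &Vec<u32> {
        if self.graph.is_none() {
            let mut stops: Vec<u32> =
                self.lines.iter().flatten().copied().collect();
            stops.sort_unstable();
            stops.dedup();
            self.graph = Some(stops);
        }
        self.graph.as_ref().unwrap()
    }
}

fn main() {
    let mut topo = Topology { lines: Vec::new(), graph: None };
    topo.add_line(vec![1, 2, 3]); // cheap: cache invalidated, nothing rebuilt
    topo.add_line(vec![3, 4]);    // still cheap
    assert_eq!(topo.reachable_stops().as_slice(), &[1, 2, 3, 4]);
}
```

This is why add_line lands at ~2.5 µs while topology_rebuild costs ~21 µs: a burst of mutations pays the rebuild price once, not per mutation.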

Use these as a baseline when writing custom dispatch strategies — if your strategy’s dispatch_bench time is 10× the ETD baseline, expect the dispatch phase to dominate your tick time in loaded simulations.

Scaling checklist

For simulations above ~10k concurrent riders or above ~50 elevators:

  1. Pick the cheapest dispatch strategy that meets your needs. NearestCarDispatch is usually a better default than ETD at scale.
  2. Split into groups. Each group dispatches independently; two groups of 20 elevators each is cheaper than one group of 40.
  3. Drain events every tick (or redirect into a bounded ring buffer) to keep memory flat.
  4. Avoid heavy work in hooks. A hook that iterates all riders every tick is O(R) on top of the dispatch cost — prefer extension-attached flags you can toggle on-event.
  5. Profile before optimizing. The Criterion benches make it straightforward to identify the hot phase — dispatch dominates far more often than movement or doors.
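The bounded ring buffer from step 3 can be sketched with std alone — this is an illustrative implementation of the idea, not a type the crate ships, and `EventRing` is a hypothetical name:

```rust
use std::collections::VecDeque;

// Sketch of step 3: redirect drained events into a bounded ring so
// memory stays flat even if a consumer falls behind.
struct EventRing<T> {
    buf: VecDeque<T>,
    cap: usize,
}

impl<T> EventRing<T> {
    fn new(cap: usize) -> Self {
        Self { buf: VecDeque::with_capacity(cap), cap }
    }

    // Drops the oldest event once full, bounding memory at `cap` entries.
    fn push(&mut self, event: T) {
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back(event);
    }

    fn len(&self) -> usize {
        self.buf.len()
    }
}

fn main() {
    let mut ring = EventRing::new(1024);
    for event_id in 0..10_000u32 {
        ring.push(event_id); // 10k pushes, memory never exceeds 1024 slots
    }
    assert_eq!(ring.len(), 1024);
}
```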

What we do not provide

  • Parallelism. The tick loop is single-threaded by design (determinism > throughput). Run multiple sims in parallel across threads if you need more aggregate work.
  • GPU acceleration. Movement and dispatch are scalar — no SIMD or GPU backends.
  • Persistent indexes beyond per-stop population. If you need “all riders with extension X”, iterate and filter.
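The recommended route to more aggregate throughput — one single-threaded sim per thread — looks like this in outline. `MockSim` is a placeholder for the real simulator; the point is only the threading shape:

```rust
use std::thread;

// MockSim stands in for the real simulator: self-contained, owned state
// that can be moved onto its own thread.
struct MockSim {
    ticks: u64,
}

impl MockSim {
    fn step(&mut self) {
        self.ticks += 1; // the real step() runs the 8-phase tick
    }
}

// Run `count` independent sims in parallel; each stays deterministic
// because it never shares mutable state with the others.
fn run_sims(count: usize, ticks: u64) -> Vec<u64> {
    let handles: Vec<_> = (0..count)
        .map(|_| {
            thread::spawn(move || {
                let mut sim = MockSim { ticks: 0 };
                for _ in 0..ticks {
                    sim.step();
                }
                sim.ticks
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let results = run_sims(4, 1_000);
    assert!(results.iter().all(|&t| t == 1_000));
}
```

Each sim's tick order is untouched by the others, so per-sim determinism is preserved while the workload scales across cores.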

Next steps

Head to Bevy Integration for a visual wrapper, or API Reference for the full API surface.