Architecture overview

stackchan-kai models the desk toy as data. An Entity holds the state (face, motor, perception, voice, mind, events, input, tick); a Director sorts Modifiers by phase + priority and ticks them against the entity each frame. The engine is no_std, allocation-free, and shared verbatim between firmware and the host simulator.

Boot sequence

esp-hal init
    │
    ▼
internal SRAM + PSRAM heaps (esp-alloc)
    │
    ▼
esp-rtos embassy executor
    │
    ▼
AXP2101 LDOs (LCD rails, power-key timing, BATFET, ADC)
    │
    ▼
AW9523 I/O expander (LCD reset pulse, backlight-boost gate)
    │
    ▼
SPI2 + mipidsi ILI9342C (320×240 RGB565)
    │
    ▼
SCServo on UART1 (head pan/tilt)
    │
    ▼
PY32 co-processor (servo power, WS2812 ring)
    │
    ▼
Spawn embassy tasks → main heartbeat loop

Time to “boot complete — idle heartbeat” is ~1.4 s on the CoreS3.

Task graph

Every task is either a producer (sensor → Signal) or a sink (Signal → hardware). Each frame the render task drains sensor signals into entity.perception / entity.input, calls director.run(&mut entity, now), then dispatches the post-frame sinks (LCD blit, head pose, LED frame, chirp request).

                  ┌─────────────┐
   touch ────────▶│             │
   IR    ────────▶│             │
   IMU   ────────▶│             │
   ambient ──────▶│  render     │──▶ LCD (mipidsi blit)
   power ────────▶│  (30 FPS)   │──▶ pose Signal ──▶ head_task ──▶ SCServo
   audio RMS ────▶│             │──▶ LED frame Signal ──▶ led_task ──▶ PY32
   camera ───────▶│             │──▶ chirp queue ──▶ audio TX
                  └─────────────┘
                         │
                         └──▶ heartbeat → watchdog (5 s poll)

Producers publish via Signal::signal(value); the render task drains via try_take() once per frame. The semantics are latest-wins: producers may signal at any rate, the render task picks up only the most recent value, and the next signal overwrites anything unread.
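The latest-wins handoff can be modeled on the host with a minimal single-threaded mailbox. This is a sketch only: LatestCell is a hypothetical stand-in written for illustration, not the Signal type the firmware uses, though its signal / try_take surface mirrors the calls named above.

```rust
use std::cell::Cell;

/// Hypothetical single-threaded stand-in for the firmware's Signal:
/// writers overwrite any unread value; the reader takes the latest or nothing.
struct LatestCell<T> {
    slot: Cell<Option<T>>,
}

impl<T> LatestCell<T> {
    fn new() -> Self {
        Self { slot: Cell::new(None) }
    }

    /// Producer side: publish a value, clobbering anything unread.
    fn signal(&self, value: T) {
        self.slot.set(Some(value));
    }

    /// Consumer side: take the most recent value, if any arrived.
    fn try_take(&self) -> Option<T> {
        self.slot.take()
    }
}

fn main() {
    let touch = LatestCell::new();
    touch.signal((10, 20));
    touch.signal((11, 21)); // overwrites the unread (10, 20)
    assert_eq!(touch.try_take(), Some((11, 21))); // reader sees only the latest
    assert_eq!(touch.try_take(), None); // nothing new since last frame
}
```

The real firmware type must additionally be safe across tasks; the point here is only the overwrite-on-write, take-on-read contract.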

The engine

pub struct ModifierMeta {
    pub name: &'static str,
    pub description: &'static str,
    pub phase: Phase,
    pub priority: i8,
    pub reads: &'static [Field],
    pub writes: &'static [Field],
}

A Modifier mutates the entity once per frame via update(&mut Entity) and exposes a static ModifierMeta. The Director owns a fixed-capacity heapless registry of modifier and skill references, sorts them once on first run(), and ticks them each frame.
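The sort-once, tick-each-frame loop can be sketched as follows. Names and signatures here are assumptions made for illustration, not the stackchan-core API; the pared-down Entity and the Vec/Box registry stand in for the real entity and the heapless fixed-capacity registry.

```rust
// Illustrative engine shape; not the actual stackchan-core types.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Phase { Affect = 30, Expression = 50 }

/// Pared-down stand-in for the real Entity.
struct Entity { blink: bool, frame: u32 }

trait Modifier {
    /// (name, phase, priority) — a stand-in for the static ModifierMeta.
    fn meta(&self) -> (&'static str, Phase, i8);
    fn update(&mut self, e: &mut Entity);
}

struct Blink;
impl Modifier for Blink {
    fn meta(&self) -> (&'static str, Phase, i8) { ("Blink", Phase::Expression, 0) }
    fn update(&mut self, e: &mut Entity) { e.blink = e.frame % 90 < 3; }
}

struct Director { mods: Vec<Box<dyn Modifier>>, sorted: bool }

impl Director {
    fn run(&mut self, e: &mut Entity) {
        if !self.sorted {
            // Phase order first, then priority within a phase; a stable
            // sort preserves registration order for ties.
            self.mods.sort_by_key(|m| { let (_, p, pr) = m.meta(); (p, pr) });
            self.sorted = true;
        }
        e.frame += 1;
        for m in &mut self.mods {
            m.update(e);
        }
    }
}

fn main() {
    let mut director = Director { mods: vec![Box::new(Blink)], sorted: false };
    let mut entity = Entity { blink: false, frame: 0 };
    director.run(&mut entity); // frame 1: 1 % 90 < 3, so the blink fires
    assert!(entity.blink);
}
```

On the device the registry would be a heapless, fixed-capacity collection rather than Vec<Box<…>>; the host sketch trades that for brevity.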

Skill is a longer-running predicate-fired capability — a recognizer that watches percepts and writes mind.intent / mind.attention. Later-phase modifiers translate that intent into face / motor. The current population lives at crates/stackchan-core/src/skills/.

reads / writes serve as machine-checkable documentation: in cfg(debug_assertions) builds, the Director asserts after every update that a modifier wrote only its declared Fields (see Director::run).

Phase ordering

Modifiers run in phase order, then by priority within a phase, then by registration order:

Phase        Numeric  Role
Perception   10       empty (render task drains Signals before run())
Cognition    20       empty
Affect       30       emotion deciders (Touch/Remote/Pickup/Voice/…)
Speech       40       empty
Expression   50       visual modifiers (StyleFromEmotion, Blink, Breath, …)
Motion       60       head modifiers (IdleHeadDrift, HeadFromEmotion, …)
Audio        70       audio-driven visual (MouthFromAudio)
Output       80       empty (render task draws after run())

Numeric gaps of 10 leave room to insert a new phase between existing variants without renumbering. The current modifier population is listed in crates/stackchan-core/src/modifiers/mod.rs and the Phase enum docstring; those are the source of truth.
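The gap-of-10 scheme maps naturally onto explicit enum discriminants. This is an illustrative sketch, not the exact definition in stackchan-core:

```rust
/// Illustrative sketch of the gap-of-10 phase numbering; the real enum
/// lives in crates/stackchan-core and may differ in detail.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
#[repr(u8)]
enum Phase {
    Perception = 10,
    Cognition  = 20,
    Affect     = 30,
    Speech     = 40,
    Expression = 50,
    Motion     = 60,
    Audio      = 70,
    Output     = 80,
    // A hypothetical new phase could slot in at e.g. 55
    // without renumbering any existing variant.
}

fn main() {
    // Derived Ord follows the discriminants, so sorting by Phase
    // is exactly the execution order in the table above.
    assert!(Phase::Affect < Phase::Expression);
    assert_eq!(Phase::Motion as u8, 60);
}
```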

Skills run after the modifier pass each frame; the Director polls each registered skill’s should_fire predicate and invokes matching ones in priority order. Skills write mind.intent / mind.attention / voice / events; modifiers in later phases translate that into face / motor.
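The predicate-then-fire polling might look like the sketch below. All names here (Skill, should_fire, fire, GreetOnTouch) are assumptions chosen to match the description, not the real stackchan-core API; note that the skill touches only mind, never face or motor.

```rust
// Illustrative skill shape; names and fields are stand-ins, not the real API.
struct Mind { intent: Option<&'static str> }
struct Perception { touched: bool }
struct Entity { mind: Mind, perception: Perception }

trait Skill {
    /// Cheap per-frame predicate polled by the Director.
    fn should_fire(&self, e: &Entity) -> bool;
    /// Invoked when the predicate matches; writes intent, not face/motor.
    fn fire(&mut self, e: &mut Entity);
}

struct GreetOnTouch;
impl Skill for GreetOnTouch {
    fn should_fire(&self, e: &Entity) -> bool {
        e.perception.touched
    }
    fn fire(&mut self, e: &mut Entity) {
        // Express intent only; Expression/Motion modifiers translate it later.
        e.mind.intent = Some("greet");
    }
}

fn main() {
    let mut e = Entity {
        mind: Mind { intent: None },
        perception: Perception { touched: true },
    };
    let mut skill = GreetOnTouch;
    if skill.should_fire(&e) {
        skill.fire(&mut e);
    }
    assert_eq!(e.mind.intent, Some("greet"));
}
```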

Entity components

pub struct Entity {
    pub face: Face,         // visual surface
    pub motor: Motor,       // head pose
    pub perception: Perception,
    pub voice: Voice,
    pub mind: Mind,
    pub events: Events,     // one-frame flags, cleared by Director
    pub input: Input,       // firmware → modifier pending inputs
    pub tick: Tick,         // { now, dt_ms, frame }, stamped each run()
}

Sub-component ownership: perception and input are firmware-write / modifier-read; face, motor, and voice.chirp_request are modifier-write / firmware-read.

Skill conventions

Skills don’t write entity.face or entity.motor directly. They express intent through mind, voice, and events; modifiers in Phase::Expression and Phase::Motion translate that intent into rendered face and physical motion. The rule is currently enforced only by documentation; a SkillView<'a> borrow type that would enforce it via the type system is sketched but not implemented.

Host simulator

crates/stackchan-sim constructs an Entity + FakeClock and runs the same Director against hand-crafted time sequences. Pixel-golden tests assert on Eye::weight, Mouth::mouth_open, etc. The viz binary (cargo run -p stackchan-sim --bin viz --features viz) opens an egui + winit window and runs the modifier stack at 30 FPS, so behavior changes iterate in sub-second cycles instead of the ~30 s build → flash → boot loop.
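The shape of such a clock-driven golden test can be sketched like this. FakeClock here is a hand-rolled stand-in and eye_weight a toy blink curve, both invented for illustration; the real test asserts against the actual Director output.

```rust
// Illustrative sim-style test; FakeClock and eye_weight are stand-ins,
// not the stackchan-sim types.
struct FakeClock { now_ms: u64 }

impl FakeClock {
    /// Advance simulated time deterministically and return the new timestamp.
    fn advance(&mut self, dt_ms: u64) -> u64 {
        self.now_ms += dt_ms;
        self.now_ms
    }
}

/// Toy blink curve: eye weight drops to 0.0 for the first 100 ms of
/// every simulated second, and is fully open (1.0) otherwise.
fn eye_weight(now_ms: u64) -> f32 {
    if now_ms % 1000 < 100 { 0.0 } else { 1.0 }
}

fn main() {
    let mut clock = FakeClock { now_ms: 0 };
    assert_eq!(eye_weight(clock.advance(50)), 0.0);  // 50 ms: mid-blink
    assert_eq!(eye_weight(clock.advance(450)), 1.0); // 500 ms: eyes open
}
```

Because the clock is hand-fed, the same sequence produces identical frames on every run, which is what makes golden assertions on Eye::weight-style values practical.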