EmberKeep is a Unity 6 tech demo where every NPC runs a quantized
3B-parameter LLM locally on the player's machine, holds persistent
memory across sessions, and stays within a strict per-frame inference
budget so the game maintains a steady 60 FPS.
I built this to demonstrate the hard parts of shipping GenAI in
production games: quantized on-device inference, frame-budget
enforcement, hybrid behavior-tree + LLM NPCs, persistent character
memory, and an editor tool that lets designers generate a complete
NPC asset from a one-line concept.
What you're seeing in this demo:
Bram the Innkeeper — pure-LLM dialogue with persistent
cross-session memory via prompt-injected summaries.
Mira the Merchant — behavior-tree-driven intent (haggle / refuse /
accept) with LLM-generated dialogue lines. The tree decides what she
does; the LLM only decides how she says it.
Old Finn the Storyteller — on-demand procedural short stories with
streaming token rendering and perceived-latency masking.
"Generate NPC" Editor Tool — type a one-line concept, get a full
NPC ScriptableObject with backstory, voice profile, and sample lines.
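For readers curious how the Mira split can look in code, here is a minimal C++ sketch, not the demo's actual implementation. A deterministic rule picks the intent, and the LLM prompt is only asked to phrase it. The names decide_intent, intent_name and build_line_prompt, the price thresholds, and the prompt wording are all hypothetical.

```cpp
#include <cassert>
#include <string>

// The three intents named in the demo description.
enum class Intent { Accept, Haggle, Refuse };

// Hypothetical stand-in for the behavior tree: a fixed-priority selector.
// Game logic decides the outcome; no model call is involved here.
Intent decide_intent(int offer, int askingPrice, int floorPrice) {
    if (offer >= askingPrice) return Intent::Accept;  // good enough, take it
    if (offer >= floorPrice)  return Intent::Haggle;  // negotiable range
    return Intent::Refuse;                            // below the floor
}

const char* intent_name(Intent intent) {
    switch (intent) {
        case Intent::Accept: return "accept";
        case Intent::Haggle: return "haggle";
        case Intent::Refuse: return "refuse";
    }
    return "refuse";  // unreachable for valid enum values
}

// Builds the prompt handed to the LLM. The decision is stated as a fact,
// so the model only supplies wording (prompt text is an assumption).
std::string build_line_prompt(const std::string& npcName, Intent intent, int offer) {
    return "You are " + npcName + ", a merchant. The player offered " +
           std::to_string(offer) + " gold. Your decision is already made: " +
           intent_name(intent) +
           ". Reply with one short in-character line that expresses this "
           "decision. Do not change the decision.";
}
```

Because the intent comes from plain game logic, it can be unit-tested without loading a model; only the wording varies between runs.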
Tech stack:
Unity 6 (Built-in Render Pipeline) + C#
llama.cpp compiled as a native plugin (C++)
Llama-3.2-3B-Instruct, Q4_K_M quantization (~2 GB on disk)
Worker-thread inference + lock-free SPSC token queue
Per-NPC KV-cache, shared model weights
100% on-device — no cloud calls, no PII leaves the machine
Why this design: the main thread never blocks on inference. Tokens
are produced on a worker thread, and the main thread drains the queue
at most once per frame, up to a per-frame budget. Generation feels
real-time to the player, but the render loop never starves.
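The worker-thread / per-frame handoff described above can be sketched roughly as follows in C++. This is an illustration under assumptions, not the demo's code: SpscQueue and drain_frame are hypothetical names, the budget is expressed as a token count (the real budget could just as well be measured in milliseconds), and tokens are plain strings.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical lock-free single-producer/single-consumer ring buffer.
// Exactly one thread (the inference worker) may call try_push, and
// exactly one thread (the main thread) may call try_pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert((Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer side. Returns false instead of blocking when full.
    bool try_push(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // queue is full
        slots_[head & (Capacity - 1)] = std::move(value);
        head_.store(head + 1, std::memory_order_release);  // publish slot
        return true;
    }

    // Consumer side. Returns false instead of blocking when empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return false;  // queue is empty
        out = std::move(slots_[tail & (Capacity - 1)]);
        tail_.store(tail + 1, std::memory_order_release);  // free slot
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};

// Called once per frame on the main thread. Appends at most maxTokens
// tokens to the visible dialogue and returns how many were taken.
template <std::size_t Capacity>
std::size_t drain_frame(SpscQueue<std::string, Capacity>& queue,
                        std::size_t maxTokens, std::string& dialogue) {
    std::size_t taken = 0;
    std::string token;
    while (taken < maxTokens && queue.try_pop(token)) {
        dialogue += token;
        ++taken;
    }
    return taken;
}
```

Neither side ever waits on the other: a full queue makes the worker retry later, and an empty queue simply means the frame shows no new text. Tokens left over when the budget runs out stay queued for the next frame.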
Built by Samuel Shamber as a tech demonstrator for shipping GenAI
features inside production game engines.
#Unity #Unity6 #GenAI #LLM #GameAI #LlamaCpp #IndieDev #UnityDev
#MachineLearning #GameDevelopment #AI #NPC #GameDev