EmberKeep: Running a Local LLM Inside Unity 6 at 60 FPS

Published: 20 June 2026
Channel: Sam Shamber

EmberKeep is a Unity 6 tech demo where every NPC runs a quantized
3B-parameter LLM locally on the player's machine, holds persistent
memory across sessions, and stays within a strict per-frame inference
budget so the game maintains a steady 60 FPS.

I built this to demonstrate the production-hard parts of shipping
GenAI in real games: quantized on-device inference, frame-budget
enforcement, behavior-tree + LLM hybrid NPCs, persistent character
memory, and an editor tool that lets designers generate a complete
NPC from a one-line concept.

What you're seeing in this demo:
Bram the Innkeeper: pure-LLM dialogue. A summary of earlier sessions
is injected into his prompt, so his memory persists across sessions.
Mira the Merchant: the behavior tree decides the intent (haggle /
refuse / accept) and the LLM only writes the dialogue line. This is
the production-correct pattern for shippable NPC AI.
Old Finn the Storyteller: on-demand procedural short stories, rendered
token by token as they stream in to mask perceived latency.
"Generate NPC" Editor Tool: type a one-line concept, get a full
NPC ScriptableObject with backstory, voice profile, and sample lines.
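
A minimal C++ sketch of the two NPC patterns above, under stated assumptions: a deterministic rule stands in for the behavior tree and picks the intent, and the prompt builder injects a stored memory summary before asking the model for one line. Every name here (Intent, decide_intent, build_prompt), the price thresholds, and the prompt wording are hypothetical and not taken from the demo's source.

```cpp
#include <cassert>
#include <string>

// Hypothetical intent set, mirroring the three outcomes listed for Mira.
enum class Intent { Haggle, Refuse, Accept };

// Stand-in for the behavior tree: a deterministic rule picks the intent.
// The price thresholds are illustrative assumptions, not the demo's logic.
Intent decide_intent(int offer, int floor_price, int list_price) {
    assert(floor_price <= list_price);
    if (offer >= list_price) return Intent::Accept;
    if (offer >= floor_price) return Intent::Haggle;
    return Intent::Refuse;
}

const char* intent_name(Intent intent) {
    switch (intent) {
        case Intent::Haggle: return "haggle";
        case Intent::Refuse: return "refuse";
        case Intent::Accept: return "accept";
    }
    return "refuse";  // not reached for valid enum values
}

// Builds the text handed to the model. The memory summary (if any) is
// injected ahead of the player's line; the intent is stated as already
// decided, so the model only has to phrase it.
std::string build_prompt(const std::string& npc_name,
                         const std::string& memory_summary,
                         Intent intent,
                         const std::string& player_line) {
    std::string prompt = "You are " + npc_name + ".\n";
    if (!memory_summary.empty()) {
        prompt += "What you remember about this player: " + memory_summary + "\n";
    }
    prompt += "The game has already decided your intent: ";
    prompt += intent_name(intent);
    prompt += ". Reply with one short line of dialogue that fits it.\n";
    prompt += "Player: " + player_line + "\n";
    return prompt;
}
```

In this sketch the model never chooses the outcome of a trade. It only phrases a decision the game logic has already made, which keeps the economy testable without running inference.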

Tech stack:
Unity 6 (Built-in Render Pipeline) + C#
llama.cpp compiled as a native plugin (C++)
Llama-3.2-3B-Instruct, Q4_K_M quantization (~2 GB on disk)
Worker-thread inference + lock-free SPSC token queue
Per-NPC KV-cache, shared model weights
100% on-device — no cloud calls, no PII leaves the machine
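
The "per-NPC KV-cache, shared model weights" split can be pictured as an ownership rule: one read-only weights object, referenced by every NPC, plus a private cache per NPC. The sketch below illustrates only that ownership shape. ModelWeights and NpcContext are hypothetical stand-ins, not llama.cpp types, and the cache is a plain vector used as a placeholder for real key/value tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the loaded, read-only model weights.
struct ModelWeights {
    std::vector<std::uint8_t> bytes;
};

// Hypothetical per-NPC state: a shared handle to the weights plus a
// private cache that no other NPC touches.
class NpcContext {
public:
    NpcContext(std::string name,
               std::shared_ptr<const ModelWeights> weights,
               std::size_t kv_slots)
        : name_(std::move(name)), weights_(std::move(weights)) {
        assert(weights_ != nullptr);
        kv_cache_.reserve(kv_slots);
    }

    // Placeholder for "this token's keys/values are now cached".
    void remember_token(std::int32_t token) { kv_cache_.push_back(token); }

    std::size_t cached_tokens() const { return kv_cache_.size(); }
    const ModelWeights* weights() const { return weights_.get(); }
    const std::string& name() const { return name_; }

private:
    std::string name_;
    std::shared_ptr<const ModelWeights> weights_;  // shared, never copied
    std::vector<std::int32_t> kv_cache_;           // owned by this NPC only
};
```

Under this assumption, adding another NPC costs one more cache, not another copy of the roughly 2 GB model file.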

Why this design: the main thread never blocks on inference. Tokens
are produced on a worker thread and pushed into the queue. The main
thread drains that queue once per frame and takes no more than a
fixed per-frame token budget. Generation feels real-time to the
player, but the render loop never starves.
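
A minimal sketch of that hand-off, assuming a fixed-capacity ring buffer with exactly one producer (the inference worker) and one consumer (the main thread). SpscQueue and drain_frame are hypothetical names, tokens are shown as plain 32-bit ids, and the demo's actual queue may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Lock-free single-producer / single-consumer ring buffer.
// Capacity must be a power of two so indices can be masked.
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity)
        : slots_(capacity), mask_(capacity - 1) {
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    // Call from the worker thread only. Returns false when full.
    bool try_push(std::int32_t token) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == slots_.size()) return false;
        slots_[head & mask_] = token;
        head_.store(head + 1, std::memory_order_release);  // publish slot
        return true;
    }

    // Call from the main thread only. Returns nullopt when empty.
    std::optional<std::int32_t> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;
        const std::int32_t token = slots_[tail & mask_];
        tail_.store(tail + 1, std::memory_order_release);  // free slot
        return token;
    }

private:
    std::vector<std::int32_t> slots_;
    std::size_t mask_;
    std::atomic<std::size_t> head_{0};  // written by the producer only
    std::atomic<std::size_t> tail_{0};  // written by the consumer only
};

// Called once per frame on the main thread: takes at most max_tokens
// and never waits on the worker. Returns how many tokens were taken.
std::size_t drain_frame(SpscQueue& queue, std::size_t max_tokens,
                        std::vector<std::int32_t>& out) {
    std::size_t taken = 0;
    while (taken < max_tokens) {
        const std::optional<std::int32_t> token = queue.try_pop();
        if (!token) break;  // nothing ready; leave it for the next frame
        out.push_back(*token);
        ++taken;
    }
    return taken;
}
```

If the worker gets ahead, the surplus simply waits in the queue for later frames. If it falls behind, the frame shows fewer new tokens and still renders on time.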

Built by Samuel Shamber as a tech demonstrator for shipping GenAI
features inside production game engines.

#Unity #Unity6 #GenAI #LLM #GameAI #LlamaCpp #IndieDev #UnityDev
#MachineLearning #GameDevelopment #AI #NPC #GameDev