Evals Course: Building a multi-turn chat app

Published: 22 May 2026
Channel: Braintrust

In Braintrust's Evals course, we've been building on an example customer support chat application. But real customer support interactions aren't single-turn exchanges; they're back-and-forth, multi-turn conversations.

In Module 10, our customer support example gets upgraded into a true multi-turn CLI chat app, fully instrumented with Braintrust logging so every conversation is captured as a single connected trace.

You'll see how token counts grow with each turn as conversation history accumulates, learn why the root span wrapper is critical for grouping turns together, and understand what your logs look like with, and without, that grouping.
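
To make the token growth concrete, here is a minimal, dependency-free sketch of why prompt size climbs with each turn: every request re-sends the system prompt plus all earlier messages. It is not the course's code. The function names and sample messages are hypothetical, and whitespace word counts stand in for real token counts, so the numbers only show the trend, not what an actual tokenizer would report.

```python
# Toy model of prompt growth in a multi-turn chat.
# Assumption: token counts are approximated by whitespace-separated words;
# a real tokenizer would give different absolute numbers.

def approx_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def prompt_tokens_per_turn(system_prompt: str, turns: list) -> list:
    """Return the approximate prompt size sent to the model on each turn.

    `turns` is a list of (user_message, assistant_reply) pairs. Each request
    contains the system prompt, every earlier message, and the new user message.
    """
    history = [system_prompt]
    sizes = []
    for user_message, assistant_reply in turns:
        history.append(user_message)
        sizes.append(sum(approx_tokens(m) for m in history))
        # The reply becomes part of the history sent on the next turn.
        history.append(assistant_reply)
    return sizes


# Hypothetical sample conversation, for illustration only.
SYSTEM = "You are a helpful customer support agent."
TURNS = [
    ("My order has not arrived yet.",
     "Sorry to hear that. Could you share your order number?"),
    ("I do not have it with me right now.",
     "No problem. What email address did you use at checkout?"),
    ("It was my work email address.",
     "Thanks. I found the order and it ships tomorrow."),
    ("Great, thank you for checking.",
     "You are welcome. Anything else I can help with?"),
]

sizes = prompt_tokens_per_turn(SYSTEM, TURNS)
print(sizes)  # each value is larger than the one before it
```

Because nothing is ever dropped from the history, the per-turn prompt size can only increase; longer replies make it grow faster.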

With this understanding, we'll move on to scoring full conversations in the next module.

Timestamps:

0:00 — Why single-turn isn't enough for real customer support
0:20 — What we're building: a multi-turn chat app with production logging
0:38 — Code overview: simple CLI chat interface
0:49 — init_logger vs. eval: switching from experiments to production logging
1:03 — wrap_openai: auto-capturing every API call as an LLM span
1:13 — @traced decorator: creating function spans on each chat call
1:21 — logger.start_span: wrapping the full session as one root span
1:33 — The conversation loop: appending history, opening child spans, logging turns
1:50 — Logging conversation-level metadata on close (full history + turn count)
2:01 — Running the chat app and having a live conversation
2:20 — Viewing the single log entry in Braintrust
2:36 — Trace structure: root span with nested function and LLM spans
2:47 — How conversation history grows across turns (token count demo)
3:15 — Token growth: prompt tokens ~5x over a 4-turn conversation
3:23 — What happens without the root span wrapper
3:40 — Result: 4 separate disconnected log entries instead of one trace
4:08 — Why the root span is essential for connected conversation tracing
4:13 — Recap & what's next: Scoring multi-turn conversations per turn and as a full trace
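
The effect of the root span wrapper described in the timestamps above can be illustrated with a small in-memory tracer. This is a toy stand-in written for this sketch, not the Braintrust SDK: `ToyTracer`, `Span`, and `chat_turn` are hypothetical names, and only the nesting behaviour is modelled. A span opened while another span is active becomes its child; a span opened with nothing active becomes a new top-level log entry.

```python
import itertools
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    root_id: int                      # id of the top-level entry it belongs to
    children: list = field(default_factory=list)


class ToyTracer:
    """Hypothetical in-memory tracer used only to illustrate span grouping."""

    def __init__(self):
        self.log_entries = []         # one item per top-level (root) span
        self._stack = []              # currently open spans, innermost last
        self._ids = itertools.count(1)

    @contextmanager
    def start_span(self, name):
        if self._stack:
            # A span is already open: nest under it, in the same trace.
            parent = self._stack[-1]
            span = Span(name, parent.root_id)
            parent.children.append(span)
        else:
            # Nothing is open: this span starts a separate log entry.
            span = Span(name, next(self._ids))
            self.log_entries.append(span)
        self._stack.append(span)
        try:
            yield span
        finally:
            self._stack.pop()


def chat_turn(tracer, message):
    """One turn: a function span wrapping a (fake) LLM call span."""
    with tracer.start_span("chat"):
        with tracer.start_span("llm_call"):
            return f"echo: {message}"


MESSAGES = ["hi", "where is my order?", "here is the order number", "thanks"]

# With a root span: all four turns nest under one conversation entry.
grouped = ToyTracer()
with grouped.start_span("conversation"):
    for m in MESSAGES:
        chat_turn(grouped, m)

# Without a root span: every turn becomes its own top-level entry.
ungrouped = ToyTracer()
for m in MESSAGES:
    chat_turn(ungrouped, m)

print(len(grouped.log_entries), len(ungrouped.log_entries))  # prints: 1 4
```

The only difference between the two runs is the outer `with` block, which mirrors the video's point: the per-turn code is unchanged, and the wrapper alone decides whether four turns appear as one connected trace or as four disconnected entries.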