Evals Course: How to read a trace

Published: May 21, 2026
Channel: Braintrust


In earlier modules of Braintrust's Evals course, we set up experiments and let them run. Now the question is: what actually happened under the hood?

Module eight introduces traces, which are the complete record of a single eval row from input to scored output. You'll learn the key terminology (spans, root spans, LLM spans, scoring spans, and more), then walk through a real trace from the concise personality experiment step by step. By the end, you'll be able to read any trace in Braintrust and understand exactly what ran, in what order, how long it took, and why a response scored the way it did.
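To make the "what ran, in what order, how long it took" idea concrete, here is a minimal sketch that treats a trace as a tree of spans. Each span carries its own start and end time, and a depth-first walk in start order prints the tree the way a trace viewer lays it out. The `Span` class and `render` function are hypothetical illustrations, not Braintrust's SDK or data model (Braintrust records spans for you). The span names mirror the walkthrough, and the timings are placeholders, not numbers from the video.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span record: a name, a type, and its own start/end times in seconds."""
    name: str
    span_type: str  # e.g. "eval", "task", "llm", "score"
    start: float
    end: float
    children: list["Span"] = field(default_factory=list)

    @property
    def duration(self) -> float:
        # Each span times itself, so a parent's duration includes its children.
        return self.end - self.start


def render(span: Span, depth: int = 0) -> list[str]:
    """Walk the tree depth-first, visiting children in start order, one line per span."""
    lines = [f"{'  ' * depth}{span.name} [{span.span_type}] {span.duration:.2f}s"]
    for child in sorted(span.children, key=lambda s: s.start):
        lines.extend(render(child, depth + 1))
    return lines


# Placeholder timings for illustration only; not taken from the video.
trace = Span("eval", "eval", 0.0, 3.0, [
    Span("task", "task", 0.0, 1.5, [
        Span("chat completion", "llm", 0.1, 1.4),
    ]),
    Span("brand_alignment", "score", 1.5, 3.0, [
        Span("chat completion", "llm", 1.6, 2.9),
    ]),
])

print("\n".join(render(trace)))
```

The indentation shows nesting, such as the scorer's own LLM call sitting under the scoring span. Sorting by start time recovers execution order even if spans were recorded out of order.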

Timestamps:

0:00 — What is a trace and why it matters
0:13 — Every experiment row = one trace
0:20 — Example: each trace = one customer message through the full pipeline
0:28 — Span terminology: the building blocks of a trace
0:37 — Root span: top-level input, output, and final scores
0:44 — LLM span: model name, input messages, output, token counts
0:52 — Scoring span: score name, value, and chain-of-thought reasoning
1:03 — Function span: wraps a block of Python code
1:10 — Task span: a unit of work that produces a meaningful result
1:23 — Tool span: external API calls made by the LLM
1:39 — Each span records its own start/end timestamps
1:45 — Live trace walkthrough: concise personality, "missing package" complaint
1:55 — Eval span: bird's eye view of input, output, and score
2:20 — Task span: wraps the concise task function
2:38 — LLM span: the actual GPT model call and token metrics
2:56 — Brand alignment scoring span and its nested LLM call
3:11 — Chain-of-thought reasoning inside the score span
3:38 — Recap & what's next: Analyzing experiment results to answer real questions
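The fields listed for each span type above are what you read to explain a result. LLM spans carry the model name and token counts, and scoring spans carry the score value and the chain-of-thought reasoning. A minimal sketch of pulling those out of a span tree follows. It assumes a hypothetical `Span` class with only those fields, plus hypothetical helpers `walk`, `tokens_by_branch`, and `scores_with_reasoning`. None of this is Braintrust's API, and the model name, token counts, score, and reasoning are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical span holding only the per-type fields described in the video."""
    name: str
    span_type: str                      # "eval", "task", "llm", "score", ...
    model: Optional[str] = None         # LLM spans: model name
    prompt_tokens: int = 0              # LLM spans: token counts
    completion_tokens: int = 0
    score: Optional[float] = None       # scoring spans: score value
    reasoning: Optional[str] = None     # scoring spans: chain-of-thought rationale
    children: list["Span"] = field(default_factory=list)


def walk(span: Span):
    """Yield every span in the tree, parent before children."""
    yield span
    for child in span.children:
        yield from walk(child)


def tokens_by_branch(root: Span) -> dict[str, int]:
    """Total LLM tokens under each direct child of the root (the task vs. each scorer)."""
    return {
        child.name: sum(s.prompt_tokens + s.completion_tokens
                        for s in walk(child) if s.span_type == "llm")
        for child in root.children
    }


def scores_with_reasoning(root: Span) -> dict[str, tuple]:
    """Map each scoring span's name to its (score, reasoning) pair."""
    return {s.name: (s.score, s.reasoning) for s in walk(root) if s.span_type == "score"}


# Placeholder values for illustration only; not taken from the video.
trace = Span("eval", "eval", children=[
    Span("task", "task", children=[
        Span("chat completion", "llm", model="example-model",
             prompt_tokens=120, completion_tokens=40),
    ]),
    Span("brand_alignment", "score", score=0.5,
         reasoning="placeholder rationale from the judge model", children=[
             Span("chat completion", "llm", model="example-model",
                  prompt_tokens=300, completion_tokens=90),
         ]),
])

print(tokens_by_branch(trace))
print(scores_with_reasoning(trace))
```

Splitting token totals by top-level branch separates what the task itself cost from what an LLM-based scorer cost, since the scorer's nested LLM call shows up in the same trace.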