Evals Course: What is an eval?

Published: 21 May 2026
Channel: Braintrust

Module one of Braintrust's Evals course explained why evals matter. This module breaks down what an eval actually is.

At its core, an eval has three components: a dataset, a task, and a score. This module covers three practical examples to show how each component works. The examples also illustrate how the data determines which scoring method is right for your use case. Our examples — a customer support chatbot, a factual Q&A system, and an AI music generator — all produce very different kinds of data.
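As a rough illustration of how the three components fit together, here is a minimal sketch in plain Python. It does not use the Braintrust SDK; the rows, the canned `task` stand-in, and the `run_eval` helper are all hypothetical names invented for this example, and a real task would call a model instead of a lookup table.

```python
# Dataset: inputs paired with expected outputs (hypothetical factual Q&A rows).
dataset = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "How many days are in a week?", "expected": "7"},
]


def task(question: str) -> str:
    """Stand-in for the AI system under test (assumption: canned answers)."""
    canned = {
        "What is the capital of France?": "Paris",
        "How many days are in a week?": "seven",
    }
    return canned.get(question, "")


def exact_match(output: str, expected: str) -> float:
    """Score: 1.0 when the output equals the expected answer, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_eval(dataset, task, score) -> float:
    """Run the task on every row, score each output, and average the scores."""
    scores = [score(task(row["input"]), row["expected"]) for row in dataset]
    return sum(scores) / len(scores)


average = run_eval(dataset, task, exact_match)
print(f"average score: {average}")
```

The second row fails exact match because the stand-in answers "seven" rather than "7", which hints at why the choice of scoring method depends on the data.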

Depending on your needs and the data you're working with, you can build your eval around deterministic scoring, LLM-as-a-judge, or human review.
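The deterministic option is the easiest to sketch in code. The timestamps name three variants (exact match, contains match, normalized match); the functions below are one plausible reading of those names, not the definitions used in the video. In particular, the normalization rules (lowercasing, stripping punctuation, collapsing whitespace) are an assumption.

```python
import re


def exact_match(output: str, expected: str) -> float:
    """1.0 only when the output is identical to the expected string."""
    return 1.0 if output == expected else 0.0


def contains_match(output: str, expected: str) -> float:
    """1.0 when the expected string appears anywhere in the output."""
    return 1.0 if expected in output else 0.0


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def normalized_match(output: str, expected: str) -> float:
    """1.0 when output and expected are equal after normalization."""
    return 1.0 if normalize(output) == normalize(expected) else 0.0


answer = "The capital is Paris."
print(exact_match(answer, "Paris"))
print(contains_match(answer, "Paris"))
print(normalized_match("  PARIS. ", "Paris"))
```

Each scorer is a pure function of the output and the expected value, so the same inputs always produce the same score. That property is what separates this approach from LLM-as-a-judge and human review.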

With a clear mental model for how evals are structured, we'll move on to building one from scratch in the next module.

Timestamps:

0:00 — Intro: What is an eval?
0:05 — The practical questions evals help you answer
0:40 — The 3 core components of an eval: dataset, task, and score
0:57 — Component 1: Dataset explained
1:12 — Example dataset 1: Customer complaints (support chatbot)
1:28 — Example dataset 2: Factual Q&A (with expected outputs)
1:49 — Example dataset 3: AI music generation (with metadata)
2:13 — How each dataset is shaped differently
2:24 — Component 2: The Task — defining what the AI should do
3:09 — Component 3: Scoring — measuring good vs. bad output
3:26 — Scoring method 1: Deterministic scoring (exact match, contains match, normalized match)
4:34 — Scoring method 2: LLM-as-a-judge (for subjective tasks)
5:28 — Scoring method 3: Human review (for creative/high-stakes tasks)
6:19 — Recap & what's next: Building your first eval from scratch