In Module three of Braintrust's Evals course, we stop talking theory and start building.
In this hands-on module, you'll set up a Braintrust account and build your first eval entirely in the UI. Using the customer support chatbot example from Module two, you'll import a real dataset of customer complaints, define a task, and create a custom LLM-as-a-judge score called "Brand Alignment" that rates each response on helpfulness, tone, and policy compliance.
You'll then test two AI personalities (polite vs. concise), save both as experiments, and set yourself up to compare the results in the next module.
Timestamps:
0:00 — Recap: The 3 components of an eval
0:08 — What we're building: Customer support chatbot eval in Braintrust UI
0:33 — Step 1: Create a Braintrust account (free tier)
0:40 — Step 2: Add your OpenAI API key in settings
1:05 — Step 3: Create a new project ("Customer Support Chatbot")
1:19 — Step 4: Import the customer complaints dataset from GitHub
1:45 — Step 5: Open the Playground and connect it to your dataset
1:56 — Step 6: Define the task (user message template variable)
2:03 — Setting up the "Polite Personality" system prompt
2:25 — Running the playground and reviewing AI responses
2:48 — Why you need a score to measure response quality
3:01 — Creating a custom "Brand Alignment" score (LLM-as-a-judge)
3:35 — Scoring criteria: Excellent (100%), Acceptable (50%), Poor (0%)
4:04 — Enabling chain-of-thought reasoning for more consistent scores
4:14 — Running the playground with scoring and reviewing results
4:35 — Testing the "Concise Personality" prompt
5:02 — Recap & what's next: Comparing the two experiments
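
For anyone who wants to see the scoring idea outside the UI, here is a minimal Python sketch of how the judge's three labels (Excellent, Acceptable, Poor) could map to the numeric scores mentioned at 3:35. This is not Braintrust's API: the names CHOICE_SCORES, brand_alignment_score, and average_score are hypothetical, and the judge's label is assumed to arrive as a plain string.

```python
# Hypothetical sketch: map an LLM judge's label to a numeric score.
# The three labels and their values come from the video (100%, 50%, 0%);
# everything else here is an illustrative assumption, not Braintrust code.
CHOICE_SCORES = {"Excellent": 1.0, "Acceptable": 0.5, "Poor": 0.0}


def brand_alignment_score(choice: str) -> float:
    """Convert one judge label into a score between 0 and 1."""
    key = choice.strip().capitalize()  # tolerate case and stray whitespace
    if key not in CHOICE_SCORES:
        raise ValueError(f"Unexpected judge label: {choice!r}")
    return CHOICE_SCORES[key]


def average_score(choices: list[str]) -> float:
    """Average the scores across all rows of a dataset."""
    scores = [brand_alignment_score(c) for c in choices]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Made-up labels for three responses, for illustration only.
    print(average_score(["Excellent", "Acceptable", "Poor"]))
```

Averaging per-row scores like this is one simple way to get a single number per experiment, which is roughly what you would compare between the polite and concise prompts.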