Evals are the new PRDs: How PMs build quality AI products

Published: May 16, 2026
Channel: Braintrust

For AI PMs, evals are the new PRDs.

In this talk from Product-Led Summit New York, Ameya Bhatawdekar (VP & Field CTO at Braintrust) makes the case that evals should replace PRDs as the core artifact for AI product managers.

Ameya walks through why conventional product development loops break down when outputs are non-deterministic, how to translate every element of a traditional PRD into its eval equivalent, and the flywheel that separates teams that are winning with AI from those that aren't. He covers the four stages of eval maturity, the pillars of LLM observability, three types of eval judges, and the common mistakes teams make when getting started.

Whether you're a PM shipping your first AI feature or leading an AI-native product team, this talk gives you a practical framework to define "good," measure it in code, and continuously improve.

Timestamps
0:00 – Intro
1:00 – Title: Evals Are the New PRD
3:00 – The document that nobody reads: why PRDs fail for AI
4:00 – The product development loop and why AI breaks it
5:00 – The problem with AI + PRDs
6:00 – Why writing evals is the most important PM skill in the AI era
7:00 – What is an eval, exactly?
9:00 – Evals as the PRD: translating traditional PRD elements into eval equivalents
11:00 – A concrete example: recipe generation feature
13:00 – The hardest part: building the flywheel, not just the eval set
14:00 – The eval flywheel: Observe → Analyze → Evaluate → Improve
18:00 – Flywheel maturity stages (Stage 0 to Stage 3)
20:00 – What fuels the flywheel: observability as product intelligence
23:00 – The 4 pillars of LLM observability
25:00 – The three types of eval judges: Algorithmic, AI Judge, and AI Judge with Human Alignment
27:00 – Common mistakes teams make with evals
32:00 – The PM's new job description
34:00 – The signal hiding in your product: turning user behavior into eval cases
35:00 – Summary: The five rules for AI PMs

Learn more: https://www.braintrust.dev/blog/evals...