Most teams shipping AI products can't build evals that predict how a model will actually perform in production. Michele Catasta, President & Head of AI at Replit, explains how his team closed that gap with ViBench, a public vibe-coding benchmark that scores whether the generated app works, and with the offline/online evaluation loop behind Replit Agent, which compresses weeks of engineering into gains that compound overnight. Anthropic's Hannah Moran joins to share what separates evals that merely look rigorous from ones that actually help teams adopt new models with confidence.