Designing Data-Intensive Applications - Part 3 - p99 Latency, Percentiles & Scaling

Published: 25 June 2026
Channel: AssistedReading AI

Designing Data-Intensive Applications (DDIA) Part 3: a percentile latency masterclass for backend engineers, covering p50, p95, p99, p999, tail latency, and why averages lie. We walk through Martin Kleppmann's section on describing system performance, including the Amazon 99.9th-percentile case study that SREs often quote.

This part covers the SLO/SLA material most teams should read before their next on-call rotation. We cover why the median (p50) describes typical response time better than the mean, why tail latencies (p95/p99/p999) matter so much for user experience, what "tail latency amplification" means for fan-out architectures, and the practical choice between scaling up and scaling out when load grows.
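
As a rough illustration of why the mean misleads, the sketch below compares it with nearest-rank percentiles. The sample latencies and the `percentile` helper are made up for this example and do not come from the book; production systems usually rely on a metrics library instead.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds:
# 98 fast requests plus 2 very slow outliers.
latencies_ms = [20] * 98 + [2000, 3000]

mean_ms = statistics.mean(latencies_ms)   # pulled upward by the two outliers
p50_ms = percentile(latencies_ms, 50)     # what a typical user sees
p99_ms = percentile(latencies_ms, 99)     # what the unluckiest 1% see

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

In this made-up data set the mean sits well above what almost every user experienced and far below what the slowest requests experienced, so it describes nobody.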

⏱ Chapters
00:00 - Describing Performance: average vs median, why the mean misleads you about real user experience, introduction to p50 / p95 / p99 / p999, and the Amazon 99.9th-percentile example
06:53 - Percentiles in Practice: SLOs and SLAs, latency histograms, fan-out and tail latency amplification, rolling-window percentile measurement
08:37 - Approaches for Coping with Load: vertical scaling vs horizontal scaling, stateless vs stateful service trade-offs, elastic systems, and why there's no one-size-fits-all scaling architecture
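
The rolling-window measurement mentioned in the chapter list can be sketched naively as follows: keep every sample from the last N seconds and sort on demand. The `RollingPercentile` class and its method names are hypothetical, chosen for this example. The book names forward decay, t-digest, and HdrHistogram as approximation techniques that avoid the cost of storing and sorting every sample.

```python
import math
from collections import deque


class RollingPercentile:
    """Naive rolling-window percentile tracker (illustrative only)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, latency_ms), oldest first

    def _evict(self, now):
        # Drop samples that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def record(self, latency_ms, now):
        # Timestamps must be passed in non-decreasing order.
        self._samples.append((now, latency_ms))
        self._evict(now)

    def percentile(self, p, now):
        """Nearest-rank percentile over the current window, or None if empty."""
        self._evict(now)
        if not self._samples:
            return None
        ordered = sorted(value for _, value in self._samples)
        rank = max(math.ceil(p * len(ordered) / 100), 1)
        return ordered[rank - 1]


# Usage with explicit timestamps (seconds) and a 10-minute window.
tracker = RollingPercentile(window_seconds=600)
tracker.record(100, now=0)
tracker.record(20, now=300)
tracker.record(30, now=500)
print(tracker.percentile(50, now=500))  # all three samples are in the window
print(tracker.percentile(50, now=700))  # the sample recorded at t=0 has expired
```

Timestamps are passed in explicitly to keep the example deterministic; a real service would more likely read a monotonic clock.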

🎯 What you'll learn
• Why the average response time is a lie — and what to measure instead
• How to read latency histograms and avoid common percentile-aggregation mistakes
• Why Amazon targets p999 but judged optimizing the 99.99th percentile too costly for the benefit (and where that break-even sits for your system)
• Tail latency amplification: how a single slow backend call dominates a fan-out request
• When to scale up vs scale out — and the architectural cost of each
• The difference between stateless and stateful scaling
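
Tail latency amplification from the list above can be put into numbers with a small sketch. It assumes each backend call is independently slow with a fixed probability, which is a simplification since real backends often slow down together. The function name and the 1% default are illustrative choices, not figures from the book.

```python
def prob_request_is_slow(fan_out, per_call_slow_prob=0.01):
    """Probability that at least one of `fan_out` parallel backend calls
    is slow, assuming each call is slow independently of the others."""
    return 1 - (1 - per_call_slow_prob) ** fan_out


# A user request waits for its slowest backend call, so the chance of
# hitting a p99-slow call grows quickly with the number of calls.
for fan_out in (1, 10, 100):
    chance = prob_request_is_slow(fan_out)
    print(f"fan-out {fan_out:>3}: {chance:.1%} of requests hit a slow call")
```

Under these assumptions, a request that fans out to 100 backends hits at least one p99-slow call well over half the time, even though each individual call is slow only 1% of the time.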

💬 What's the worst tail-latency surprise you've debugged? p99 spiking from a GC pause, a noisy neighbour, a cold cache? Drop it in the comments.

🔔 Subscribe for the full DDIA series — Part 4 dives into maintainability, operability, and evolution.

📖 Source: Designing Data-Intensive Applications by Martin Kleppmann (O'Reilly, 2017)

#DDIA #SystemDesign #DistributedSystems #BackendEngineering #SoftwareArchitecture #SystemDesignInterview #SiteReliabilityEngineering #SRE #LatencyPercentiles #TailLatency #PerformanceEngineering #Scalability #BookSummary