Facebook's 6-Hour Outage Explained - When 3.5 Billion Users Vanished from the Internet

Published: June 22, 2026
Channel: GyanByte

On October 4, 2021, Facebook, Instagram, WhatsApp, and Messenger all vanished from the internet. 3.5 billion users couldn't connect, and Facebook lost $47 billion in market capitalization. It all started with one maintenance command.

A routine maintenance command, meant to assess backbone capacity, severed every connection between Facebook's data centers. An audit tool that should have caught the mistake had a bug and let the command through. Facebook's DNS servers are designed to stop advertising themselves over BGP when they can't reach the data centers, so when the backbone disappeared, they withdrew their routes within 3 minutes. With no route to Facebook's DNS servers, no one could look up Facebook's addresses. To every router on the internet, Facebook simply didn't exist.
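Facebook has described its DNS servers as designed to withdraw their BGP advertisements when they lose contact with the data centers, which is the right call when a single site is unhealthy. The toy model below illustrates that rule. It is a sketch, not Facebook's code: the class `DnsNode`, the PoP and data-center names, and the prefix 192.0.2.0/24 (an IPv4 documentation range) are all hypothetical. It shows how a check that is sound for one node turns into a global withdrawal when every node evaluates it against the same dead backbone.

```python
from dataclasses import dataclass

# Hypothetical anycast prefix from the IPv4 documentation range (not Facebook's real one).
ANYCAST_PREFIX = "192.0.2.0/24"


@dataclass
class DnsNode:
    """Toy model of an edge DNS server at a PoP (hypothetical, for illustration)."""
    name: str
    reachable_datacenters: set[str]

    def should_advertise(self) -> bool:
        # Health rule: announce the anycast prefix only while at least one
        # data center is reachable over the backbone.
        return len(self.reachable_datacenters) > 0


def announced_routes(nodes: list[DnsNode]) -> dict[str, list[str]]:
    """Return the PoPs that still announce the anycast prefix."""
    return {ANYCAST_PREFIX: [n.name for n in nodes if n.should_advertise()]}


def sever_backbone(nodes: list[DnsNode]) -> None:
    """Model the maintenance command: every backbone link goes down at once."""
    for n in nodes:
        n.reachable_datacenters.clear()


nodes = [
    DnsNode("pop-a", {"dc-1", "dc-2"}),
    DnsNode("pop-b", {"dc-2"}),
    DnsNode("pop-c", {"dc-1"}),
]
print(announced_routes(nodes))
# {'192.0.2.0/24': ['pop-a', 'pop-b', 'pop-c']}

# Intended case: one PoP loses its backbone links, so only it withdraws
# and anycast steers its users to the remaining PoPs.
nodes[1].reachable_datacenters.clear()
print(announced_routes(nodes))
# {'192.0.2.0/24': ['pop-a', 'pop-c']}

# October 4 case: the whole backbone is gone, so every PoP withdraws.
sever_backbone(nodes)
print(announced_routes(nodes))
# {'192.0.2.0/24': []}
```

The behavior being modeled is fail-closed: a node that cannot confirm it is healthy removes itself. That is safe only while some other node stays healthy, which is exactly the assumption a total backbone failure breaks.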

Then it got worse. The engineers who knew how to fix it couldn't reach the systems. Remote tools ran on Facebook's own network — all down. Badge readers at data centers authenticated against internal servers — doors wouldn't open. They were locked out of their own infrastructure.

In this video, we explain BGP from scratch, walk through Facebook's infrastructure, show exactly what triggered the outage, trace the cascade step by step, explain why recovery took about 7 hours, and cover the lessons every engineer should know.

Timestamps:
0:00 Introduction — The Day Facebook Vanished
0:55 What Is BGP — The Internet's Routing Protocol
2:03 Facebook's Infrastructure — Backbone, PoPs, DNS
3:09 The Trigger — One Command, One Bug
4:08 The Cascade — DNS, Routes, Total Failure
5:19 Locked Out — Engineers Can't Fix It
6:34 The Recovery — Why It Took 7 Hours
7:49 Lessons — And Other BGP Disasters

What you'll learn:
BGP basics: how 70,000+ autonomous systems route internet traffic
Facebook's backbone architecture: data centers, PoPs, peering
The maintenance command that severed the backbone
Why the audit tool's bug let a catastrophic change through
The cascade: backbone down → DNS withdraws → routes vanish → SERVFAIL
Why remote access, badge systems, and internal comms all failed
Physical security systems that slowed the recovery
Power constraints that prevented a quick restart
Full UTC timeline: 15:39 trigger → 22:45 restoration
Other BGP disasters: Pakistan/YouTube 2008, Google/Japan 2017
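The cascade listed above (backbone down, DNS withdraws, routes vanish, SERVFAIL) can be sketched with a toy routing table and a toy recursive resolver. This is a minimal illustration under stated assumptions: the prefixes, the origin ASN 64500 (a documentation ASN), the nameserver address, and the `lookup` and `resolve` helpers are hypothetical, and real BGP best-path selection weighs far more than prefix length. It uses Python's standard `ipaddress` module for longest-prefix matching.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: prefix -> origin AS that announced it.
# Addresses and ASN come from documentation ranges, not Facebook's real ones.
routes: dict[ipaddress.IPv4Network, int] = {
    ipaddress.ip_network("198.51.100.0/24"): 64500,  # web front ends
    ipaddress.ip_network("192.0.2.0/24"): 64500,     # authoritative DNS servers
}

NAMESERVER = ipaddress.ip_address("192.0.2.53")  # hypothetical authoritative server
ZONE = {"www.example.com": "198.51.100.10"}      # records that server would return


def lookup(addr: ipaddress.IPv4Address) -> Optional[ipaddress.IPv4Network]:
    """Longest-prefix match: the most specific route covering addr, or None."""
    matches = [net for net in routes if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


def resolve(name: str) -> str:
    """Toy recursive resolver: it needs a route to the authoritative server."""
    if lookup(NAMESERVER) is None:
        return "SERVFAIL"  # nameserver unreachable, so the resolver gives up
    return ZONE.get(name, "NXDOMAIN")


print(resolve("www.example.com"))  # 198.51.100.10

# The DNS servers withdraw their announcement, as on October 4.
del routes[ipaddress.ip_network("192.0.2.0/24")]
print(resolve("www.example.com"))  # SERVFAIL
```

In this model the web prefix is still routed after the withdrawal; clients fail anyway because they can no longer learn the address. A real resolver returns SERVFAIL only after its queries to every listed nameserver fail or time out, which the sketch collapses into a single reachability check.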

Previous video: SSRF — Your Server Is the Attacker's Proxy
Next video: Security Misconfiguration

#Networking #BGP #Facebook #CyberSecurity #Infrastructure