Why So Many Database Engines? The RUM Conjecture Explained (New Audio)

Published: 17 June 2026
Channel: Sorted Runs

The RUM Conjecture posits that optimizing reads, writes, and memory is a three-way tradeoff - push one overhead down and at least one of the others goes up. Every engine on the market is making a different bet.
Dozens of storage engines are built on the same LSM tree. So why hasn't someone just built the best one?

In this episode we break down the two fundamental compaction strategies and show why each engine exists:
Leveled compaction: fast reads, but writes cascade through every level (10-30x amplification)
Tiered compaction: fast writes, but reads scan every run at every level (12 checks for one key)
Bloom filters: the elegant cheat that eliminates most disk reads for 10 bits per key
The system tour: LevelDB, RocksDB, Pebble, Cassandra, ScyllaDB, TigerBeetle
Why even RocksDB developers "don't fully understand the effect of each configuration change"
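
The figures in the list above can be sanity-checked with back-of-envelope arithmetic. The sketch below is a simplified cost model, not the derivation used in the episode: the function names, the size ratio of 10, the 3 levels, and the 4 runs per level are all illustrative assumptions. The Bloom filter estimate uses the standard approximation for a filter with an optimal number of hash functions.

```python
import math


def leveled_write_amp(size_ratio: int, levels: int) -> int:
    """Worst-case rewrites per entry under leveled compaction.

    Assumed model: an entry can be merged up to `size_ratio` times
    at each level before it moves down to the next one.
    """
    return size_ratio * levels


def tiered_write_amp(levels: int) -> int:
    """Assumed model: tiered compaction writes each entry once per level."""
    return levels


def leveled_point_read_checks(levels: int) -> int:
    """Leveled keeps one sorted run per level, so one check per level."""
    return levels


def tiered_point_read_checks(runs_per_level: int, levels: int) -> int:
    """Tiered keeps several runs per level; a lookup may check all of them."""
    return runs_per_level * levels


def bloom_fpr(bits_per_key: float) -> float:
    """Approximate false-positive rate with the optimal hash count.

    Standard approximation: FPR = exp(-bits_per_key * ln(2)^2).
    """
    return math.exp(-bits_per_key * math.log(2) ** 2)


def bloom_bytes(num_keys: int, bits_per_key: float) -> float:
    """Memory needed for the filter, in bytes."""
    return num_keys * bits_per_key / 8


if __name__ == "__main__":
    # Illustrative parameters only (assumptions, not taken from the episode).
    print("leveled write amp:", leveled_write_amp(size_ratio=10, levels=3))
    print("tiered write amp:", tiered_write_amp(levels=3))
    print("leveled read checks:", leveled_point_read_checks(levels=3))
    print("tiered read checks:", tiered_point_read_checks(runs_per_level=4, levels=3))
    print("bloom FPR at 10 bits/key:", round(bloom_fpr(10), 4))
    print("bloom memory for 1B keys (GB):", bloom_bytes(10**9, 10) / 1e9)
```

Under these assumed parameters the model gives 30x write amplification for leveled compaction and 12 run checks for a tiered point lookup, in line with the ranges quoted above. At 10 bits per key the false-positive rate comes out just under 1%, and a billion keys need 1.25 GB of filter memory.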

No handwaving. No skipping steps. Just the actual mechanics, animated step by step.

RUM Conjecture: what we left out

The RUM Conjecture triangle has three corners, each occupied by different data structures, plus a center region that balances all three:
Read-optimized: Hash indexes, B-Trees, Tries, Skiplists
Update-optimized: LSM Trees (that's us), PDTs, PBTs, Differential Structures, MaSM
Memory-optimized: Bloom filters, Bitmaps, Sparse/Approximate indexes
Center (all three): Adaptive Merging

We kept the triangle in the video but skipped these labels to stay focused - the original RUM Conjecture paper (Athanassoulis et al., 2016) has the full taxonomy if you want to go deeper.


** Sources referenced in this episode **

[1] Multiple production LSM-based engines - LevelDB, RocksDB, Pebble, Cassandra, ScyllaDB, TigerBeetle, TiKV, YugabyteDB, etc.
[2] Athanassoulis et al., "Designing Access Methods: The RUM Conjecture," EDBT 2016: https://openproceedings.org/2016/conf...
[3] Mark Callaghan, "Read, Write & Space Amplification - B-Tree vs LSM," Small Datum, 2015: http://smalldatum.blogspot.com/2015/1...
[4] Dayan & Idreos, "Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores," SIGMOD 2018: https://nivdayan.github.io/dostoevsky...
[5] Mark Callaghan, "Universal Compaction in RocksDB and Me," Small Datum, 2023: https://smalldatum.blogspot.com/2023/...
[5+] ScyllaDB, "Compaction Series: Leveled Compaction," 2018: https://www.scylladb.com/2018/01/31/c...
[6] Cockroach Labs, "Introducing Pebble," 2020: https://www.cockroachlabs.com/blog/pe...
[7] TiKV on RocksDB: https://tikv.org/deep-dive/key-value-...
[8] Karthik Ranganathan (YugabyteDB) interview: https://www.unite.ai/karthik-ranganat...
[9] Tiered compaction - Dostoevsky paper Section 2 (same as [4])
[10] Cassandra STCS (default compaction): https://cassandra.apache.org/doc/late...
[11] Bloom filter FPR at 10 bits/key - Dostoevsky paper Section 2 (same as [4])
[11+] RocksDB Bloom Filter wiki: https://github.com/facebook/rocksdb/w...
[12] 1B keys x 10 bits/key = 10 billion bits = 1.25 GB (~1.2 GiB) - arithmetic
[13] LevelDB (Ghemawat & Dean, 2011): https://github.com/google/leveldb
[14] RocksDB (Facebook, 2013): https://engineering.fb.com/2013/11/21...
[15] "Even we as RocksDB developers don't fully understand..." - RocksDB Tuning Guide: https://github.com/facebook/rocksdb/w...
[16] Pebble (Cockroach Labs, 2020): https://www.cockroachlabs.com/blog/pe...
[17] Pebble omits ~15 RocksDB features: https://github.com/cockroachdb/pebble
[18] Cassandra at Instagram - F8 2018: / cassandra-on-rocksdb-at-instagram
[19] ScyllaDB - C++ rewrite of Cassandra: https://www.scylladb.com/product/tech...
[20] TigerBeetle - 128-byte records: https://docs.tigerbeetle.com/referenc...
[21] "A Trillion Transactions" (Joran Greef, TigerBeetle)
[22] Mark Callaghan, "Name That Compaction Algorithm," Small Datum, 2018: http://smalldatum.blogspot.com/2018/0...
[23] Kleppmann, Designing Data-Intensive Applications, Ch. 6 - "Partitioning is also known as sharding"
[24] AWS S3 Standard pricing (~$23/TB/month): https://aws.amazon.com/s3/pricing/
[25] Alternative object storage: Backblaze B2 (~$6/TB), Vultr (~$5/TB), Cloudflare R2 (~$15/TB)