Scaling Cassandra and MySQL

Published: 16 January 2026
Channel: @Scale

Featuring: Stefan Piesche, CTO at Constant Contact
CTCT used to scale data vertically in large DB2 databases attached to even larger SANs. Because this approach is not only cost-prohibitive but also poses significant scalability and availability issues, we now have two other primary data strategies.

Cassandra. We use Cassandra as a horizontally scalable data tier for key/value data. We run around 350 Cassandra nodes spanning two data centers. That system delivers 10x the performance of the old RDBMS at 1/10th of the cost. It backs our consumer event tracking system, which scales to 100 TB of data and 150 billion records arriving at a rate of 10,000 records per second.
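The talk does not go into how Cassandra spreads key/value data across its nodes. As a rough illustration, Cassandra gives each node a position (token) on a hash ring and stores a key on the node that owns the key's token, plus the next nodes clockwise as replicas. The following is a minimal Python model of that idea under stated assumptions: `TokenRing`, `token` and the node names are hypothetical, MD5 stands in for Cassandra's default Murmur3 partitioner, and the sketch ignores virtual nodes and the per-data-center replica placement (`NetworkTopologyStrategy`) that a two-data-center cluster would actually use.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Assumption: MD5 truncated to 64 bits as a simple, stable hash.
    # Cassandra's default partitioner uses Murmur3, not MD5.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Simplified consistent-hashing ring: one token per node, no DC awareness."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication_factor exceeds node count")
        self.rf = replication_factor
        # Nodes sorted by token; lookups binary-search this list.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str) -> list:
        # The primary owner is the first node whose token is >= the key's
        # token, wrapping around to the start of the ring.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        # Further replicas are the next nodes clockwise (all distinct,
        # because each node holds exactly one token here).
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]
```

For example, `TokenRing([f"node{i}" for i in range(6)]).replicas("contact:12345")` returns three distinct node names. Adding a seventh node only changes the primary owner for keys in the range the new node takes over, which is why this kind of scheme can scale out horizontally without reshuffling the whole data set.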

Sharded MySQL. Our largest deployment is a 36 TB system spanning two data centers. Instead of sharding only the database tier, we also shard the application tier that uses it, so that the sharding mechanism is completely transparent to clients. Our SOA provides RESTful access to that data without requiring any knowledge of the underlying sharding mechanism. However, we learned that this approach led to substantial underutilization of the application tier (a 96-node Ruby on Rails cluster), so we are also looking into proprietary database-level sharding mechanisms.
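The talk does not describe how requests are routed to a shard. As a minimal sketch, assuming hash-modulo placement by account id (an assumption, not CTCT's documented scheme), the idea of hiding shards behind a service interface looks like this. `Shard`, `ShardRouter`, `ContactService` and the account/contact ids are all hypothetical, and an in-memory dict stands in for a real MySQL connection.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for one MySQL shard; a dict replaces a real connection."""
    name: str
    rows: dict = field(default_factory=dict)


class ShardRouter:
    """Maps an account id to a shard (assumption: fixed shard count, hash-modulo)."""

    def __init__(self, shard_count: int):
        self.shards = [Shard(f"shard{i:02d}") for i in range(shard_count)]

    def shard_for(self, account_id: int) -> Shard:
        # Stable hash so the same account always lands on the same shard.
        digest = hashlib.sha1(str(account_id).encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:4], "big") % len(self.shards)]


class ContactService:
    """Hypothetical service facade: callers use the same methods they would
    against a single database; shard selection happens internally."""

    def __init__(self, router: ShardRouter):
        self.router = router

    def put_contact(self, account_id: int, contact_id: int, email: str) -> None:
        self.router.shard_for(account_id).rows[(account_id, contact_id)] = email

    def get_contact(self, account_id: int, contact_id: int):
        # Returns None when the contact does not exist on its shard.
        return self.router.shard_for(account_id).rows.get((account_id, contact_id))
```

Usage: `svc = ContactService(ShardRouter(8)); svc.put_contact(42, 1, "a@example.com")` followed by `svc.get_contact(42, 1)` returns `"a@example.com"`, and the caller never sees which shard served it. Plain modulo placement makes adding shards expensive because most keys move, which is one reason many systems use a directory table mapping accounts to shards instead.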

The mixture of RDBMS and NoSQL data tiers has caused issues in our analytics platform, a 150 TB Hadoop cluster. To get data out of the Cassandra nodes, we use a mechanism similar to Netflix's: reading the SSTables directly to extract the data.
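Reading SSTables directly means the extractor has to do some of the reconciliation Cassandra normally does at read time: SSTables are immutable, so one row can have versions in several SSTables (until compaction) and on several replicas, and deletions appear as tombstones. The talk does not detail this step; the following is a minimal sketch assuming cells have already been decoded into a hypothetical, simplified `Cell` record (it does not parse the real SSTable on-disk format). It applies last-write-wins by timestamp and drops tombstoned cells; ties here keep the first version seen, whereas Cassandra has its own tie-break rules.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical decoded cell from one SSTable (not the on-disk format)."""
    key: str
    column: str
    value: Optional[str]  # None marks a tombstone (deletion)
    timestamp: int        # write timestamp, e.g. microseconds


def reconcile(sstables: Iterable[Iterable[Cell]]) -> dict:
    """Merge cells from many SSTables/replicas into one view per (key, column).

    Last write wins by timestamp; a winning tombstone removes the cell.
    """
    latest = {}
    for table in sstables:
        for cell in table:
            k = (cell.key, cell.column)
            # Keep the newest version; on equal timestamps the first seen stays.
            if k not in latest or cell.timestamp > latest[k].timestamp:
                latest[k] = cell
    return {k: c.value for k, c in latest.items() if c.value is not None}
```

Because the result depends only on timestamps (apart from exact ties), the order in which SSTables or replicas are read does not change the output, which suits parallel extraction jobs feeding Hadoop.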