Batching Work For Efficiency and Tuning

I’ve been talking a lot about message systems in distributed architectures lately, and one of the slides in my talks is about compressing messages before they are written to the database. In other words, if you have 150k messages per second coming in, which would translate 1:1 into writes and force your database(s) to absorb a load of 150k writes per second, you instead pull all those messages into memory for a short period (say one minute), group them, and write each group as a batch. Depending on how much you can group, you can easily cut your write load by an order of magnitude.
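As a rough illustration of the grouping idea, here is a minimal Python sketch. It assumes the messages are counter-style events that can be summed per key, which is only one possible way to group them. The `WriteBatcher` class and the `write_batch` callback are hypothetical names made up for this example, not part of any real library.

```python
from collections import defaultdict


class WriteBatcher:
    """Accumulate per-key counts in memory and hand them off as one batch."""

    def __init__(self, flush_interval_s, write_batch):
        self.flush_interval_s = flush_interval_s
        # Hypothetical callback: receives a dict of {key: total} to persist.
        self.write_batch = write_batch
        self.pending = defaultdict(int)
        self.window_start = None

    def add(self, key, now, amount=1):
        """Record one message; flush once the window has been open long enough."""
        if self.window_start is None:
            self.window_start = now
        self.pending[key] += amount
        if now - self.window_start >= self.flush_interval_s:
            self.flush()

    def flush(self):
        """Write everything gathered so far as a single batch and reset."""
        if self.pending:
            self.write_batch(dict(self.pending))
        self.pending.clear()
        self.window_start = None


# Usage: 180 incoming messages collapse into one batch holding two rows.
written = []
batcher = WriteBatcher(flush_interval_s=60, write_batch=written.append)
for second in range(60):
    for key in ("page:a", "page:b", "page:a"):
        batcher.add(key, now=second)
batcher.flush()
print(written)
```

How much this saves depends entirely on how many messages share a key within the window; with mostly unique keys the batch is barely smaller than the input.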

Range Repairs: Step-by-Step

It’s been a long time since I was able to run a repair on my Cassandra cluster. Basically, since I went to 1.2, it just hasn’t been possible. And since repairs in Cassandra are pretty much a requirement for normal operation, this is clearly a problem. So in order to deal with the disarray that is Cassandra repairs in 1.2, I found a script originally written by Matt Stump and edited to work with virtual nodes (vnodes) by Brian Gallew. The tl;dr is that the script breaks the repairs down into manageable chunks, which allows them to finish. It is available here.
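To give a feel for what "manageable chunks" means, here is a minimal Python sketch of splitting one token range into smaller sub-ranges. This is not the script mentioned above; `split_token_range` and `repair_command` are hypothetical helpers written for this illustration. The command builder assumes your `nodetool repair` accepts the `-st` and `-et` start/end token options, so check that against your Cassandra version before relying on it.

```python
def split_token_range(start, end, steps):
    """Split the token range (start, end] into up to `steps` contiguous pieces."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if end <= start:
        # Ranges that wrap around the ring are deliberately not handled here.
        raise ValueError("end must be greater than start")
    width = end - start
    bounds = [start + (width * i) // steps for i in range(steps)] + [end]
    # Drop empty pieces, which appear when the range is narrower than `steps`.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def repair_command(keyspace, lo, hi):
    """Build an argv list for repairing one sub-range (assumed flags: -st, -et)."""
    return ["nodetool", "repair", "-st", str(lo), "-et", str(hi), keyspace]


# Usage: print one repair command per sub-range of a made-up token range.
for lo, hi in split_token_range(0, 1000, 4):
    print(" ".join(repair_command("my_keyspace", lo, hi)))
```

Each command covers a small slice of the ring, so a failure or timeout only costs you that slice rather than the whole repair.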

Redis Setup Notes and One-Liners

Being a heavy user of Redis has forced some weird Bash-fu and other commands when I want to find out how things are going. Because Redis is single threaded (see here for more information), I commonly run multiple Redis instances per machine. As a result, when running on AWS, I use a specific machine layout to get the best CPU utilization for Redis. An m2.4xlarge machine comes with 8 cores and 68G of RAM. To take full advantage of that, I run 7 instances of Redis and pin each instance to its own CPU core (this can be done using taskset in the schedutils package). For extra performance, I leave an entire core to the OS (even though the machines do little other than process Redis commands).
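The layout described above can be sketched as a small Python helper that builds one `taskset` command per Redis instance. The helper name `pinned_redis_commands`, the choice to reserve core 0 for the OS, and the sequential port numbering starting at 6379 are all assumptions made for this illustration; the excerpt does not say which core is left free or which ports are used.

```python
def pinned_redis_commands(total_cores=8, reserved_cores=1, base_port=6379):
    """Return one argv list per Redis instance, each pinned to a single core."""
    commands = []
    # Cores 0..reserved_cores-1 are left to the OS (an assumed convention).
    for core in range(reserved_cores, total_cores):
        port = base_port + (core - reserved_cores)
        commands.append(
            ["taskset", "-c", str(core), "redis-server", "--port", str(port)]
        )
    return commands


# Usage: show the seven commands for an 8-core machine.
for argv in pinned_redis_commands():
    print(" ".join(argv))
```

In practice each instance would also need its own config file, data directory, and memory limit, which are left out here to keep the sketch short.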

Cassandra Summit 2012 Highlights

I was lucky enough to have the opportunity to speak at the Cassandra World Summit 2012 on August 8 in Santa Clara. It was an amazing opportunity to share with the community the types of things that SimpleReach does with Cassandra. Not only that, I learned a lot about the roadmap and got to put a bunch of faces with the names behind the project.