Pros and Cons of Redis-Resque and SQS

As with any system or application, each of these has upsides and downsides. The two queueing systems that I want to explore are Resque and Amazon’s Simple Queue Service (SQS). Resque is essentially a set of queueing APIs that run on top of Redis. Redis is an in-memory data store and is what actually holds the queues. It’s capable of handling complex data structures like lists (which Resque queues use), sets, and sorted sets. Amazon’s SQS is an eventually consistent, sharded messaging/queueing system.

Both systems have their pros and cons. Resque has to be run locally (meaning within your own environment). Because it’s native to your architecture, it can be incredibly fast in comparison. Its durability is the question mark: even though Redis allows you to dump your data to disk on a schedule (say once per second) or to run a master/slave architecture, ultimately you are still exposed to the loss of a single machine (a single point of failure). This may only be the case until Redis Cluster is released, but the comparison here was made with the tools at hand. SQS is much more durable. It also has the notion of in-flight messages: a received message is pulled off the available queue but is not deleted until a delete command is sent for that message ID. So if you lose your worker mid-processing, the event isn’t lost for good. The message times out after being in flight for 5 minutes and is then dropped back onto the available queue. While this functionality could be written into Resque, it just wasn’t part of the fundamental design.
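To make the in-flight behavior concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the SQS API: the class name ToyQueue, its methods, and the explicit now argument are all made up for illustration, and the 300 second timeout simply mirrors the 5 minutes mentioned above.

```python
import itertools


class ToyQueue:
    """Toy in-memory model of receive/delete with an in-flight timeout.

    Hypothetical illustration only; this is not how SQS is implemented.
    """

    def __init__(self, visibility_timeout=300.0):
        self.visibility_timeout = visibility_timeout  # seconds a message may stay in flight
        self.available = []    # (message_id, body) pairs waiting to be received
        self.in_flight = {}    # message_id -> (body, deadline)
        self._ids = itertools.count(1)

    def send(self, body):
        message_id = next(self._ids)
        self.available.append((message_id, body))
        return message_id

    def _requeue_expired(self, now):
        # Collect expired ids first so the dict is not mutated while iterating.
        expired = [mid for mid, (_, deadline) in self.in_flight.items() if deadline <= now]
        for mid in expired:
            body, _ = self.in_flight.pop(mid)
            self.available.append((mid, body))

    def receive(self, now):
        """Hand out one message and mark it in flight; return None if nothing is available."""
        self._requeue_expired(now)
        if not self.available:
            return None
        message_id, body = self.available.pop(0)
        self.in_flight[message_id] = (body, now + self.visibility_timeout)
        return message_id, body

    def delete(self, message_id):
        """Remove an in-flight message for good; return True if it was in flight."""
        return self.in_flight.pop(message_id, None) is not None


queue = ToyQueue()
queue.send("resize-image")
first = queue.receive(now=0)      # worker A takes the message, then dies
hidden = queue.receive(now=10)    # nothing available: the message is still in flight
again = queue.receive(now=300)    # timeout elapsed, the message is handed out again
queue.delete(again[0])            # worker B finishes and deletes it
```

The point of the sketch is the ordering: a message only disappears on delete, so a worker that crashes between receive and delete costs you a delay of one timeout rather than the message itself.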

For the sake of terminology, a job in Resque is a message in SQS. A big issue with using SQS is that if you need more than 120k messages in flight at a time, you have to request an increase. As of the publishing of this post, that has to be a special request made through an account manager or through support. This is easy to do and is nothing more than a potential inconvenience worth noting. Another limit that has the potential to put a kink in your use case is that you can only pull off 10 messages at a time. In other words, 10 is the max batch size per receive request. So if you regularly put 1 million messages on a queue, pulling them off 10 at a time would take 100k receive requests. A potential deal breaker is that you can’t yet flush a queue in SQS. So if your queues are a little backed up and you want to start over, the only choice you have is to delete your queue (which will remove any custom settings that the SQS team may have made).
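The batch-size arithmetic is easy to check with a few lines of Python. This is only a back-of-the-envelope helper; the function name is made up for illustration, and the limit of 10 is the one described above.

```python
MAX_BATCH_SIZE = 10  # max messages per receive request, as described in this post


def receive_requests_needed(message_count, batch_size=MAX_BATCH_SIZE):
    """Return the minimum number of receive requests needed to drain a queue."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and %d" % MAX_BATCH_SIZE)
    # Integer ceiling division: a partial final batch still costs one request.
    return -(-message_count // batch_size)


print(receive_requests_needed(1_000_000))  # 100000
```

This is a best case that assumes every receive comes back full, so in practice the count can be higher.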

Another thing that most people tend to overlook is the cost of running each system. Let’s assume that we are running everything in AWS US East. The actual machines running Redis/Resque don’t need a lot of horsepower or a lot of memory. So for argument’s sake, you can run the master/slave combo for Redis on two medium instances, which amounts to roughly $250 per month. The thing you need to remember about Redis is that it can only use one core. From the documentation (http://redis.io/topics/benchmarks):

Redis is a single-threaded server. It is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed. It is not really fair to compare one single Redis instance to a multi-threaded data store.

So if you plan on using Redis for something else in addition to Resque (e.g. as a caching system), I recommend another EC2 configuration with a Redis instance count equal to the number of cores on the machine (minus one for the OS). Right now SQS charges $0.120 per GB of data transfer out and $0.01 per 10,000 Amazon SQS requests ($0.000001 per request). Assume we are handling 10 million messages per day (which is 300 million messages per month) at roughly 1 KB per message (or 300 GB of transfer per month). In SQS this would cost about $350: roughly $300 in request charges if each message is counted as one request, plus about $36 for data transfer.
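The SQS estimate can be reproduced with a small Python function. This is a rough sketch using only the prices quoted in this post; the function and parameter names are made up for illustration, and the default of one billed request per message is an assumption chosen because it lands near the figure above.

```python
def sqs_monthly_cost(messages_per_day,
                     message_kb=1.0,
                     days=30,
                     requests_per_message=1,       # assumption: one billed request per message
                     price_per_request=0.000001,   # $0.01 per 10,000 requests
                     price_per_gb_out=0.12):       # $0.120 per GB data transfer out
    """Rough monthly SQS cost in dollars, from the prices quoted in this post."""
    messages = messages_per_day * days
    request_cost = messages * requests_per_message * price_per_request
    # Treat 1 GB as 1,000,000 KB, matching the rounding used in the text.
    transfer_gb = messages * message_kb / 1_000_000
    transfer_cost = transfer_gb * price_per_gb_out
    return request_cost + transfer_cost


print(round(sqs_monthly_cost(10_000_000), 2))  # 336.0
```

If your workers issue separate, unbatched send, receive, and delete calls, requests_per_message=3 may be closer to reality, which pushes the same workload to roughly $936. Treat that as an assumption to verify against your own usage and current pricing.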

So right off the bat, Redis wins if you are only looking at price. However, that also means building and administering two Redis servers, which comes at a cost of its own. And you may end up paying more than the $350 per month for SQS if pulling only 10 messages per batch requires you to spin up more workers. Another possible additional cost is that using SQS may require you to use SNS (Amazon’s Simple Notification Service). If you aren’t using SNS and don’t know how, then monitoring your queues might have an additional learning curve associated with it (as opposed to using one of the many Nagios scripts that will check Resque queue sizes).

As with anything else, your mileage may vary. So learn your use case and what compromises you may need to make when going with one system or the other.

Note: This post is intended to be a comparison based on recent experiences. Amazon’s SQS team has been fantastic throughout the (investigation) process in discussing their roadmap (when possible) and giving the account special dispensation where possible.