Roles Attributes: Embracing a Chef Anti-Pattern

25
Aug

There is a fairly large foundation of concepts in Chef that new adopters need to wrap their heads around. And even once they have done that, it doesn’t become any easier to find the right methodology to use when building your infrastructure. One of the main ideas that we have embraced at SimpleReach is pervasive use of roles and role attributes. Using role attributes is considered a Chef anti-pattern. Which begs the question, if this is really a Chef anti-pattern, why are we doing it?

Recruiting Around a Series-A

18
Aug

If you’ve ever heard the saying, “the second you think it’s time for HR, it’s already too late,” then you’ll know what I’m talking about in this post. SimpleReach, having recently raised a round of funding, is now heavily into the recruiting process on all fronts. But there is a dichotomy that comes with getting to this point…once you are at the point where you can hire, you should have been recruiting for a long time.

Batching Work For Efficiency and Tuning

23
Jun

I’ve been talking a lot about message systems in distributed architectures lately. And one of the slides I show in my talks is a slide about compressing messages before writes to the database. In other words, if you have 150k messages per second coming in, which would translate 1:1 into writes and force your database(s) to incur a load of 150k writes per second, you pull all those messages into memory for a short period (say one minute), group them, and write each group as a batch. Depending on how much you can group, you can easily cut your write load by an order of magnitude.
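As a rough illustration of the grouping idea, here is a minimal Python sketch. It assumes each message is a (key, increment) pair and that messages sharing a key can be summed into one write; the function name `batch_writes` and the sample keys are hypothetical and not taken from the talk.

```python
from collections import Counter


def batch_writes(messages):
    """Collapse a window of (key, increment) messages into one write per key.

    Assumption: messages with the same key can simply be summed.
    """
    totals = Counter()
    for key, increment in messages:
        totals[key] += increment
    return dict(totals)


# One buffered window of incoming messages (hypothetical keys).
window = [("page:1", 1), ("page:2", 1), ("page:1", 1), ("page:1", 1)]
writes = batch_writes(window)
print(f"{len(window)} messages collapsed into {len(writes)} writes")
```

How much this saves depends entirely on how often keys repeat within the window; with no repeated keys, the write count does not drop at all.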

Range Repairs: Step-by-Step

09
May

It’s been a long time since I was able to run a repair on my Cassandra cluster. Basically since I went to 1.2, it just hasn’t been possible. And since repairs in Cassandra are pretty much a requirement for normal operation, this is clearly a problem. So in order to deal with the disarray that is Cassandra repairs in 1.2, I found a script originally written by Matt Stump and edited to work with virtual nodes (vnodes) by Brian Gallew. The tl;dr is that the script breaks the repairs down into manageable chunks and allows the repairs to finish. It is available here.
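The core idea of breaking a repair into chunks can be sketched in a few lines of Python. This is not the script mentioned above, only an illustration under simple assumptions: the token range is given as two integers, and it is cut into equal-width subranges. The function name `split_range` and the sample numbers are hypothetical.

```python
def split_range(start, end, steps):
    """Split the token range (start, end] into `steps` contiguous subranges."""
    if steps < 1 or end <= start:
        raise ValueError("need steps >= 1 and end > start")
    width = end - start
    # Integer arithmetic keeps the boundaries exact, even for 64-bit tokens.
    bounds = [start + (width * i) // steps for i in range(steps + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical range cut into four chunks.
for sub_start, sub_end in split_range(0, 100, 4):
    print(sub_start, sub_end)
```

Each (start, end) pair could then be handed to `nodetool repair` through its start-token and end-token options (`-st` and `-et`), so that every invocation only has to cover a small slice of the ring.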

Adding Features to NSQ

23
Dec

After being a fairly heavy user of NSQ over the past year or so and finding that it was missing a few features, I decided to jump in and try to add them myself. The only issue was that I didn’t know Go. Since something as simple as not knowing the language the application was written in has never stopped me before, I wasn’t going to let it stop me now.

Learning to Hardware Hack at RobotsConf

09
Dec

I’ve been a programmer (if you can call me that) for quite a few years now. But for the most part, it’s really always been about designing software based systems. Even though these systems are larger than the average startup or SaaS company would get to work with, it’s (as I said) still about designing software based systems. Enter Robots Conf.

Adding Cross Zone Load Balancing in AWS

12
Nov

One of the new hotness features that Amazon added to their Elastic Load Balancers is cross zone load balancing. This offers the ability to have an unbalanced number of nodes per availability zone within an Amazon region. For instance, if you were load balancing across us-east-1a, us-east-1b, and us-east-1c, then you needed to have the same number of instances in each zone; otherwise the traffic would skew and overload the zone with fewer instances. If you are auto-scaling, using spots, or just happen to lose instances from time to time, you can easily see where this becomes a problem.

Redis Setup Notes and One-Liners

22
Oct

Being a heavy user of Redis has forced some weird Bash-fu and other commands when I want to find out how things are going. Because Redis is single threaded (see here for more information), I commonly run multiple Redis instances per machine. As a result, when running on AWS, I use a specific machine layout to get the best CPU utilization for Redis. An m2.4xlarge machine comes with 8 cores and 68G of RAM. To take full advantage of that, I run 7 instances of Redis and pin each instance to its own CPU core (this can be done using taskset in the schedutils package). For extra performance, I leave an entire core to the OS (even though the machines do little other than process Redis commands).
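To make the layout concrete, here is a small Python sketch that only builds the command lines for such a setup. The assumptions are mine: core 0 is the one left to the OS, each instance listens on its own port starting at 6379, and the function name `pinned_commands` is hypothetical. It relies on `taskset -c` for pinning and on the `--port` option of `redis-server`.

```python
def pinned_commands(total_cores, base_port=6379):
    """Build one pinned redis-server command per core, skipping core 0."""
    commands = []
    for core in range(1, total_cores):
        # Assumption: ports are assigned sequentially starting at base_port.
        port = base_port + core - 1
        commands.append(f"taskset -c {core} redis-server --port {port}")
    return commands


# An 8-core machine yields 7 pinned instances.
for command in pinned_commands(8):
    print(command)
```

In practice each instance would also need its own configuration (data directory, pid file, and so on), which this sketch leaves out.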

Cassandra Summit 2012 Highlights

27
Aug

I was lucky enough to have the opportunity to speak at the Cassandra World Summit 2012 on August 8 in Santa Clara. It was an amazing opportunity to share with the community the types of things that SimpleReach does with Cassandra. Not only that, I learned a lot about the roadmap and got to put a bunch of faces with the names behind the project.

What’s So Great About Cassandra’s Composite Columns?

07
Aug

There are a lot of things I really like about Cassandra. But one thing in particular I like in creating a schema is having access to composite columns (read about composite columns and their origins here on Datastax’s blog). Let’s start simple by explaining what a composite column is, and then we can dive right into why they are so much fun to work with.

Pros and Cons of Redis-Resque and SQS

30
Jul

As with any system or application, there are upsides and downsides to using them. The two queueing systems that I want to explore are Resque and Amazon’s Simple Queue Service. Resque is essentially a set of queueing APIs that run on Redis. Redis is an in-memory data store and is what actually handles the queues. It’s capable of handling complex data structures like lists (what Resque queues use), sets, or sorted sets. Amazon’s SQS is an eventually consistent sharded messaging/queueing system.

Continuous Learning in Teams

23
Jul

One of the most important things in a startup is having a great culture. More often than not, the team will spend more time working and talking with each other than with their family or significant others (scary thought, but yes, it’s probably true). And I don’t want to harp on the importance of culture in a startup because people write tons of posts every day about it. What I do want to write about is what we do at SimpleReach to encourage culture.

Choosing a Product By Roadmap

18
Nov

There are a lot of reasons to choose a specific technology. You can decide based on what skills you or the engineers around you have. You can decide on a new technology because it’s the right tool. But there are times when all other things are equal and the flip of a coin would suffice. And in my mind, that’s when it comes to choosing the right technology based on a roadmap.

Google Securing The Web One Discrete Monopolizing Push At A Time

04
Nov

Contrary to speculation by some, Google’s decision for encrypting search data is motivated by the goal to make the web as a whole more secure and it’s not driven by economic interests. I think Google is silently forcing the internet to do what they should be doing on their own.

Exploring AppleScript with Alfred Shortcuts

01
Sep

If you have read my blog before, you’ll know that I am a big fan of Alfred (here). I love the shortcuts and the ability to make things quicker. One of the things I find myself doing quite frequently is looking for domains and their traffic counts on Alexa, Compete, and Quantcast.

Fixing CentOS Root Certificate Authority Issues

01
Jun

While trying to clone a repository from Github the other day on one of my EC2 servers, I ran into an SSL verification issue. As it turns out, Github renewed their SSL certificate (as people who are responsible about their web presence do when their certificate is about to expire). As a result, I couldn’t git clone over https. This presents a problem since all my deploys work using git clone over https.

ec2-consistent-snapshot With Mongo

21
Apr

I set up MongoDB on my Amazon EC2 instance knowing full well that it would have to be backed up at some point. I also knew that by using XFS, I could take advantage of filesystem freezing in a similar fashion to LVM snapshots. I had remembered reading about backups on XFS with MySQL being done with ec2-consistent-snapshot. As with any piece of open source software, it just took a little tweaking to make it do what I wanted it to do.

5 Apps to Increase Mac Productivity

05
Apr

I like to think I have been making the most of what’s available on my Mac. This means taking advantage of some obscure and some not so obscure apps. I want to go through some of those apps and a little about their usage to help others get some of the benefit I get. There are certainly other products available and even ones I use. The 5 apps I describe are the ones I use the most frequently (and recommend to just about everyone I come in contact with who uses a Mac).

Using Vi Mode Everywhere

15
Mar

Not literally everywhere, but more places than usual. I have been looking for this solution for a long time and finally found it. Anyone who has ever worked around me knows that I do basically everything in Vi.

Common Pig One Liners

01
Mar

As with any programming language, there is a bit of a learning curve with Pig. So here are a few common items that I found useful. If you know Pig, please feel free to add your own in the comments section.

Pig Queries Parsing JSON on Amazons Elastic Map Reduce Using S3 Data

23
Feb

I know the title of this post is a mouthful, but it’s the fun of pushing the envelope of existing technologies. What I am looking to do is take my log data stored on S3 (which is in compressed JSON format) and run queries against it. I want to do this without having to learn everything about setting up Hadoop, while still having the ability to leverage the power of Hadoop’s distributed data processing framework, and without having to learn how to write map reduce jobs and … (this could go on for a while so I’ll just stop here). For all these reasons, I chose to use Amazon’s Elastic Map Reduce infrastructure and Pig.

Distributed Flume Setup With an S3 Sink

04
Feb

I have recently spent a few days getting up to speed with Flume, Cloudera’s distributed log offering. If you haven’t seen this and deal with lots of logs, you are definitely missing out on a fantastic project. I’m not going to spend time talking about it because you can read more about it in the users guide or in the Quora Flume Topic in ways that are better than I can describe it. What I will tell you about is my experience setting up Flume in a distributed environment to sync logs to an Amazon S3 sink.

As CTO of SimpleReach, a company that does most of its work in the cloud, I’m constantly strategizing on how we can take advantage of the cloud for auto-scaling. Depending on the time of day or how much content distribution we are dealing with, we will spawn new instances to accommodate the load. We will still need the logs from those machines for later analysis (batch jobs like making use of Elastic Map Reduce).

MySQL for Python

27
Dec

I am always for using the right tool for the right job. A lot of the time, that tool is Python. I have always had trouble finding solid documentation on using MySQL with Python. There was generally enough to get by, but the more the merrier. Enter MySQL for Python by Albert Lukaszewski.

Benchmarking in jRuby NYC.RB Talk

10
Nov

JSON Benchmarks in jRuby

25
Oct

I am in the process of switching a major application from MRI Ruby (specifically 1.8.7-p302) using many C extensions to jRuby (currently trying 1.5.3-master). In my application, performance is extremely important. It is so important in fact, that I will be writing about some of my experiences in troubleshooting the speed and getting those important milliseconds back. When I am trying to keep an entire transaction from start to finish under 40ms and just the decoding of a JSON object into a Ruby object in jRuby takes roughly 30ms using json_pure, we may have to explore other avenues.

Hash Autovivification in Ruby

06
Oct

One of the features that I miss most from my Perl days (and to be honest, there isn’t a whole lot I miss from my Perl days) is autovivification. For more information on what it is, read the Wikipedia page on it here.

Interesting Object Methods in Ruby

27
Sep

This little Rubyism is something that I use frequently for debugging my objects. I add a method to every object to show only the interesting methods. What do I mean by interesting methods?

Culture of Product vs. Culture of Code

21
Sep

I was talking to Luke Melia of Weplay.com about various cultures of startup companies. We compared two ideas that we referred to as a culture of product and a culture of code. They are similar in concept but require defining.

Counting Frequencies of Frequencies

16
Aug

Lots of people forget about the usefulness of the core utilities (the tools available in Bash). I am even pretty guilty of it at times, with quick and easy tools like Perl, Ruby, or Python that allow you to process items from the command line. However, they load up an entire interpreter. It is usually better to use the coreutils.

Getting a Random Record From a MongoDB Collection

09
Aug

One of my issues with MongoDB is that, as of this writing, there is no way to retrieve a random record. In SQL, you can simply do something similar to “ORDER BY RAND()” (this varies depending on your flavor) and you can retrieve random records (at a slightly expensive query cost). There is not yet an equivalent in MongoDB because of its sequential access nature. There is a purely JavaScript method in the MongoDB cookbook here. If you are really interested, I would also read the Jira ticket thread #533 on this issue.