Archive for the ‘Uncategorized’ Category

Roles Attributes: Embracing a Chef Anti-Pattern

Monday, August 25th, 2014

There is a fairly large foundation of concepts in Chef that new adopters need to wrap their heads around. And even once they have done that, it doesn’t become any easier to find the right methodology to use when building your infrastructure. One of the main ideas that we have embraced at SimpleReach is pervasive use of roles and role attributes. Using role attributes is considered a Chef anti-pattern. Which begs the question, if this is really a Chef anti-pattern, why are we doing it? (more…)

Recruiting Around a Series-A

Monday, August 18th, 2014

If you’ve ever heard the saying, “the second you think it’s time for HR, it’s already too late,” then you’ll know what I’m talking about in this post. SimpleReach, having recently raised a round of funding, is now heavily into the recruiting process on all fronts. But there is a dichotomy that comes with getting to this point: once you are at the point where you can hire, you should have been recruiting for a long time. (more…)

Batching Work For Efficiency and Tuning

Monday, June 23rd, 2014

I’ve been talking a lot about message systems in distributed architectures lately. One of the slides I show in my talks is about compressing messages before writes to the database. In other words, if you have 150k messages per second coming in, which would translate 1:1 into writes and force your database(s) to incur a load of 150k writes per second, you pull all those messages into memory for a short period (say one minute), group them, and write each group in a batch. Depending on how much you can group, you can easily cut your write load by an order of magnitude. (more…)
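
As a rough illustration of the grouping idea, here is a minimal Python sketch. It assumes each message is a simple (key, increment) counter update, which the post does not specify; the `batch_writes` name and the sample keys are hypothetical.

```python
from collections import Counter


def batch_writes(messages):
    """Collapse a window of (key, increment) messages into one write per key."""
    totals = Counter()
    for key, increment in messages:
        totals[key] += increment
    return dict(totals)


# Hypothetical buffer for one window: 10 messages that touch only 2 keys.
window = [("page:1", 1)] * 7 + [("page:2", 1)] * 3
writes = batch_writes(window)
print(len(window), "messages ->", len(writes), "writes")
```

How much this saves depends entirely on how many messages in a window share a key; if every message has a unique key, batching by key alone saves nothing.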

Range Repairs: Step-by-Step

Friday, May 9th, 2014

It’s been a long time since I was able to run a repair on my Cassandra cluster. Basically since I went to 1.2, it just hasn’t been possible. And since repairs in Cassandra are pretty much a requirement for normal operation, this is clearly a problem. So in order to deal with the disarray that is Cassandra repairs in 1.2, I found a script originally written by Matt Stump and edited to work with virtual nodes (vnodes) by Brian Gallew. The tl;dr is that the script breaks the repairs down into manageable chunks and allows the repairs to finish. It is available here. (more…)

Adding Features to NSQ

Monday, December 23rd, 2013

After being a fairly heavy user of NSQ over the past year or so and finding that it was missing a few features, I decided to jump in and try to add them myself. The only issue was that I didn’t know Go. Since something as simple as not knowing the language the application was written in has never stopped me before, I wasn’t going to let it stop me now.

Learning to Hardware Hack at RobotsConf

Monday, December 9th, 2013

I’ve been a programmer (if you can call me that) for quite a few years now. But for the most part, it’s really always been about designing software-based systems. Even though these systems are larger than the average startup or SaaS company would get to work with, it’s (as I said) still about designing software-based systems. Enter RobotsConf. (more…)

Adding Cross Zone Load Balancing in AWS

Tuesday, November 12th, 2013

One of the new hotness features that Amazon added to their Elastic Load Balancers is cross zone load balancing. This offers the ability to have an unbalanced number of nodes per availability zone within an Amazon region. For instance, if you were load balancing across us-east-1a, us-east-1b, and us-east-1c, then you needed to have the same number of instances in each zone; otherwise, the traffic would skew and overload the zone with fewer instances. If you are auto-scaling, using spots, or just happen to lose instances from time to time, you can easily see where this becomes a problem. (more…)
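
To make the skew concrete, here is a small Python sketch under a simplified model: without cross zone load balancing, traffic is assumed to be split evenly per zone and then evenly among that zone's instances; with it, traffic is assumed to be split evenly across all instances. The function name, the request rate, and the instance counts are made up for illustration, and every zone is assumed to have at least one instance.

```python
def per_instance_load(total_rps, instances_per_zone, cross_zone):
    """Return requests/sec seen by each instance in every zone (simplified model)."""
    if cross_zone:
        # Traffic is spread evenly over every instance, regardless of zone.
        total_instances = sum(instances_per_zone.values())
        return {zone: total_rps / total_instances for zone in instances_per_zone}
    # Traffic is split evenly per zone first, then among that zone's instances.
    per_zone_rps = total_rps / len(instances_per_zone)
    return {zone: per_zone_rps / count for zone, count in instances_per_zone.items()}


# Hypothetical fleet: one zone has lost most of its instances.
fleet = {"us-east-1a": 4, "us-east-1b": 4, "us-east-1c": 1}
print(per_instance_load(900, fleet, cross_zone=False))
print(per_instance_load(900, fleet, cross_zone=True))
```

In this made-up fleet, the lone instance in us-east-1c carries four times the load of its peers without cross zone balancing, and the same load as everyone else with it.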

Redis Setup Notes and One-Liners

Tuesday, October 22nd, 2013

Being a heavy user of Redis has forced some weird Bash-fu and other commands when I want to find out how things are going. Because Redis is single threaded (see here for more information), I commonly run multiple Redis instances per machine. As a result, when running on AWS, I use a specific machine layout to get the best CPU utilization for Redis. An m2.4xlarge machine comes with 8 cores and 68G of RAM. To take full advantage of that, I run 7 instances of Redis and pin each instance to its own CPU core (this can be done using taskset in the schedutils package). For extra performance, I leave an entire core to the OS (even though the machines do little other than process Redis commands). (more…)

Cassandra Summit 2012 Highlights

Monday, August 27th, 2012

I was lucky enough to have the opportunity to speak at the Cassandra World Summit 2012 on August 8 in Santa Clara. It was an amazing opportunity to share with the community the types of things that SimpleReach does with Cassandra. Not only that, I learned a lot about the roadmap and got to put a bunch of faces with the names behind the project.

Pros and Cons of Redis-Resque and SQS

Monday, July 30th, 2012

As with any system or application, there are upsides and downsides to using a given queueing system. The two that I want to explore are Resque and Amazon’s Simple Queue Service (SQS). Resque is essentially a set of queueing APIs that run on Redis. Redis is an in-memory data store and is what actually handles the queues. It’s capable of handling complex data structures like lists (what Resque queues use), sets, or sorted sets. Amazon’s SQS is an eventually consistent, sharded messaging/queueing system.

Continuous Learning in Teams

Monday, July 23rd, 2012

One of the most important things in a startup is having a great culture. More often than not, the team will spend more time working and talking with each other than with their family or significant others (scary thought, but yes, it’s probably true). And I don’t want to harp on the importance of culture in a startup because people write tons of posts every day about it. What I do want to write about is what we do at SimpleReach to encourage culture.

Choosing a Product By Roadmap

Friday, November 18th, 2011

There are a lot of reasons to choose a specific technology. You can decide based on what skills you or the engineers around you have. You can decide on a new technology because it’s the right tool. But there are times when all other things are equal and the flip of a coin would suffice. And in my mind, that’s when it comes down to choosing the right technology based on a roadmap.

Google Securing The Web One Discrete Monopolizing Push At A Time

Friday, November 4th, 2011

Contrary to speculation by some, Google’s decision to encrypt search data is motivated by the goal of making the web as a whole more secure, and it’s not driven by economic interests. I think Google is silently forcing the internet to do what it should be doing on its own.

Exploring AppleScript with Alfred Shortcuts

Thursday, September 1st, 2011

If you have read my blog before, you’ll know that I am a big fan of Alfred (here). I love the shortcuts and the ability to make things quicker. One of the things I find myself doing quite frequently is looking for domains and their traffic counts on Alexa, Compete, and Quantcast. (more…)

Fixing CentOS Root Certificate Authority Issues

Wednesday, June 1st, 2011

While trying to clone a repository from GitHub the other day on one of my EC2 servers, I ran into an SSL verification issue. As it turns out, GitHub renewed their SSL certificate (as people who are responsible about their web presence do when their certificate is about to expire). As a result, I couldn’t git clone over https. This presents a problem since all my deploys work using git clone over https.

ec2-consistent-snapshot With Mongo

Thursday, April 21st, 2011

I set up MongoDB on my Amazon EC2 instance knowing full well that it would have to be backed up at some point. I also knew that by using XFS, I could take advantage of filesystem freezing in a similar fashion to LVM snapshots. I had remembered reading about backups on XFS with MySQL being done with ec2-consistent-snapshot. As with any piece of open source software, it just took a little tweaking to make it do what I wanted it to do.

5 Apps to Increase Mac Productivity

Tuesday, April 5th, 2011

I like to think I have been making the most of what’s available on my Mac. This means taking advantage of some obscure and some not so obscure apps. I want to go through some of those apps and a little about their usage to help others get some of the benefit I get. There are certainly other products available and even ones I use. The 5 apps I describe are the ones I use the most frequently (and recommend to just about everyone I come in contact with who uses a Mac).

Using Vi Mode Everywhere

Tuesday, March 15th, 2011

Not literally everywhere, but more places than usual. I have been looking for this solution for a long time and finally found it. Anyone who has ever worked around me knows that I do basically everything in Vi.

Common Pig One Liners

Tuesday, March 1st, 2011

As with any programming language, there is a bit of a learning curve with Pig. So here are a few common items that I found useful. If you know Pig, please feel free to add your own in the comments section.

Pig Queries Parsing JSON on Amazons Elastic Map Reduce Using S3 Data

Wednesday, February 23rd, 2011

I know the title of this post is a mouthful, but it’s the fun of pushing the envelope of existing technologies. What I am looking to do is take my log data stored on S3 (which is in compressed JSON format) and run queries against it. In order to not have to learn everything about setting up Hadoop and still have the ability to leverage the power of Hadoop’s distributed data processing framework and not have to learn how to write map reduce jobs and … (this could go on for a while so I’ll just stop here). For all these reasons, I chose to use Amazon’s Elastic Map Reduce infrastructure and Pig.

Distributed Flume Setup With an S3 Sink

Friday, February 4th, 2011

I have recently spent a few days getting up to speed with Flume, Cloudera’s distributed log offering. If you haven’t seen this and deal with lots of logs, you are definitely missing out on a fantastic project. I’m not going to spend time talking about it because you can read more about it in the users guide or in the Quora Flume Topic in ways that are better than I can describe it. But what I will tell you about is my experience setting up Flume in a distributed environment to sync logs to an Amazon S3 sink.

As CTO of SimpleReach, a company that does most of its work in the cloud, I’m constantly strategizing on how we can take advantage of the cloud for auto-scaling. Depending on the time of day or how much content distribution we are dealing with, we will spawn new instances to accommodate the load. We will still need the logs from those machines for later analysis (batch jobs like making use of Elastic Map Reduce).

MySQL for Python

Monday, December 27th, 2010

I am always for using the right tool for the right job. A lot of the time, that tool is Python. I have always had trouble finding solid documentation on using MySQL with Python. There was generally enough to get by, but the more the merrier. Enter MySQL for Python by Albert Lukaszewski. (more…)

Benchmarking in jRuby NYC.RB Talk

Wednesday, November 10th, 2010

Here are the slides from my presentation on jRuby during the NYC.rb talk on 11/9.

NYC.rb – Nov 2010 – Benchmarking jRuby

JSON Benchmarks in jRuby

Monday, October 25th, 2010

I am in the process of switching a major application from MRI Ruby (specifically 1.8.7-p302) using many C extensions to jRuby (currently trying 1.5.3-master). In my application, performance is extremely important. It is so important in fact, that I will be writing about some of my experiences in troubleshooting the speed and getting those important milliseconds back. When I am trying to keep an entire transaction from start to finish under 40ms and just the decoding of a JSON object into a Ruby object in jRuby takes roughly 30ms using json_pure, we may have to explore other avenues.

Hash Autovivification in Ruby

Wednesday, October 6th, 2010

One of the features that I miss most from my Perl days (and to be honest, there isn’t a whole lot I miss from my Perl days) is autovivification. For more information on what it is, read the Wikipedia page on it here.

Interesting Object Methods in Ruby

Monday, September 27th, 2010

This little Rubyism is something that I use frequently for debugging my objects. I add a method to every object to show only the interesting methods. What do I mean by interesting methods?

Culture of Product vs. Culture of Code

Tuesday, September 21st, 2010

I was talking to Luke Melia of about various cultures of startup companies. We compared two ideas that we referred to as a culture of product and a culture of code. They are similar in concept but require defining.

Counting Frequencies of Frequencies

Monday, August 16th, 2010

Lots of people forget about the usefulness of the core utilities (the tools available in Bash). I am even pretty guilty of it at times, with such quick and easy things as Perl, Ruby, or Python that allow you to process items from the command line. However, they load up an entire interpreter. It is usually better to use the coreutils.

Getting a Random Record From a MongoDB Collection

Monday, August 9th, 2010

One of my issues with MongoDB is that, as of this writing, there is no way to retrieve a random record. In SQL, you can simply do something similar to “ORDER BY RAND()” (this varies depending on your flavor) and you can retrieve random records (at a slightly expensive query cost). There is not yet an equivalent in MongoDB because of its sequential access nature. There is a purely JavaScript method in the MongoDB cookbook here. If you are really interested, I would also read the Jira ticket thread #533 on this issue.

Sharing a Screen Session

Friday, July 23rd, 2010

Anyone who has spent any time in a shell and has been cut off while working should know about screen. If not, then I recommend reading up on it (here or here). But I’m not here to tell you about screen as a general tool, I want to show you how to use it for screen sharing. I found a couple of forum posts and other scattered information, so here’s a little centralizing of information.