Archive for 2011

Choosing a Product By Roadmap

Friday, November 18th, 2011

There are a lot of reasons to choose a specific technology. You can decide based on what skills you or the engineers around you have. You can decide on a new technology because it’s the right tool. But there are times when all other things are equal and the flip of a coin would suffice. In my mind, that’s when it comes down to choosing a technology based on its roadmap.

Google Securing The Web One Discrete Monopolizing Push At A Time

Friday, November 4th, 2011

Contrary to speculation by some, Google’s decision to encrypt search data is motivated by the goal of making the web as a whole more secure, not by economic interests. I think Google is silently forcing the internet to do what it should be doing on its own.

Exploring AppleScript with Alfred Shortcuts

Thursday, September 1st, 2011

If you have read my blog before, you’ll know that I am a big fan of Alfred (here). I love the shortcuts and the ability to make things quicker. One of the things I find myself doing quite frequently is looking for domains and their traffic counts on Alexa, Compete, and Quantcast.

Fixing CentOS Root Certificate Authority Issues

Wednesday, June 1st, 2011

While trying to clone a repository from GitHub the other day on one of my EC2 servers, I ran into an SSL verification issue. As it turns out, GitHub renewed their SSL certificate (as people who are responsible about their web presence do when their certificate is about to expire). As a result, I couldn’t git clone over https. This presents a problem since all my deploys work using git clone over https.

ec2-consistent-snapshot With Mongo

Thursday, April 21st, 2011

I set up MongoDB on my Amazon EC2 instance knowing full well that it would have to be backed up at some point. I also knew that by using XFS, I could take advantage of filesystem freezing in a similar fashion to LVM snapshots. I had remembered reading about backups on XFS with MySQL being done with ec2-consistent-snapshot. As with any piece of open source software, it just took a little tweaking to make it do what I wanted it to do.
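The ordering that makes a freeze-based backup consistent can be sketched in a few lines of Python. This is only an illustration of the general idea, not how ec2-consistent-snapshot is implemented: the function name `consistent_snapshot` and its five callables are hypothetical, and in a real setup they would wrap something like a MongoDB write lock, `xfs_freeze -f` and `xfs_freeze -u` on the data mount point, and an EBS snapshot request.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def consistent_snapshot(
    lock_db: Callable[[], None],
    freeze_fs: Callable[[], None],
    take_snapshot: Callable[[], T],
    unfreeze_fs: Callable[[], None],
    unlock_db: Callable[[], None],
) -> T:
    """Run the snapshot while the database and filesystem are held still.

    All five callables are hypothetical placeholders supplied by the caller.
    The try/finally nesting guarantees that a lock or freeze that was taken
    is always released, even if a later step raises.
    """
    lock_db()                      # stop the database from writing
    try:
        freeze_fs()                # block writes at the filesystem level
        try:
            return take_snapshot()  # volume is quiescent at this point
        finally:
            unfreeze_fs()          # always thaw the filesystem
    finally:
        unlock_db()                # always release the database lock


def make_recorder(log: List[str], name: str) -> Callable[[], None]:
    """Build a stand-in step that only records that it was called."""
    def step() -> None:
        log.append(name)
    return step


if __name__ == "__main__":
    log: List[str] = []
    consistent_snapshot(
        make_recorder(log, "lock"),
        make_recorder(log, "freeze"),
        make_recorder(log, "snapshot"),
        make_recorder(log, "unfreeze"),
        make_recorder(log, "unlock"),
    )
    print(" -> ".join(log))
```

The point of the nesting is that the unfreeze and unlock steps sit in `finally` blocks: a failed snapshot call should never leave the filesystem frozen or the database locked.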

5 Apps to Increase Mac Productivity

Tuesday, April 5th, 2011

I like to think I have been making the most of what’s available on my Mac. This means taking advantage of some obscure and some not so obscure apps. I want to go through some of those apps and a little about their usage to help others get some of the benefit I get. There are certainly other products available and even ones I use. The 5 apps I describe are the ones I use the most frequently (and recommend to just about everyone I come in contact with who uses a Mac).

Using Vi Mode Everywhere

Tuesday, March 15th, 2011

Not literally everywhere, but more places than usual. I have been looking for this solution for a long time and finally found it. Anyone who has ever worked around me knows that I do basically everything in Vi.

Common Pig One Liners

Tuesday, March 1st, 2011

As with any programming language, there is a bit of a learning curve with Pig. So here are a few common items that I found useful. If you know Pig, please feel free to add your own in the comments section.

Pig Queries Parsing JSON on Amazon’s Elastic Map Reduce Using S3 Data

Wednesday, February 23rd, 2011

I know the title of this post is a mouthful, but it’s the fun of pushing the envelope of existing technologies. What I am looking to do is take my log data stored on S3 (which is in compressed JSON format) and run queries against it. I don’t want to have to learn everything about setting up Hadoop, but I still want to leverage the power of Hadoop’s distributed data processing framework, and I don’t want to have to learn how to write map reduce jobs and … (this could go on for a while so I’ll just stop here). For all these reasons, I chose to use Amazon’s Elastic Map Reduce infrastructure and Pig.
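To make the kind of query concrete, here is a small local stand-in written in plain Python rather than Pig. It is a sketch under assumptions the post does not state: that the logs are gzip-compressed and hold one JSON object per line, and that records have a `url` field. The function names and the sample data are made up for illustration; nothing here touches S3 or Hadoop.

```python
import gzip
import io
import json
from collections import Counter
from typing import BinaryIO, Iterable, Iterator


def read_json_lines(stream: BinaryIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzip-compressed stream.

    Assumes one JSON object per line, which is an assumption about the
    log layout rather than something taken from the post.
    """
    with gzip.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by(records: Iterable[dict], field: str) -> Counter:
    """Group records by a field and count them (a GROUP BY plus COUNT)."""
    return Counter(r[field] for r in records if field in r)


def make_sample_log() -> bytes:
    """Build a tiny compressed log in memory; the rows are invented."""
    rows = [
        {"url": "/a", "status": 200},
        {"url": "/b", "status": 404},
        {"url": "/a", "status": 200},
    ]
    payload = "\n".join(json.dumps(r) for r in rows).encode("utf-8")
    return gzip.compress(payload)


if __name__ == "__main__":
    records = read_json_lines(io.BytesIO(make_sample_log()))
    for url, hits in count_by(records, "url").most_common():
        print(url, hits)
```

This works for a file that fits on one machine. The appeal of Pig on Elastic Map Reduce is that the same load, group, and count shape can be spread across many compressed files without writing the map reduce plumbing by hand.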

Distributed Flume Setup With an S3 Sink

Friday, February 4th, 2011

I have recently spent a few days getting up to speed with Flume, Cloudera’s distributed log offering. If you haven’t seen this and deal with lots of logs, you are definitely missing out on a fantastic project. I’m not going to spend time talking about it because you can read more about it in the users guide or in the Quora Flume Topic in ways that are better than I can describe it. What I will tell you about is my experience setting up Flume in a distributed environment to sync logs to an Amazon S3 sink.

As CTO of SimpleReach, a company that does most of its work in the cloud, I’m constantly strategizing on how we can take advantage of the cloud for auto-scaling. Depending on the time of day or how much content distribution we are dealing with, we will spawn new instances to accommodate the load. We will still need the logs from those machines for later analysis (for batch jobs such as those that make use of Elastic Map Reduce).