Common Pig One Liners

As with any programming language, there is a bit of a learning curve with Pig, so here are a few common one-liners that I have found useful. If you know Pig, please feel free to add your own in the comments section.

Posted in Hadoop.

Pig Queries Parsing JSON on Amazon’s Elastic MapReduce Using S3 Data

I know the title of this post is a mouthful, but that’s part of the fun of pushing the envelope of existing technologies. What I want to do is take my log data stored on S3 (in compressed JSON format) and run queries against it. I want to leverage the power of Hadoop’s distributed data processing framework without having to learn everything about setting up Hadoop, without having to learn how to write MapReduce jobs, and … (this could go on for a while, so I’ll just stop here). For all these reasons, I chose to use Amazon’s Elastic MapReduce infrastructure and Pig.
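To make the kind of query concrete, here is a minimal local sketch in Python of what such a Pig job computes: decompress gzip-compressed, newline-delimited JSON log records and count them grouped by one field (roughly a LOAD, GROUP ... BY, COUNT pipeline). This is not the Pig/EMR setup described in the post. It assumes one JSON object per line, reads from in-memory bytes instead of an S3 path, and the `status`/`path` fields and the `count_by_field` helper are hypothetical names chosen for illustration.

```python
import gzip
import io
import json
from collections import Counter


def count_by_field(gz_bytes: bytes, field: str) -> Counter:
    """Count gzip-compressed JSON-lines records grouped by `field`.

    Hypothetical helper: assumes one JSON object per line. Records
    missing the field are counted under None.
    """
    counts = Counter()
    # gzip.open accepts an existing file object; "rt" yields decoded text lines.
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            record = json.loads(line)
            counts[record.get(field)] += 1
    return counts


# Usage: build a tiny compressed log sample in memory (hypothetical data).
sample = "\n".join(
    json.dumps(r)
    for r in [
        {"status": 200, "path": "/"},
        {"status": 404, "path": "/missing"},
        {"status": 200, "path": "/about"},
    ]
).encode("utf-8")
compressed = gzip.compress(sample)
print(count_by_field(compressed, "status"))
```

On a real cluster, Pig distributes this same grouping work across the Hadoop nodes instead of looping over lines in a single process.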

Posted in Hadoop.