Common Pig One Liners

01 Mar

As with any programming language, there is a bit of a learning curve with Pig. So here are a few common items that I found useful. If you know Pig, please feel free to add your own in the comments section.

When it comes to Pig, there is a “filter early, filter often” approach that is preached and practiced, so some of these run to more than one line, but either way they are short. These have all been tested only on Pig 0.6, on Amazon’s Elastic MapReduce version of Pig. Since they are simple, they should be fairly portable. As one would expect, these are contrived examples.

  • Count all the items in a bucket. The SQL equivalent is: SELECT COUNT(*) FROM foo.

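    A minimal sketch of one way to write this, assuming a hypothetical tab-delimited input named 'foo' with two made-up fields (the file name, aliases, and schema are illustrative only):

    ```pig
    -- 'foo', id and name are hypothetical; substitute your own input and schema.
    foo = LOAD 'foo' AS (id:int, name:chararray);
    -- GROUP ... ALL collects every tuple into a single group.
    foo_all = GROUP foo ALL;
    -- COUNT over that one bag yields a single row holding the total.
    foo_count = FOREACH foo_all GENERATE COUNT(foo);
    DUMP foo_count;
    ```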
  • Grouping on multiple elements in a bag. Assume you have a bag with 4 tuples that looks like this: (1,Football),(2,Soccer),(1,Soccer),(2,Soccer). You may want to know how many users of type 1 are “Football” or “Soccer” and how many users of type 2 are “Football” or “Soccer”. Note: If you want user_type and sport kept together as a nested tuple rather than as separate fields, just remove the FLATTEN($0).

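    A sketch of how this grouping might look, assuming the four tuples live in a hypothetical comma-delimited file named 'users' (the file name and aliases are my own):

    ```pig
    -- 'users' is a hypothetical file with lines such as: 1,Football
    users = LOAD 'users' USING PigStorage(',') AS (user_type:int, sport:chararray);
    -- Group on both fields at once; the group key ($0) is a (user_type, sport) tuple.
    by_pair = GROUP users BY (user_type, sport);
    -- FLATTEN($0) unpacks the key into two fields; $1 is the bag of matching tuples.
    pair_counts = FOREACH by_pair GENERATE FLATTEN($0), COUNT($1);
    DUMP pair_counts;
    ```

    With the sample data this should give one row per distinct pair: a count of 1 for (1,Football), 1 for (1,Soccer), and 2 for (2,Soccer). Without the FLATTEN, the first field of each row stays a nested tuple such as (2,Soccer).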
  • Add a field to every element in a bag. From my understanding, this next bit is a Pig 0.6ism. This joins each relation by 1, so every tuple is matched through an implicit join key of 1. The outcome is similar to an array push of a field onto the end of every tuple in a bag.

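    One way to sketch the idea, assuming the field being added is an overall row count and spelling the constant join key out as an explicit field named k (the 'users' file, the aliases, and k are all hypothetical):

    ```pig
    -- 'users' is a hypothetical file with lines such as: 1,Football
    users = LOAD 'users' USING PigStorage(',') AS (user_type:int, sport:chararray);
    -- Compute the single value to append, and tag it with a constant key of 1.
    users_all = GROUP users ALL;
    totals = FOREACH users_all GENERATE COUNT(users) AS total, 1 AS k;
    -- Tag every tuple with the same constant key.
    users_keyed = FOREACH users GENERATE user_type, sport, 1 AS k;
    -- Every key is 1, so each tuple pairs up with the lone totals row.
    joined = JOIN users_keyed BY k, totals BY k;
    -- Drop the helper keys; total now sits at the end of every tuple.
    with_total = FOREACH joined GENERATE users_keyed::user_type, users_keyed::sport, totals::total;
    DUMP with_total;
    ```

    Because the right-hand side has exactly one row, the join behaves like a cross product against a single value, which is what makes it feel like a push onto each tuple.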
  • Let’s take the field that we added to the end of the tuple and get a percentage out of it. This will return each value as its share of the total, out of 100%.

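    A self-contained sketch of the percentage step, assuming we want each sport's share of all rows and that the appended field is the overall count (the 'users' file, the aliases, and the helper key k are hypothetical):

    ```pig
    -- 'users' is a hypothetical file with lines such as: 1,Football
    users = LOAD 'users' USING PigStorage(',') AS (user_type:int, sport:chararray);
    -- Per-sport counts, tagged with a constant join key of 1.
    by_sport = GROUP users BY sport;
    sport_counts = FOREACH by_sport GENERATE $0 AS sport, COUNT($1) AS cnt, 1 AS k;
    -- Overall count, tagged with the same constant key.
    users_all = GROUP users ALL;
    totals = FOREACH users_all GENERATE COUNT(users) AS total, 1 AS k;
    -- Append the overall count to every per-sport row.
    joined = JOIN sport_counts BY k, totals BY k;
    -- Cast to double so the division is not truncated to an integer.
    pct = FOREACH joined GENERATE sport_counts::sport,
              100.0 * (double)sport_counts::cnt / (double)totals::total AS percentage;
    DUMP pct;
    ```

    With the four sample tuples from earlier, Football works out to 25% and Soccer to 75%, which together add up to 100%.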

This post is another example of work that I could not have accomplished without the help of people on #hadoop-pig on irc.freenode.net. Also worthy of note are the Pig Latin manuals here and here.
