A Beginner's Guide to Pig Latin on Mac OS X 10.8


I had the chance to play around with Pig today and thought it was kind of fun. Pig lets you analyze large data sets using Pig Latin, a SQL-like language. The cool part is that Pig's compiler turns those SQL-like commands into sequences of Map-Reduce programs for you.

Let's get started!

Installing Pig and Getting Ready

brew install pig

Now, let's see if Pig works:

pig -help

You should see a spew of help text such as the following:

[Screenshot: the output of pig -help on my terminal]

Here are some gotchas to take note of:

  1. You need to define your $JAVA_HOME variable. You can learn how to do that in this tutorial.
  2. Install Homebrew first, since the brew command above depends on it. Homebrew makes your life on a Mac way easier. You can learn how to install Homebrew here.
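For the first gotcha, here is a minimal sketch of one way to set $JAVA_HOME. It assumes the macOS helper /usr/libexec/java_home is present and that a JDK is installed; the variable name detected is just for illustration. On a machine without the helper, the snippet leaves $JAVA_HOME untouched.

```shell
# Ask macOS where the default JDK lives (assumes /usr/libexec/java_home exists).
# If the helper is missing or finds no JDK, JAVA_HOME is left as it was.
if [ -x /usr/libexec/java_home ] && detected="$(/usr/libexec/java_home 2>/dev/null)"; then
  export JAVA_HOME="$detected"
fi

# Show what Pig will see.
echo "JAVA_HOME=${JAVA_HOME:-not set}"
```

To make the setting stick across terminal sessions, you would typically put the export line in your shell profile (for example ~/.bash_profile if you use bash).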

Running some basic scripts

You can run Pig in two modes: local and Hadoop (mapreduce). You will learn how to run Pig locally in this tutorial.

To run Pig locally:

pig -x local

After running the above command, you will be dropped into Grunt, Pig's interactive shell, at the grunt> prompt. You should see the following:

[Screenshot: the grunt> prompt after running pig -x local]


Next, download this. It is a tutorial I found that gives a simple but good introduction to Pig scripting.

Now extract the downloaded archive and change directory into the extracted folder. If you are still at the grunt> prompt, type quit first to return to your regular shell.

Run the following command:

pig -x local wordcount.pig

This command runs Pig in local mode and executes the wordcount.pig script.

Once the command has finished, you will see a wordcount folder. Change directory into that folder and open part-r-00000 in your favorite text editor. You will see the results of the wordcount.pig script.
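If you would rather stay in the terminal, something like the following sketch lets you peek at the results without opening an editor. It assumes you are in the directory where you ran the script and that the output landed in wordcount/part-r-00000 as described; the out_dir variable is only there for illustration.

```shell
# Folder the script is described as writing to.
out_dir="wordcount"

if [ -f "$out_dir/part-r-00000" ]; then
  # List everything Pig wrote, then show the first 20 result lines.
  ls "$out_dir"
  head -n 20 "$out_dir/part-r-00000"
else
  echo "No $out_dir/part-r-00000 here yet; run the script first."
fi
```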

Feel free to open up wordcount.pig and see what's going on in the script.
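If you want something small to experiment with, here is a rough sketch of what a word count looks like in Pig Latin. This is not the wordcount.pig from the download, which may well be written differently. The file names sample_input.txt, wordcount_sketch.pig and wordcount_sketch_out are made up for this example, and so are the alias names inside the script.

```shell
# Create a tiny input file to count words in (hypothetical name and contents).
cat > sample_input.txt <<'EOF'
the quick brown fox
the lazy dog
EOF

# Write a small Pig Latin script (hypothetical name, not the tutorial's script).
cat > wordcount_sketch.pig <<'EOF'
-- Load each line of the input file as a single chararray field.
lines = LOAD 'sample_input.txt' AS (line:chararray);
-- Split each line into words, producing one record per word.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together, then count the records in each group.
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
-- Store the results in a new folder; Pig will not overwrite an existing one.
STORE counts INTO 'wordcount_sketch_out';
EOF

# Run it in local mode, but only if Pig is installed and the output folder is free.
if command -v pig >/dev/null 2>&1 && [ ! -e wordcount_sketch_out ]; then
  pig -x local wordcount_sketch.pig
fi
```

Each statement in the script just names an intermediate result; nothing actually runs until Pig reaches the STORE at the end, at which point it works out the jobs needed to produce the output.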