A Beginner's Guide to Pig Latin on Mac OS X 10.8
I had the chance to play around with Pig today and thought it was kind of fun. Pig lets you analyze large data sets using Pig Latin, a SQL-like language. The cool thing is that Pig's compiler turns those SQL-like commands into sequences of MapReduce programs.
Let's get started!
Installing Pig and Getting Ready
brew install pig
Now, let's see if Pig works:
pig -help
You should see a spew of help text listing the available command-line options.
Here are some gotchas to take note of:
- You need to define your $JAVA_HOME variable. You can learn how to do that in this tutorial.
- Install Homebrew. Homebrew makes your life on a Mac way easier. You can learn how to install Homebrew here.
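For the first gotcha, here is a minimal sketch of setting $JAVA_HOME from the terminal. It assumes the /usr/libexec/java_home helper that ships with Mac OS X is present and that a JDK is already installed, so adjust it if your setup differs.

```shell
# Assumption: the Mac OS X helper /usr/libexec/java_home exists and a JDK is installed.
if [ -x /usr/libexec/java_home ]; then
  # Ask the helper for the path of the default JDK and export it
  export JAVA_HOME="$(/usr/libexec/java_home)"
fi

# Print the value so you can confirm it before running pig
echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
```

To make the setting stick across terminal sessions, you could add the export line to your ~/.bash_profile.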
Running some basic scripts
You can run Pig in two modes: local and Hadoop (MapReduce). This tutorial covers running Pig locally.
To run pig locally:
pig -x local
After running the above command, you will be dropped into Grunt, Pig's interactive shell, and should see the grunt> prompt. Type quit to return to your terminal.
Now, extract the downloaded archive and change directory into the extracted folder.
Run the following command:
pig -x local wordcount.pig
This command runs Pig in local mode and executes the wordcount.pig script.
Once the command has finished, you will see a wordcount folder. Change directory into it and open part-r-00000 in your favorite text editor to see the results of wordcount.pig.
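If you would rather stay in the terminal, a small helper like the one below lists the most frequent words first. This is only a sketch: top_words is a name I made up, and it assumes each line of part-r-00000 has the form word, tab, count, which may not match what your script actually stores.

```shell
# Hypothetical helper: print the N most frequent words from a results file.
# Assumption: each line is "word<TAB>count".
top_words() {
  # Sort numerically on the second tab-separated column, largest first
  sort -t "$(printf '\t')" -k2,2nr "$1" | head -n "${2:-10}"
}

# Only run against the real output if it exists
if [ -f wordcount/part-r-00000 ]; then
  top_words wordcount/part-r-00000 10
fi
```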
Feel free to open up wordcount.pig and see what's going on in the script.
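If you want something to compare it against, the sketch below writes out a typical Pig Latin word count and runs it in local mode. This is not the downloaded script, just a common way to write one. The file names wordcount_sketch.pig, sample_input.txt, and wordcount_sketch_out are made up for illustration.

```shell
# Hypothetical sample input (not part of the download)
printf 'the quick brown fox\nthe lazy dog\n' > sample_input.txt

# Write a sketch of a word count script to its own file
cat > wordcount_sketch.pig <<'EOF'
-- Load each line of the input file as a single string
lines = LOAD 'sample_input.txt' AS (line:chararray);
-- Split each line into words, one word per row
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together
grouped = GROUP words BY word;
-- Count the rows in each group
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
-- Write the results to a new folder (Pig will not overwrite an existing one)
STORE counts INTO 'wordcount_sketch_out';
EOF

# Run the sketch only if pig is on the PATH
if command -v pig >/dev/null 2>&1; then
  pig -x local wordcount_sketch.pig
fi
```

Each statement builds a new relation from the previous one, and nothing actually runs until Pig reaches the STORE at the end.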