A beginner's guide to PyML

Hi all, today I'm back with a simple and short tutorial on using PyML.

To start with, PyML is a machine learning library written in Python that focuses on support vector machines (SVMs) and other kernel methods. It's straightforward to use, and I've found it useful for my day-to-day machine learning work.

In the simplest terms, machine learning can generally be divided into supervised and unsupervised learning. In supervised learning, each training example comes with a 'correct' answer (a label); in unsupervised learning, no correct answers are available.
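
To make the distinction concrete, here is a small illustration in plain Python 3 (no PyML involved; the variable names are my own): supervised data pairs each example with its answer, while unsupervised data is just the examples.

```python
# Supervised learning: each example comes paired with its 'correct' answer (label).
supervised = [([1, 2, 3], 'True'), ([2, 3, 4], 'False'), ([3, 4, 5], 'True')]

# Unsupervised learning: only the examples, with no answers attached.
unsupervised = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

# Splitting the supervised pairs gives the two lists a supervised learner needs:
# the examples (X) and their labels (L).
X = [example for example, label in supervised]
L = [label for example, label in supervised]

print(X)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(L)  # ['True', 'False', 'True']
```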

I'm currently focused on supervised learning, so this tutorial covers supervised learning only.

First, you need to install PyML, which you can download from SourceForge: http://sourceforge.net/projects/pyml/

Download the source files, untar or unzip them, change into the resulting folder and run:

sudo python setup.py install

*Be sure to use PyML version 7.9, as this tutorial is based on that version.

Next, you can start using PyML by importing the library and its components:

from PyML import *
# this imports the most frequently used components of PyML

We can create training data as a list of lists, where each inner list is one example (a vector of features). For example:


X = [[1,2,3,4,5],[2,3,4,5,6],[3,4,5,6,7]]
# each inner list represents one vector (a training example) with 5 features
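
Before handing a list of lists to PyML, I find it worth checking that every vector has the same number of features. Below is a small sanity check in plain Python 3; check_vectors is a hypothetical helper of my own, not part of PyML.

```python
def check_vectors(X):
    """Hypothetical helper (not part of PyML): verify that X is a non-empty
    list of equal-length numeric vectors and return that common length."""
    if not X:
        raise ValueError("X must contain at least one vector")
    dim = len(X[0])
    for i, vector in enumerate(X):
        if len(vector) != dim:
            raise ValueError(f"vector {i} has {len(vector)} features, expected {dim}")
        for value in vector:
            if not isinstance(value, (int, float)):
                raise TypeError(f"vector {i} contains a non-numeric value: {value!r}")
    return dim

X = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]
print(check_vectors(X))  # 5
```

A ragged input such as [[1, 2], [3]] raises a ValueError instead of failing later inside the library.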

Next, we wrap the vectors and their labels in a VectorDataSet object:

data = VectorDataSet(X,L=['True','False','True'])

Note that the L argument takes a list of labels, a.k.a. the correct answers for the vectors in X. Most importantly, there must be exactly one label for every vector, in the same order as the vectors.
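
A quick way to catch a missing or extra label before building the data set is to compare the lengths of the two lists. The sketch below is plain Python 3; check_labels is a hypothetical helper of my own, not part of PyML. It also counts how many vectors carry each label.

```python
from collections import Counter

def check_labels(X, L):
    """Hypothetical helper (not part of PyML): ensure there is exactly one
    label per vector and report how many vectors carry each label."""
    if len(X) != len(L):
        raise ValueError(f"{len(X)} vectors but {len(L)} labels")
    return Counter(L)

X = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]
L = ['True', 'False', 'True']
print(check_labels(X, L))  # Counter({'True': 2, 'False': 1})
```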

Now that the data is prepared, let's train a classifier on it!

Create an SVM object, then train it on the data:


s = SVM()
s.train(data)
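
PyML hides the details of training behind s.train(data). To give a rough feel for what training a linear SVM involves, here is a toy sketch in plain Python 3 using Pegasos-style sub-gradient steps. This is a swapped-in technique for illustration only, not what PyML does internally; it has no bias term, and the function names, the parameters lam and epochs, and the data are my own assumptions.

```python
def train_linear_svm(X, y, lam=0.1, epochs=50):
    """Toy linear SVM trained with Pegasos-style sub-gradient steps.
    X: list of feature vectors; y: labels encoded as +1 / -1.
    Illustration only (no bias term, fixed cyclic order), not PyML's solver."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks over time
            # Margin is measured with the current weights, before the update.
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w (the regularisation part of the step) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... and, if this example is inside the margin, move w towards it.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

X = [[2, 1], [3, 2], [-2, -1], [-1, -3]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

With string labels like the ones in this tutorial, you would first map 'True' to +1 and 'False' to -1.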

We can save the trained classifier to a file should we want to:

fileName = 'mySVM.svm' # any file name you like; this one is just an example
s.save(fileName)

Now that we've completed the preparation work, it's time to test the classifier.

We start off by loading the model we saved. Note that loadSVM also takes the training data as its second argument:


from PyML.classifiers.svm import loadSVM
loadedSVM = loadSVM(fileName,data)
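
s.save(fileName) and loadSVM(fileName, data) form a save/load round trip. PyML's own file format is not shown here; as a hypothetical illustration of the same idea, here is how the weight vector of a simple linear model could be written to a file with Python 3's standard json module and read back. save_model, load_model and the file name are made up for this sketch.

```python
import json
import os
import tempfile

def save_model(weights, file_name):
    """Hypothetical helper: write model parameters to a JSON file."""
    with open(file_name, "w") as f:
        json.dump({"weights": weights}, f)

def load_model(file_name):
    """Hypothetical helper: read the parameters back from the JSON file."""
    with open(file_name) as f:
        return json.load(f)["weights"]

file_name = os.path.join(tempfile.gettempdir(), "toy_model.json")
weights = [0.5, -1.25, 2.0]
save_model(weights, file_name)
print(load_model(file_name) == weights)  # True
```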

We then create some test data and test the classifier:


testX = [[...],[...]] # some arbitrary test data; each vector should have the same length as the training vectors
testL = ['True','False'... ] # the correct labels for the test data, one per test vector
testData = VectorDataSet(testX,L=testL)
r = loadedSVM.test(testData)
print(r) # this shows the results
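
Printing r shows the results of the test. As a rough illustration of the simplest summary you can compute from such results, the success rate (the fraction of test vectors classified correctly), here is a plain Python 3 sketch; success_rate is a hypothetical helper, and predicted and actual are made-up label lists, not output from PyML.

```python
def success_rate(predicted, actual):
    """Hypothetical helper: fraction of test vectors whose predicted label
    matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("need one prediction per test label")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

predicted = ['True', 'False', 'False', 'True']
actual = ['True', 'False', 'True', 'True']
print(success_rate(predicted, actual))  # 0.75
```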

That's all folks! Hope this helps!