Support Vector Machine (SVM): Introduction and Usage

Let's classify :)

SVM Test: fill in the data links and submit.

The program demonstrated in the web application above, and also mentioned in the presentation, is SVM-light (the commands below use its multi-class variant, svm_multiclass).

Here is a brief example of how to use it; it is amazingly simple and powerful :)

1. download and compile on Linux

I will assume that you are using Linux for the example commands. Downloading and compiling can be done with the following commands:

wget http://download.joachims.org/svm_multiclass/current/svm_multiclass.tar.gz

tar xvzf svm_multiclass.tar.gz

make

The binaries will be generated inside the directory where you ran the 'make' command.

2. format the input files (training and test files separately)

I am not sure how much machine learning you have done, but the short crash course is that you have training data on which a model is built. This model is later tested on the test data. As an example, I have attached a training file from my gene-expression paper. Each line has 5 gene expression values and a class label, either '0 1' or '1 0'. Let's look at the first line of this file:

1124 298 1057 177 543 1 0

So the first five columns represent the normalised expression values for the five genes, and the last two columns represent the class label to which this pattern belongs. This file now needs to be converted into something the SVM program can use. For this you can use the Perl script I wrote, ofs2svm.pl. Given the input files, say top5_tr.txt for training and top5_te.txt for testing, and the knowledge that it is a 2-class problem, one can convert the given files using the commands:

perl ofs2svm.pl top5_tr.txt 2

perl ofs2svm.pl top5_te.txt 2

This will generate 2 files, 'top5_tr.txt.svm.out' and 'top5_te.txt.svm.out' respectively.
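The Perl script itself is not reproduced here, so the following is only a rough Python sketch of what such a conversion might look like. It assumes the target is the usual SVM-light layout 'label index:value ...' and that the class number is the 1-based position of the '1' among the label columns. Both are my assumptions, not taken from ofs2svm.pl, and the function name onehot_to_svm_line is hypothetical.

```python
def onehot_to_svm_line(line, n_classes):
    """Convert 'v1 v2 ... vk <one-hot label>' to 'label 1:v1 2:v2 ... k:vk'."""
    fields = line.split()
    features = fields[:-n_classes]   # expression values
    onehot = fields[-n_classes:]     # e.g. ['1', '0']
    # Assumption: class number = 1-based position of the '1' in the label columns.
    label = onehot.index("1") + 1
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"


# First line of the training file shown above.
print(onehot_to_svm_line("1124 298 1057 177 543 1 0", 2))
# prints: 1 1:1124 2:298 3:1057 4:177 5:543
```

Applying the function to every line of top5_tr.txt and top5_te.txt would give files in that layout; whether they match the output of ofs2svm.pl exactly depends on the assumptions above.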

3. building the model

Generally I use a polynomial kernel with the default error tolerance (-t 1 selects the polynomial kernel and -d sets its degree). For a linear problem one can simply use degree 1:

./svm_multiclass_learn -c 0.01 -t 1 -d 1 top5_tr.txt.svm.out model

and for a non-linear problem the following works, for example on xor.txt:

./svm_multiclass_learn -c 0.01 -t 1 -d 2 xor.txt model
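To see why degree 2 is enough for XOR, here is a small Python illustration. No straight line separates the four XOR points, but a degree-2 polynomial kernel implicitly gives the model the product term x1*x2, and with that term a simple weighted sum reproduces XOR exactly. This is only an illustration of the idea, not what the SVM program computes internally. The 0/1 encoding of the inputs is my assumption, since xor.txt is not shown here, and the name xor_score is hypothetical.

```python
# XOR truth table: the label is 1 when exactly one input is 1.
XOR_POINTS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def xor_score(x1, x2):
    # x1 and x2 are linear terms; x1 * x2 is the extra degree-2 term.
    return x1 + x2 - 2 * x1 * x2


for (x1, x2), label in XOR_POINTS:
    print(x1, x2, "label:", label, "score:", xor_score(x1, x2))
```

With only the linear terms x1 and x2 (that is, -d 1), no choice of weights gives the right answer on all four points, which is why the degree is raised to 2 here.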

4. prediction

Now we use this model to predict the class labels of the test examples, which can be done using the command:

./svm_multiclass_classify top5_te.txt.svm.out model predictions

5. performance calculation

If you look at the first column of the SVM-formatted test file, you will see the actual class label, and the file 'predictions' gives you the predicted class for each example. The first rows of 'top5_te.txt.svm.out' and 'predictions' carry the same class, so that example is classified correctly. The misclassified examples are in rows 3, 5, 15, 17, 18, 30 and 31, so the overall performance of this model is about 80%. I hope this gives you some idea of how to use this SVM program.
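The row-by-row comparison described above can also be done with a few lines of Python. This is a minimal sketch, assuming that the first whitespace-separated field of each line in both files is an integer class label. The function names are hypothetical and the four-line example is toy data, not the real test set.

```python
def first_columns(lines):
    """Return the first field of every non-empty line as an int."""
    return [int(line.split()[0]) for line in lines if line.strip()]


def accuracy(actual_lines, predicted_lines):
    """Return (fraction correct, 1-based row numbers of misclassified examples)."""
    actual = first_columns(actual_lines)
    predicted = first_columns(predicted_lines)
    if len(actual) != len(predicted):
        raise ValueError("files have different numbers of examples")
    wrong_rows = [i for i, (a, p) in enumerate(zip(actual, predicted), start=1)
                  if a != p]
    return 1 - len(wrong_rows) / len(actual), wrong_rows


# Toy data only: four test examples, one of them misclassified.
toy_actual = ["1 1:5 2:3", "2 1:1 2:9", "1 1:4 2:2", "2 1:0 2:8"]
toy_predicted = ["1 0.9 -0.9", "2 -0.7 0.7", "2 -0.1 0.1", "2 -0.8 0.8"]
print(accuracy(toy_actual, toy_predicted))
# prints: (0.75, [3])
```

For the real files, open 'top5_te.txt.svm.out' and 'predictions' and pass the two file objects to accuracy() in that order.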
