Support Vector Machine¶
PMLS provides an SVM solver for distributed systems. The SVM application can be found in strads/apps/svm_release/. From this point on, all instructions assume you are in strads/apps/svm_release/.
After building the Strads system (as explained in the installation page), you may build the SVM solver from strads/apps/svm_release/ by running
make
Test the app (on your local machine) by running
python svm.py
This runs the SVM solver on the rcv1.binary sample data in ./input. The estimated model weights can be found in ./output.
Performance¶
Coming soon
Input data format¶
The SVM solver uses the LIBSVM format:
y col:value col:value col:value col:value col:value
y col:value col:value col:value col:value
y col:value col:value col:value
Each line represents one sample: the response value y, followed by the sample's non-zero entries, each written as a column index and a value. col is 1-indexed, as in MATLAB.
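The line format described above can be read with a few lines of Python. This is a minimal sketch for inspecting or validating input files, not part of the PMLS code base; the function name parse_libsvm_line is hypothetical, and the sketch assumes whitespace-separated tokens and numeric labels.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line into (label, {col: value}).

    Hypothetical helper for illustration; assumes whitespace-separated
    tokens of the form 'col:value' after a numeric label.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        col_text, value_text = token.split(":", 1)
        col = int(col_text)
        if col < 1:
            # Column indexes are 1-indexed in this format.
            raise ValueError("column index must be >= 1, got %d" % col)
        features[col] = float(value_text)
    return label, features


label, features = parse_libsvm_line("+1 3:0.5 10:1.25 42:-2")
print(label, features)
```

Storing the entries in a dictionary keyed by the 1-indexed column keeps the sample sparse and avoids an off-by-one conversion until a dense array is actually needed.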
Output format¶
The output file of SVM looks something like this:
col value
col value
col value
col value
col value
...
Each row gives the column id and value of one non-zero model parameter.
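As an illustration of how the output rows might be consumed, the sketch below loads them into a sparse dictionary and computes a linear decision value for one sample. Everything here is an assumption layered on the format above: load_weights and decision_value are hypothetical names, the rows are taken to be whitespace-separated, the column ids are taken to use the same indexing as the input data, and no bias term is modeled.

```python
import io


def load_weights(lines):
    """Read 'col value' rows into a {col: value} dict (hypothetical helper)."""
    weights = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed rows
        weights[int(parts[0])] = float(parts[1])
    return weights


def decision_value(weights, features):
    """Sparse dot product w . x; assumes a linear model with no bias term."""
    return sum(value * weights.get(col, 0.0) for col, value in features.items())


# In-memory stand-in for an output file, made up for this example.
sample_output = io.StringIO("3 0.5\n10 -1.0\n")
weights = load_weights(sample_output)
score = decision_value(weights, {3: 2.0, 7: 4.0, 10: 0.5})
print(weights, score)
```

Columns missing from the file are treated as zero weights, which matches the statement that only non-zero parameters are written.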
Machine configuration¶
Program Options¶
The SVM solver is launched using a Python script, e.g. svm.py:
import os

# machine file passed to both mpirun and the solver
machfile = ['./singlemach.vm']
# data setting
inputfile = ['./input/rcv']
# degree of parallelism
dparallel = ['1']
prog = ['./bin/svm-dual']
os.system("mpirun -machinefile "+machfile[0]+" "+prog[0]+" --machfile "+machfile[0]+" -input "+inputfile[0]+" -max_iter 200 -C 1.0 -parallels "+dparallel[0])
The basic options are:
inputfile: Path to the design matrix file, which must be present/visible to all machines. We strongly recommend providing the full path name to the data file.
max_iter: Maximum number of iterations.
The following options are available for advanced users who wish to control the dynamic scheduling algorithm used in the linear solver:
dparallel: The number of parameters to schedule per iteration. Increasing this can improve performance, but only up to a point.
C: The SVM penalty parameter, which must be larger than 0.
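The options above can be checked before launching a run. The following is a minimal sketch, assuming the same command-line flags that appear in the launch script (-machinefile, --machfile, -input, -max_iter, -C, -parallels); the function build_svm_command and its default values are hypothetical and only assemble the command, they do not start mpirun.

```python
import shlex


def build_svm_command(machfile, inputfile, max_iter=200, C=1.0,
                      dparallel=1, prog="./bin/svm-dual"):
    """Assemble the mpirun argument list (hypothetical helper, not PMLS API)."""
    if C <= 0:
        raise ValueError("C must be larger than 0")
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    if dparallel < 1:
        raise ValueError("dparallel must be at least 1")
    return [
        "mpirun", "-machinefile", machfile,
        prog, "--machfile", machfile,
        "-input", inputfile,
        "-max_iter", str(max_iter),
        "-C", str(C),
        "-parallels", str(dparallel),
    ]


cmd = build_svm_command("./singlemach.vm", "./input/rcv")
# Print the command instead of running it, so the sketch has no side effects.
print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a single string means the result could be handed to subprocess.run without shell quoting issues if a path contains spaces.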