Liblinear algorithms in C#

The Accord.NET Framework is not only an image processing and computer vision framework, but also a machine learning framework for .NET. One of its features is to encompass the exact same algorithms that can be found in other libraries, such as LIBLINEAR, but offer them in .NET withing a common interface ready to be incorporated in your application.

 

What is LIBLINEAR?

As its authors put, LIBLINEAR is a library for large linear classification. It is intended to be used to tackle classification and regression problems with millions of instances and features, although it can only produce linear classifiers, i.e. linear support vector machines.

The framework now offers almost all liblinear algorithms in C#, except for one. Those include:

As it can be seen, the command line option 4 is missing. Mode #4 refers to the Crammer and Singer’s formulation for multi-class classification. However, the framework already provides different ways to obtain both multi-class as well as multi-label classifiers through both Voting and Directed Acyclic Graphs (DDAG) mechanisms.

Additionally, the framework also offers:

The framework can equally load data and load and save support vector machines using the LIBSVM format. This means it should be straighforward to create or learn your models using one tool and run it on the other, if that would be necessary. For example, given that Accord.NET can run on mobile applications, it is possible to create and learn your models in a computing grid using liblinear and then integrate it in your Windows Phone application by loading it in Accord.NET.

The advantages are that:

When studying and porting the algorithms, I have also set up a liblinear GitHub repository page to track changes between versions. I hope this repository can also be helpful for other people willing to track modifications done the the liblinear project.

Example

Learning linearly separable problems with a linear machine

In the following example we will create a linear machine to learn a simple linearly separable binary AND problem.

The binary AND classification problem. As it can be seen, the AND classification problem is a linearly separable problem, as it is possible to draw a line separating the blue points from the red points with ease.

The linear SVM’s answer to the linearly separable AND problem. As it can be seen, a linear SVM can correctly predict the colors for each of the points in the original problem. This happens because the SVM learning algorithm is able to find the line that separates the blue points from the red points.

Learning non-linearly separable problems with a linear machine through kernel expansions

Now, we will move a bit further. We will use an explicit kernel expansion to learn the non-linearly separable exclusive or (XOR) problem.

The binary XOR problem. The XOR problem is not a linerly separable problem, because it is not possible to draw a line separating the blue points from the red points. In this setting, we should expect a linear SVM learning algorithm to fail, because it will not be able to find this line that doesn’t exist.

As we can see, the linear SVM failed to predict the correct colors for each of the points. The problem is that the answers from a linear SVM are constrained to be hyperplanes (in this 2D case, lines)  that separate the points. Because there is no line that separates the blue points from the red points in the XOR problem, the linear SVM learning algorithm tries its best, finding an approximate solution, but not the XOR solution we were looking for.

By using an explicit kernel expansion, we can use a linear SVM to learn a non-linearly separable problem. This is possible because the kernel transformation projects the data into a higher dimensionality space where the data is indeed linearly separable. For an intuition on how this could be possible, please check the blog post kernel functions for machine learning applications.

Learning liblinear problems from libsvm format

Now, we move even further, and use a linear machine to load one of the toy LIBLINEAR problems available in LibSVM format using the framework’s SparseReader class.

 

The confusion matrix for the binary classification problem. As it can be seen, the higher values concentrate in the diagonal. Those values indicate how many hits (correct guesses) the machine was able to make. The other values that don’t lie in the diagonal indicate how many errors (and of what kind) the machine made in this classification problem.

The last version of this tutorial can also be seen on the project’s wiki pages for linear machines.

References

  • R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A Library for Large Linear Classification, Journal of Machine Learning Research 9(2008), 1871-1874. Software available at http://www.csie.ntu.edu.tw/~cjlin/liblinear

Leave a Reply

Your email address will not be published. Required fields are marked *