Liblinear algorithms in C#

The Accord.NET Framework is not only an image processing and computer vision framework, but also a machine learning framework for .NET. One of its goals is to encompass the same algorithms that can be found in other libraries, such as LIBLINEAR, and offer them in .NET within a common interface, ready to be incorporated in your application.

 

What is LIBLINEAR?

As its authors put it, LIBLINEAR is a library for large linear classification. It is intended to tackle classification and regression problems with millions of instances and features, although it can only produce linear classifiers, i.e. linear support vector machines.

The framework now offers almost all LIBLINEAR algorithms in C#, except for one. Those include:

Continue reading →

Sequence Classifiers in C#: Hidden Conditional Random Fields

After a preliminary article on hidden Markov models some months ago, I have finally posted the article on Hidden Conditional Random Fields (HCRF) on CodeProject. The HCRF is a discriminative model, forming the generative-discriminative pair with the hidden Markov model classifiers.

This CodeProject article is the second in a series of articles about sequence classification, the first being about Hidden Markov Models. I’ve used this opportunity to write a little about generative versus discriminative models, and also to provide a brief discussion on how Vapnik’s ideas apply to these learning paradigms.

All the code presented in those articles is also available within the Accord.NET Framework. The articles provide good examples of how to use the framework and can be regarded as practical demonstrations of how to use those models with it.

Complete framework documentation can be found live at the project’s website, as well as in the framework’s GitHub wiki. The framework has now been referenced in 30+ publications over the years, and several more are already in the works, by me and users around the world.

Academic publications

Speaking of publications, the framework has been used in my own research on Computer Vision. If you need help understanding the inner workings of the HCRF, a more visual explanation of the HCRF derivation can also be found in the presentation I gave at Iberamia 2012 about Fingerspelling Recognition with Support Vector Machines and Hidden Conditional Random Fields.

An application to a more interesting problem, namely natural words drawn from Sign Languages using a Microsoft Kinect, has also been accepted for publication at the 9th International Conference on Machine Learning and Data Mining, MLDM 2013, and will be available soon. Update: it is available at

As usual, hope you find it interesting!

Sequence Classifiers in C#: Hidden Markov Models


A few days ago I published a new article on CodeProject about classifiers based on banks of hidden Markov models for sequence classification.

While I have written about this subject in the past on this very blog, this time the article deals with Gaussian, continuous-density hidden Markov models rather than discrete ones. Besides, the Accord.NET Framework has evolved considerably since that 2010 post, and the new article reflects most of the improvements and additions made over the last two years.

In the meantime, this article also serves as a hook for a future article about Hidden Conditional Random Fields (HCRFs). HCRF models can serve the same purpose as HMMs, but can be generalized to arbitrary graph structures and trained discriminatively, which can be an advantage in classification tasks.

As always, I hope readers can find it a good read 🙂

Deep Learning Artificial Neural Networks: Speech Recognition and Universal Translators

Happy new year everyone!

With the beginning of this year, I would like to share a video I wish I had found earlier. It is about the recent breakthrough achieved by Deep Neural Networks in the field of speech recognition – which, although I knew it was a breakthrough, I didn’t know was already leading to such surprisingly good results.

 

Deep neural networks are also available in the Accord.NET Framework. However, they are a very recent addition – if you find any issues, bugs, or just wish to collaborate on development, please let me know!

Haar Feature Face Detection in C#


A new article has been published in CodeProject! This article details the Viola-Jones face detection algorithm available in the Accord.NET Framework. The article page also provides a standalone package for face detection which can be reused without instantiating the entire framework.

 

The article is also participating in CodeProject’s monthly competition under the categories Best C# Article and Best Overall Article of the month! Feel free to cast your vote if you liked it or found it useful!  
This month we have other great articles participating in the competition: Marcelo de Oliveira‘s Social News excels in the Web category; Roy‘s Inline MSIL in C# also makes very interesting reading in the C# category. The latter may eventually be extremely useful for improving performance in managed applications. 
For those who don’t know, CodeProject is an amazingly useful site which publishes user-created articles and news. Every month, the best articles among all submissions are selected to win prizes and gain popularity as well!

Quadratic Programming in C#

I have manually translated and adapted the QuadProg solver for quadratic programming problems made by Berwin A. Turlach. His code was originally published under the GNU Library General Public License, which has since been superseded by the GNU Lesser General Public License. This adapted version honors the original work and is thus distributed under the same license.

The code in this article is part of the Accord.NET Framework. Download the full framework through NuGet and use this code in your own applications.

Introduction

Despite the name, the terms linear or quadratic programming bear little resemblance to the set of activities most people now know as programming. These terms usually refer to a specific set of function optimization methods, i.e. methods which can be used to determine the maximum or minimum points of special kinds of functions under a given number of solution constraints. For example, suppose we would like to determine the minimum value of the function:

f(x, y) = 2x + y + 4

Under the constraints that x and y must be non-negative (i.e. either positive or zero). This may seem fairly simple and trivial, but remember that practical linear programming problems may have hundreds or even thousands of variables and possibly millions of constraints.

When the problem to be solved involves a quadratic function instead of a linear function, but still presents linear constraints, it can be cast as a quadratic programming problem. Quadratic functions are polynomial functions in which each term has a total degree of at most 2. For example, consider the function

f(x, y, z) = 2x² + 5xy + y² – z² + x – 5.

Now let’s check the sum of the degrees for each variable on the polynomial terms. We start by writing the missing terms of the polynomial

f(x, y, z) = 2x²y⁰z⁰ + 5x¹y¹z⁰ + 1x⁰y²z⁰ – 1x⁰y⁰z² + 1x¹y⁰z⁰ – 5x⁰y⁰z⁰

and then proceed to check the sum of the degrees of each term. In the first term, 2+0+0 = 2. For the second, 1+1+0 = 2, and so on. These functions have the nice property that they can be expressed in matrix form:

f(x) = ½ xᵀ Q x + cᵀx.

Here, x and c are vectors. The matrix Q is a symmetric matrix specifying how the variables combine in the quadratic terms of the function. If this matrix is positive definite, then the function is convex, and the optimization has a single, unique optimum (we say it has a global optimum). The coefficients c specify the linear terms of our function.

Source code

The available source code is based on a translation of the Fortran code written by Berwin A. Turlach. However, some modifications have been made. Fortran uses column-major ordering for matrices, meaning that matrices are stored in memory in sequential order of column elements. Almost all other languages use row-major ordering, including C, C++, Java and C#. In order to improve data locality, I have modified the code to use the transpose of the original matrices D and A. I have also modified the QP formulation adopted in the Goldfarb and Idnani paper to reflect the form presented in the introduction.

In this article, we will be using the GoldfarbIdnani class from the Accord.NET Framework, as well as the QuadraticObjectiveFunction. Their source code is available at GoldfarbIdnani.cs and QuadraticObjectiveFunction.cs, respectively. Please note that the sample application available in this blog post will most likely not be the most recent, updated, fixed and enhanced version of the code. For the latest version, be sure to download the latest version of the framework from the project site or through the NuGet package.

Using the code

The first step in solving a quadratic programming problem is, well, specifying the problem. To specify a quadratic programming problem, one would need two components: a matrix D describing the relationship between the quadratic terms, and a vector d describing the linear terms. Perhaps this would work better with an example.

Suppose we are trying to solve a minimization problem. Given a function, the goal in such problems is to find the correct set of function arguments which would result in the minimum possible value for the function. An example of a quadratic minimization problem is given below:

min f(x, y) = 2x² – xy + 4y² – 5x – 6y

subject to the constraints:

x – y = 5
x ≥ 10


Plot of the objective function surface (generated with Wolfram Alpha).

However, note that this problem involves a set of constraints. The solution for this minimization problem is required to lie in the region specified by the constraints. More specifically, any x and y pair that is a candidate for being a minimum of the function must respect the relations x – y = 5 and x ≥ 10. Thus, instead of lying at the unconstrained minimum of the function surface shown above, the solution lies slightly off the center of the surface. This is obviously an easy problem to solve manually, but it fits well for this demonstration.



As can be seen (and also demonstrated live by asking Wolfram Alpha), the solution lies at the point (10, 5), and the constrained minimum of the function is 170. So, now that we know what a quadratic programming problem looks like, how can we actually solve it?

Specifying the objective function

The first step in solving a quadratic programming problem is to specify the objective function. Using this code, there are three ways to specify it. Each of them has its own advantages and disadvantages.

1. Manually specifying the QP matrix.

This is the most common approach for numerical software, and probably the most cumbersome for the user. The problem matrix has to be specified manually. This matrix is sometimes denoted Q, D or H as it actually denotes the Hessian matrix for the problem.

The matrix Q is used to describe the quadratic terms of our problem. It is an n × n matrix, in which n corresponds to the number of variables in our problem, covering all possible combinations of variables. Recall the example given at the start of this section. We have 2 variables, x and y. Thus, our matrix Q is 2 × 2. The possible combinations of x and y are expressed in the table below.

      x     y
x    x·x   x·y
y    y·x   y·y

To form our matrix Q, we take all coefficients associated with each pair mentioned in the table above. The diagonal elements should also be multiplied by two (this is because the matrix is the Hessian matrix of the problem: the matrix of all second-order derivatives of the function. Since we have at most quadratic terms, the elementary power rule of differentiation “drops” the ² from the x² and y² terms – I think a mathematician would hit me with a stick for explaining it like this, but it serves well for a quick, non-technical explanation).

Remember that our quadratic terms were 2x² – 1xy + 4y². Writing the terms in their proper positions and differentiating, we have:

Q = |  4  -1 |
    | -1   8 |

As can be seen, the matrix is also symmetric (and often, but not always, positive definite). The next, more trivial step is to write a vector d containing the linear terms. The linear terms are –5x – 6y, and thus our vector d can be given by:

d = | -5 |
    | -6 |

Therefore our C# code can be created like this:
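A minimal sketch, using plain double arrays (variable names are just illustrative):

```csharp
// Hessian matrix Q and linear term vector d for
// f(x, y) = 2x² - xy + 4y² - 5x - 6y
double[,] Q =
{
    {  4, -1 },
    { -1,  8 },
};

double[] d = { -5, -6 };
```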

2. Using lambda expressions

This approach is a bit more intuitive and less error prone. However, it involves lambda functions, and some people find them hard to follow. Another disadvantage is that we lose the edit & continue debugging ability of Visual Studio. The advantage is that the compiler may catch some obvious syntax errors automatically. Below we are using the QuadraticObjectiveFunction from Accord.NET to specify our target objective function.
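A sketch of how this could look, assuming the QuadraticObjectiveFunction constructor that accepts a lambda expression (check the framework documentation for the exact overloads):

```csharp
// The variables below act only as symbols for the expression
double x = 0, y = 0;

// Sketch: build f(x, y) = 2x² - xy + 4y² - 5x - 6y from a lambda expression
var f = new QuadraticObjectiveFunction(() => 2*x*x - x*y + 4*y*y - 5*x - 6*y);
```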

Note that the x and y variables could have been initialized to any value. They are only used as symbols, and not used in any computations.

3. Using text strings

This approach is more intuitive but a bit more error prone. The function can be specified using strings, as in a standard mathematical formula. However, since all we have are strings, there is no way to enforce static, compile time checking.
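A sketch, assuming the string-based constructor of QuadraticObjectiveFunction:

```csharp
// Sketch: the function is parsed from a plain text string
var f = new QuadraticObjectiveFunction("2x² - xy + 4y² - 5x - 6y");
```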

Couldn’t be easier.

Specifying the constraints

The next step in specifying a quadratic programming problem is to specify the constraints. The constraints can be specified in almost the same way as the objective function.

1. Manually specifying the constraints matrix

The first option is to manually specify the constraint matrix A and vector b. The constraint matrix expresses how the variables should be combined when compared to the corresponding values in vector b. It is possible to specify either equality constraints or inequality constraints. The formulation used in this code is slightly different from the one used in Turlach’s original implementation. The constraints are supposed to be in the form:

A₁x = b₁
A₂x ≥ b₂

This means that each line of matrix A expresses a possible combination of variables x which should be compared to the corresponding value in vector b. An integer variable m can be specified to denote how many of the first rows of matrix A should be treated as equalities rather than inequalities. Recall that in our example the constraints are given by 1x – 1y = 5 and 1x ≥ 10. Let’s write this down in tabular form:

#     x    y    ?     b
q1    1   -1    =     5
q2    1    0    ≥    10

Thus our matrix A and vector b can be specified as:
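In code, a sketch following the tabular form above could be:

```csharp
// Each row of A is one constraint; the corresponding entry of b is its value
double[,] A =
{
    { 1, -1 },   // x - y  =  5   (equality)
    { 1,  0 },   // x      >= 10  (inequality)
};

double[] b = { 5, 10 };
```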



And not forgetting that m = 1, because the first constraint is actually an equality.

2. Using classes and objects

A more natural way to specify constraints is using the classes and objects of the Accord.NET Framework. The LinearConstraint class allows one to specify a single constraint using an object-oriented approach. It doesn’t have the most intuitive usage on earth, but has much more expressiveness. It can also be read aloud, if that adds anything! 🙂

The specification is centered around the notion that variables are numbered and have an associated index. For example, x is the zero-th variable of the problem. Thus x has an index of 0 and y has an index of 1. So for example, reading aloud the last constraint, it is possible to express how the variables at indices 0 and 1, when combined as 1x and –1y, should be equal to value 5.
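A sketch of how the constraints from our example could be written this way; the property names below follow the LinearConstraint class but should be treated as assumptions, so consult the framework documentation for the exact API:

```csharp
var constraints = new List<LinearConstraint>();

// x >= 10  (variable at index 0, greater than or equal to 10)
constraints.Add(new LinearConstraint(numberOfVariables: 1)
{
    VariablesAtIndices = new int[] { 0 },
    ShouldBe = ConstraintType.GreaterThanOrEqualTo,
    Value = 10
});

// x - y = 5  (variables at indices 0 and 1, combined as 1x and -1y, equal to 5)
constraints.Add(new LinearConstraint(numberOfVariables: 2)
{
    VariablesAtIndices = new int[] { 0, 1 },
    CombinedAs = new double[] { 1, -1 },
    ShouldBe = ConstraintType.EqualTo,
    Value = 5
});
```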

3. Using lambda expressions

A more intuitive way to express constraints is again using lambda expressions. And again the problems are the same: some people find it hard to follow and we lose edit & continue.
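A sketch under the assumption that LinearConstraint accepts boolean lambda expressions (treat the exact constructor as an assumption):

```csharp
// The variables act only as symbols; their values are never used
double x = 0, y = 0;

// Hypothetical sketch: constraints expressed as boolean lambda expressions
var constraints = new List<LinearConstraint>
{
    new LinearConstraint(() => x - y == 5),
    new LinearConstraint(() => x >= 10),
};
```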

4. Using text strings

Same as above, but with strings.
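A sketch, assuming constraints can be parsed from plain strings (again an assumption; check the framework for the exact form):

```csharp
// Hypothetical sketch: constraints parsed from plain strings
var constraints = new List<LinearConstraint>
{
    new LinearConstraint("x - y = 5"),
    new LinearConstraint("x >= 10"),
};
```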

Finally, creating and solving the problem

Once we have specified what we want, we can ask the code for a solution. In case we have opted for manually specifying the objective function Q and d, and the constraint matrix A, vector b and integer m, we can use:
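A sketch of the call, assuming a matrix-based constructor overload:

```csharp
// Create the Goldfarb-Idnani solver from the matrices defined earlier:
// Q, d, A, b and the number of equality constraints m
var solver = new GoldfarbIdnani(Q, d, A, b, m);
```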

In case we have opted for creating a QuadraticObjectiveFunction object and a list of constraints instead, we can use:
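A sketch for the object-based specification:

```csharp
// Create the solver from the objective function and the list of constraints
var solver = new GoldfarbIdnani(f, constraints);
```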

After the solver object has been created, we can call Minimize() to solve the problem. In case we have opted for manually specifying Q and d, we can use:
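For example (a sketch):

```csharp
// Solve the quadratic program (the exact return value differs between framework versions)
solver.Minimize();
```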

The solution will be available in the Solution property of the solver object, and will be given by:
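For our example, we expect something close to (a sketch; the Value property name is an assumption):

```csharp
double[] solution = solver.Solution;   // expected to be approximately { 10, 5 }
double minimum = solver.Value;         // expected to be approximately 170
```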

Sample application

The Accord.NET Framework now includes a sample application demonstrating the use of the Goldfarb-Idnani Quadratic Programming Solver. It can be downloaded at the Accord.NET Framework site, and also comes together with recent versions of the framework (> 2.6).


Solver sample application included in the Accord.NET Framework.

Remarks

Because the code has been translated by hand (in contrast to using automatic translators such as f2c), there could be potential bugs in the code. I have tested the code’s behavior against R’s quadprog package and have not found errors so far. But this does not mean the code is bug-free. As always, as is the case with everything else on this blog, this code is published in good faith, but I cannot guarantee the correctness of everything. Please read the disclaimer for more information.

References

Accord.NET Framework: Gesture Controller Components


A new version (2.2.0) of the Accord.NET Framework has just been released. This new version introduces many new features, fixes and improvements. The most interesting additions are certainly the HeadController and FaceController .NET components.

 

Accord.NET Framework sample application for Gesture Controller Components

The Accord.NET Controller components can be used to generate events based on webcam motion. By using a combination of HaarCascadeClassifiers, Camshift and Template-based Tracking, those components are able to detect when a face enters the scene, leaves the scene, or moves across the scene.

The video above shows only the sample application which comes together with the framework. However, the interesting part is that this is just a sample of what can be accomplished using the real controller components. The controller components are .NET components, similar to Button, Label or Timer, and can be dragged and dropped from Visual Studio’s ToolBox directly into any application.

Accord.NET Controller Components in Visual Studio

Once inside an application, it will be possible to set event actions just as in any other .NET component:

Setting events for head movements using Accord.NET Controller Components

The controls have built-in support for calibration. All values except the tilting angle are passed to the hosting application in the [-1;+1] range, in which -1 indicates a full left/down/backward position and +1 indicates a full right/up/forward position. The tilting angle is given in radians. Please note that the face controller is still a bit experimental and still requires some tuning.

 

This new version also introduces HSL Color Range object trackers, more default Haar Cascades, an experimental version of linear-chain Conditional Random Fields, and the ability to generate hardcoded C# definitions of any Haar cascade available in the OpenCV XML format. There is also initial support for finger detection using new implementations of Border-Following contour extraction, K-Curvatures and Convex Hull Defects extraction. On the statistics side, the Von Mises distribution and the Moving and Running Normal distributions have been added, and the Multivariate Gaussian implementation has been improved. The full release notes are available on the release’s download page.

Also, a special thanks to Professor Dr. Modesto Castrillón Santana for letting me embed some of his Haar definitions into the framework under the LGPL license. Please be sure to include a reference to his work if you plan to use this in an academic publication.

 

As always, I hope those additions and improvements will be useful to everyone 🙂

New additions to Accord.NET: Computer Vision namespace, Camshift and Viola-Jones Detector


The next version of the Accord.NET Framework will feature two important additions, alongside a new namespace to accommodate them: the Camshift object tracker and the Viola-Jones object detector. Both will be located inside the new Accord.Vision namespace for Computer Vision algorithms.

 

Accord.NET Camshift Object Tracker.

Camshift object tracker

Accord.NET Viola-Jones method for face detection.

Viola-Jones object detector

Other cool additions include the availability of Multi-class Kernel Support Vector Machines (with support for parallelized learning algorithms), generic Cross-validation classes for model evaluation and generic Gridsearch classes for model parameter tuning.

The Linear and Non-linear Kernel Discriminant Analysis implementations have been updated to use the Generalized Eigenvalue Decomposition instead of the Eigenvalue Decomposition following a prior matrix inversion. This has cut execution time nearly in half. Other optimizations have also been made to some radial basis Kernel functions, although they are not very noticeable.

The next version of the Accord.NET Framework will be labeled version 2.1.0. There are some interface changes that may break compatibility with older versions. Also, the minimum required version of the AForge.NET Framework has been raised to 2.1.3. You may have to update your assemblies if you wish to upgrade an already existing project.

Accord.NET Framework – An extension to AForge.NET

Today I have just released the first version of Accord.NET. The Accord.NET Framework is a C# framework I have developed over the years while working on several areas of artificial intelligence, machine learning and statistics.

The Accord.NET Framework extends the excellent AForge.NET Framework with new tools and libraries. In order to use Accord.NET, you must have AForge.NET already installed. The first version number of Accord.NET indicates its compatibility status with AForge.NET versions; thus, the first release starts at 2.0.0.

 

The framework comprises a set of libraries and sample applications which demonstrate their features:

  • Accord.Statistics – library with statistical analysis and other tools;
  • Accord.Imaging – extension to the AForge.NET Imaging library with new filters and routines;
  • Accord.Neuro – extension to the AForge.NET Neuro library with other learning algorithms;
  • Accord.MachineLearning – extension to AForge’s machine learning library with Support Vector Machines;
  • Accord.Audio – experimental library with filters and audio processing routines.

 

The Framework has just been released, so expect bugs and unpolished/unfinished areas. The code is released under an LGPL license. For additional help, issues, and discussions, please refer to the recently created forums.

Kernel Support Vector Machines for Classification and Regression in C#

Kernel methods in general have gained increased attention in recent years, partly due to the growth in popularity of Support Vector Machines. Support Vector Machines are linear classifiers and regressors that, through the Kernel trick, operate in reproducing Kernel Hilbert spaces and are thus able to perform non-linear classification and regression in their input space.

Foreword

If you would like to use SVMs in your .NET applications, download the Accord.NET Framework through NuGet. Afterwards, creating support vector machines for binary and multi-class problems with a variety of kernels becomes very easy. The Accord.NET Framework is an LGPL framework which can be used freely in commercial, closed-source, open-source or free applications. This article explains a bit of how the SVM algorithms and the overall SVM module were designed before being added as part of the framework.

Contents

  1. Introduction
    1. Support Vector Machines
    2. Kernel Support Vector Machines
      1. The Kernel Trick
      2. Standard Kernels
  2. Learning Algorithms
    1. Sequential Minimal Optimization
  3. Source Code
    1. Support Vector Machine
    2. Kernel Support Vector Machine
    3. Sequential Minimal Optimization
  4. Using the code
  5. Sample application
    1. Classification
    2. Regression
  6. See also
  7. References

Introduction

Support Vector Machines

Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. In simple words, given a set of training examples, each marked as belonging to one of two categories, a SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. Intuitively, an SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

A linear support vector machine is composed of a set of given support vectors z and a set of weights w. The computation for the output of a given SVM with N support vectors z1, z2, … , zN and weights w1, w2, … , wN is then given by:

F(x) = \sum_{i=1}^{N} w_i \langle z_i, x \rangle + b

Kernel Support Vector Machines

The original optimal hyperplane algorithm proposed by Vladimir Vapnik in 1963 was a linear classifier. However, in 1992, Bernhard Boser, Isabelle Guyon and Vapnik suggested a way to create non-linear classifiers by applying the kernel trick (originally proposed by Aizerman et al.) to maximum-margin hyperplanes. The resulting algorithm is formally similar, except that every dot product is replaced by a non-linear kernel function. This allows the algorithm to fit the maximum-margin hyperplane in a transformed feature space. The transformation may be non-linear and the transformed space high dimensional; thus though the classifier is a hyperplane in the high-dimensional feature space, it may be non-linear in the original input space.

Using kernels, the original formulation for an SVM with support vectors z1, z2, … , zN and weights w1, w2, … , wN becomes:

F(x) = \sum_{i=1}^{N} w_i \, k(z_i, x) + b

It is also very straightforward to see that, using a linear kernel of the form K(z,x) = <z,x> = zᵀx, we recover the original formulation for the linear SVM.

The Kernel trick

The Kernel trick is a very interesting and powerful tool. It is powerful because it provides a bridge from linearity to non-linearity to any algorithm that solely depends on the dot product between two vectors. It comes from the fact that, if we first map our input data into a higher-dimensional space, a linear algorithm operating in this space will behave non-linearly in the original input space.

Now, the Kernel trick is really interesting because that mapping never actually needs to be computed. If our algorithm can be expressed only in terms of an inner product between two vectors, all we need to do is replace this inner product with the inner product from some other suitable space. That is where the “trick” resides: wherever a dot product is used, it is replaced with a Kernel function. The kernel function denotes an inner product in feature space and is usually denoted as:

K(x,y) = <φ(x),φ(y)>

Using the Kernel function, the algorithm can then be carried into a higher-dimension space without explicitly mapping the input points into this space. This is highly desirable, as sometimes our higher-dimensional feature space could even be infinite-dimensional and thus infeasible to compute.

Standard Kernels

Some common Kernel functions include the linear kernel, the polynomial kernel and the Gaussian kernel. Below is a simple list with their most interesting characteristics.

 

  • Linear Kernel – The Linear kernel is the simplest kernel function. It is given by the common inner product <x, y> plus an optional constant c. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts, i.e. KPCA with a linear kernel is equivalent to standard PCA.
    k(x, y) = x^T y + c
  • Polynomial Kernel – The Polynomial kernel is a non-stationary kernel. It is well suited for problems where all data is normalized.
    k(x, y) = (\alpha x^T y + c)^d
  • Gaussian Kernel – The Gaussian kernel is by far one of the most versatile kernels. It is a radial basis function kernel, and is the preferred kernel when we don’t know much about the data we are trying to model.
    k(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)

For more Kernel functions, check Kernel functions for Machine Learning Applications. The accompanying source code includes definitions for over 20 distinct Kernel functions, many of them detailed in the aforementioned post.

Learning Algorithms

Sequential Minimal Optimization

Previous SVM learning algorithms involved the use of quadratic programming solvers. Some of them used chunking to split the problem into smaller parts which could be solved more efficiently. Platt’s Sequential Minimal Optimization (SMO) algorithm takes chunking to the extreme by breaking the problem down into 2-dimensional sub-problems that can be solved analytically, eliminating the need for a numerical optimization algorithm.

The algorithm makes use of Lagrange multipliers to compute the optimization problem. Platt’s algorithm is composed of three main procedures or parts:

  • run, which iterates over all points until convergence to a tolerance threshold;
  • examineExample, which finds two points to jointly optimize;
  • takeStep, which solves the 2-dimensional optimization problem analytically.

The algorithm is also governed by three extra parameters besides the Kernel function and the data points.

  • The parameter C controls the trade-off between allowing some training errors and forcing rigid margins. Increasing the value of C increases the cost of misclassifications but may result in models that do not generalize well to points outside the training set.
  • The parameter ε controls the width of the ε-insensitive zone, used to fit the training data. The value of ε can affect the number of support vectors used to construct the regression function. The bigger ε, the fewer support vectors are selected and the solution becomes more sparse. On the other hand, increasing the ε-value by too much will result in less accurate models.
  • The parameter T is the convergence tolerance. It is the criterion for completing the training process.

After the algorithm ends, a new Support Vector Machine can be created using only the points whose Lagrange multipliers are higher than zero. The expected outputs yᵢ can be individually multiplied by their corresponding Lagrange multipliers αᵢ to form a single weight vector w.

F(x) = \sum_{i=0}^{N} \alpha_i y_i \, k(z_i, x) + b = \sum_{i=0}^{N} w_i \, k(z_i, x) + b

Sequential Minimal Optimization for Regression

A version of SVM for regression was proposed in 1996 by Vladimir Vapnik, Harris Drucker, Chris Burges, Linda Kaufman and Alex Smola. The method was called support vector regression and, as is the case with the original Support Vector Machine formulation, depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction that is within a tolerance threshold ε.

Platt’s algorithm has also been modified for regression. While it still maintains much of its original structure, the difference lies in the fact that the modified algorithm uses two Lagrange multipliers α̂ᵢ and αᵢ for each input point i. After the algorithm ends, a new Support Vector Machine can be created using only the points whose Lagrange multipliers are both higher than zero. The multipliers α̂ᵢ and αᵢ are then subtracted to form a single weight vector w.

F(x) = \sum_{i=0}^{N} (\hat{\alpha}_i - \alpha_i) \, k(z_i, x) + b = \sum_{i=0}^{N} w_i \, k(z_i, x) + b

The algorithm is also governed by the same three parameters presented above. The parameter ε, however, receives a special meaning: it governs the size of the ε-insensitive tube over the regression line. The algorithm has been further developed and adapted by Alex J. Smola and Bernhard Schoelkopf, and further optimizations were introduced by Shevade et al. and Flake et al.

Source Code

Here is the class diagram for the Support Vector Machine module. We can see it is very simple in terms of standard class organization.


Class diagram for the (Kernel) Support Vector Machines module.

Support Vector Machine

Below is the class definition for the Linear Support Vector Machine. It is pretty much self explanatory.
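A simplified sketch of such a class (the framework’s actual class contains more members and validation):

```csharp
// Simplified sketch of a linear support vector machine
public class SupportVectorMachine
{
    public int Inputs { get; private set; }

    public double[][] SupportVectors { get; set; }
    public double[] Weights { get; set; }
    public double Threshold { get; set; }

    public SupportVectorMachine(int inputs)
    {
        this.Inputs = inputs;
    }

    // Computes F(x) = sum_i w_i <z_i, x> + b
    public virtual double Compute(double[] inputs)
    {
        double sum = Threshold;

        for (int i = 0; i < SupportVectors.Length; i++)
        {
            double dot = 0;
            for (int j = 0; j < inputs.Length; j++)
                dot += SupportVectors[i][j] * inputs[j];

            sum += Weights[i] * dot;
        }

        return sum;
    }
}
```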

Kernel Support Vector Machine

Here is the class definition for the Kernel Support Vector Machine. It inherits from Support Vector Machine and extends it with a Kernel property. The Compute method is also overridden to include the chosen Kernel in the model computation.
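Again, a simplified sketch rather than the full framework listing, assuming an IKernel interface that exposes a Function(x, y) method:

```csharp
// Simplified sketch of the kernel machine
public class KernelSupportVectorMachine : SupportVectorMachine
{
    public IKernel Kernel { get; set; }

    public KernelSupportVectorMachine(IKernel kernel, int inputs)
        : base(inputs)
    {
        this.Kernel = kernel;
    }

    // Computes F(x) = sum_i w_i k(z_i, x) + b
    public override double Compute(double[] inputs)
    {
        double sum = Threshold;

        for (int i = 0; i < SupportVectors.Length; i++)
            sum += Weights[i] * Kernel.Function(SupportVectors[i], inputs);

        return sum;
    }
}
```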

Sequential Minimal Optimization

Here is the code for the Sequential Minimal Optimization (SMO) algorithm.

Using the code

In the following example, we will be training a Polynomial Kernel Support Vector Machine to recognize the XOR classification problem. The XOR function is a classic example of a pattern classification problem that is not linearly separable.

Here, remember that the SVM is a margin classifier that classifies instances as either 1 or –1. So the training and expected output for the classification task should also be in this range. There are no such requirements for the inputs, though.
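For instance, the XOR problem could be encoded as follows (a sketch; the inputs use 0/1 values while the outputs use the -1/+1 convention):

```csharp
// XOR problem: the inputs may use any values, but the outputs must be -1 or +1
double[][] inputs =
{
    new double[] { 0, 0 },
    new double[] { 0, 1 },
    new double[] { 1, 0 },
    new double[] { 1, 1 },
};

int[] outputs = { -1, 1, 1, -1 };
```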

To create the Kernel Support Vector Machine with a Polynomial Kernel, do:
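A sketch, assuming a constructor that takes the kernel and the number of inputs, together with a degree-2 Polynomial kernel:

```csharp
// Kernel SVM over 2 inputs with a degree-2 polynomial kernel
var machine = new KernelSupportVectorMachine(new Polynomial(2), inputs: 2);
```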

After the machine has been created, create a new Learning algorithm. As we are going to do classification, we will be using the standard SequentialMinimalOptimization algorithm.
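A sketch of the training step (the constructor and Run method follow the framework’s classic API, but treat the exact signatures as assumptions):

```csharp
// Create the SMO learning algorithm for the machine and the training data
var smo = new SequentialMinimalOptimization(machine, inputs, outputs);

// Run the learning algorithm (returning the training error in classic versions)
double error = smo.Run();
```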

After the model has been trained, we can compute its outputs for the given inputs.
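For example (a sketch):

```csharp
// The sign of the computed output gives the predicted class
double y1 = machine.Compute(new double[] { 0, 0 }); // expected to be negative (class -1)
double y2 = machine.Compute(new double[] { 0, 1 }); // expected to be positive (class +1)
```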

The machine should be able to correctly identify all of the input instances.

Sample application

The sample application is able to perform both Classification and Regression using Support Vector Machines. It can read Excel spreadsheets and determines the task to be performed depending on the number of columns in the sheet. If the input table contains two columns (e.g. X and Y) it will be interpreted as a regression problem X –> Y. If the input table contains three columns (e.g. x1, x2 and Y) it will be interpreted as a classification problem <x1,x2> belongs to class Y, Y being either 1 or -1.

Classification

To perform classification, load a classification task data such as the Yin Yang classification problem.


Yin Yang classification problem. The goal is to create a model which best determines whether a given point belongs to class blue or green. It is a clear example of a non-linearly separable problem.


Creation of a Gaussian Kernel Support Vector Machine with σ = 1.2236, C = 1.0, ε = 0.001 and T = 0.001.


Classification using the created Support Vector Machine. Notice it achieves an accuracy of 97%, with sensitivity and specificity rates of 98% and 96%, respectively.

Regression

To perform regression, we can load the Gaussian noise sine wave example.


Noisy sine wave regression problem.

 


Creation of a Gaussian Kernel Support Vector Machine with σ = 1.2236, C = 1.0, ε = 0.2 and T = 0.001.

After the model has been created, we can plot the model approximation for the sine wave data. The blue line shows the curve approximation for the original red training dots.


Regression using the created Kernel Support Vector Machine. Notice the coefficient of determination r² of 0.95. The closer to one, the better.

See also

 

References