Linear discriminant analysis (LDA) is a method used in statistics and machine learning to find a linear combination of features which best characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.

*The code presented here is also part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications. It is based on the already excellent AForge.NET Framework. Please see the starting guide for more details. The latest version of the framework includes the latest version of this code plus many other statistics and machine learning tools.*

### Motivations

The goals of LDA are somewhat similar to those of PCA. But unlike LDA, PCA is an unsupervised technique and as such does not use the label information of the data, effectively ignoring this often useful information. For instance, consider two cigar-like clusters in 2 dimensions, one cigar having **y = c** and the other **y = –c** (with **c** being an arbitrary constant), as the image below suggests:

This example was adapted from the note on Fisher Linear Discriminant Analysis by Max Welling.

The cigars are positioned in parallel and very closely together, such that the variance in the total dataset, ignoring the labels, is in the direction of the cigars. For classification, this would be a terrible projection, because all labels will get mixed and we will destroy any useful information:

A much more useful projection is orthogonal to the cigars, which would perfectly separate the data-cases:

The first row of images was obtained by performing **PCA** on the original data. The left image is the PCA projection using the first two components. However, since the analysis says the first component accounts for 96% of the information, one could infer that all other components may be safely discarded. The result is the image on the right, clearly not a very useful projection.

The second row of images was obtained by performing **LDA** on the original data. The left image is the LDA projection using the first two dimensions. However, since the analysis says the first dimension accounts for 100% of the information, all other dimensions could be safely discarded. This is actually true, as we can see in the right image.

### Analysis Overview

Linear Discriminant Analysis is closely related to principal component analysis (PCA) and factor analysis in that all three look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA, on the other hand, does not take any class information into account, and factor analysis builds the feature combinations based on differences rather than similarities. Linear Discriminant Analysis considers maximizing the following objective:

J(w) = (wᵀ S_{B} w) / (wᵀ S_{W} w)

where

S_{B} = Σ_{c} N_{c} (μ_{c} − μ)(μ_{c} − μ)ᵀ   and   S_{W} = Σ_{c} Σ_{i∈c} (x_{i} − μ_{c})(x_{i} − μ_{c})ᵀ

are the Between-Class Scatter Matrix and Within-Class Scatter Matrix, respectively. Here μ_{c} and N_{c} are the mean and number of samples of class c, μ is the overall mean, and the inner sum in S_{W} runs over the samples x_{i} belonging to class c. The optimal solution can be found by computing the eigenvalues of S_{W}^{-1}S_{B} and taking the eigenvectors corresponding to the largest eigenvalues to form a new basis for the data. Those can also be easily obtained by computing the generalized eigenvalue decomposition of S_{B} and S_{W}.
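As a concrete illustration, in the special case of two classes the objective above is maximized by the closed form w ∝ S_{W}^{-1}(μ₁ − μ₂). The sketch below (plain illustrative Python, not part of the Accord.NET code) applies this closed form to two parallel cigar-like clusters like the ones discussed above; as expected, the recovered direction is orthogonal to the cigars:

```python
# Two-class Fisher LDA sketch: w = inv(Sw) * (mean1 - mean2), in 2 dimensions.

def mean(xs):
    n = len(xs)
    return [sum(x[i] for x in xs) / n for i in range(len(xs[0]))]

def scatter(xs, m):
    # 2x2 scatter of one class: sum of (x - m)(x - m)^T over its samples
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        d = [x[0] - m[0], x[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class1, class2):
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    # Within-class scatter is the sum of the per-class scatters
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    # Invert the 2x2 within-class scatter matrix
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[ sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det,  sw[0][0] / det]]
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * d[0] + inv[0][1] * d[1],
            inv[1][0] * d[0] + inv[1][1] * d[1]]

# Two parallel "cigars": elongated along x, separated only along y.
c1 = [(x, 1.0 + 0.1 * (x % 2)) for x in range(10)]
c2 = [(x, -1.0 - 0.1 * (x % 2)) for x in range(10)]
w = fisher_direction(c1, c2)
# w points along the y axis, orthogonal to the cigars' long direction.
```

For more than two classes, or for a full projection basis, the eigendecomposition route shown in the source code below is needed.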

### Source Code

The source code below shows the main algorithm for Linear Discriminant Analysis.

```csharp
/// <summary>
///   Computes the Multi-Class Linear Discriminant Analysis algorithm.
/// </summary>
public virtual void Compute()
{
    // Compute entire data set measures
    totalMeans = Tools.Mean(source);
    StandardDeviations = Tools.StandardDeviation(source, totalMeans);
    double total = dimension;

    // Initialize the scatter matrices
    this.Sw = new double[dimension, dimension];
    this.Sb = new double[dimension, dimension];

    // For each class
    for (int c = 0; c < Classes.Count; c++)
    {
        // Get the class subset
        double[,] subset = Classes[c].Subset;
        int count = subset.GetLength(0);

        // Get the class mean and number of samples
        double[] mean = Tools.Mean(subset);

        // Continue constructing the Within-Class Scatter Matrix
        double[,] Swi = Tools.Scatter(subset, mean);
        Sw = Sw.Add(Swi);

        // Continue constructing the Between-Class Scatter Matrix
        double[] d = mean.Subtract(totalMeans);
        double[,] Sbi = d.Multiply(d.Transpose()).Multiply(total / count);
        Sb = Sb.Add(Sbi);

        // Store some additional information
        this.classScatter[c] = Swi;
        this.classCount[c] = count;
        this.classMeans[c] = mean;
        this.classStdDevs[c] = Tools.StandardDeviation(subset, mean);
    }

    // Compute the eigenvalue decomposition of inv(Sw) * Sb
    EigenValueDecomposition evd =
        new EigenValueDecomposition(Matrix.Inverse(Sw).Multiply(Sb));

    // Get the eigenvalues and corresponding eigenvectors
    double[] evals = evd.RealEigenValues;
    double[,] eigs = evd.EigenVectors;

    // Sort eigenvalues and vectors in descending order
    eigs = Matrix.Sort(evals, eigs,
        new GeneralComparer(ComparerDirection.Descending, true));

    // Calculate the translation bias
    this.bias = eigs.Transpose().Multiply(totalMeans).Multiply(-1.0);

    // Store information
    this.EigenValues = evals;
    this.DiscriminantMatrix = eigs;

    // Create the projections of the original data
    this.result = new double[dimension, dimension];
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++)
            for (int k = 0; k < dimension; k++)
                result[i, j] += source[i, k] * eigs[k, j];

    // Computes additional information about the analysis and creates the
    // object-oriented structure to hold the discriminants found.
    createDiscriminants();
}
```

```csharp
/// <summary>Projects a given matrix into discriminant space.</summary>
/// <param name="data">The matrix to be projected.</param>
/// <param name="dimensions">The number of dimensions to consider.</param>
public virtual double[,] Transform(double[,] data, int dimensions)
{
    int rows = data.GetLength(0);
    int cols = data.GetLength(1);

    double[,] r = new double[rows, dimensions];

    // Multiply the data matrix by the selected eigenvectors
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < dimensions; j++)
            for (int k = 0; k < cols; k++)
                r[i, j] += data[i, k] * DiscriminantMatrix[k, j];

    return r;
}
```

### Using The Code

Code usage is simple, as it follows the same object model used in the previous PCA example.

```csharp
// Creates the Linear Discriminant Analysis of the given source
var lda = new LinearDiscriminantAnalysis(data, labels);

// Compute the Linear Discriminant Analysis
lda.Compute();

// Perform the transformation of the data using two dimensions
double[,] result = lda.Transform(data, 2);
```

### Considerations

Despite being useful in many situations, there are many practical cases in which linear discriminants are not suitable. A nice insight is that LDA can be extended for use in non-linear classification via the kernel trick. Using kernels, the original observations can be effectively mapped into a higher-dimensional non-linear space. Linear classification in this new space is then equivalent to non-linear classification in the original space.

While the code available here already contains support for Kernel Discriminant Analysis, this is something I'll address in the next post. If you have any suggestions or questions, please leave me a comment.

Finally, as usual, I hope someone finds this information useful.

### References

- Welling, Max; *Fisher Linear Discriminant Analysis*. Available at: http://www.ics.uci.edu/~welling/classnotes/papers_class/Fisher-LDA.pdf
- Gutierrez-Osuna, Ricardo; *Introduction to Pattern Analysis*. Available at: http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf

Hi!

Thanks for the great article. I’ve found this one and the one on PCA very useful to me (I’m writing a paper on face recognition).

I do have a question though:

If using LDA / PCA for data compression, how would you go about retrieving the original data? (Obviously with some sort of loss of quality since you are discarding information).

Thanks

Mike

Hi Mike,

In the Accord.NET Framework you can perform PCA reversion by calling the Revert method of the PrincipalComponentAnalysis class.

The reversion works by multiplying the data with the inverse of the transformation matrix. Since the transformation matrix is an orthogonal matrix of eigenvectors, this is equivalent to multiplying the data by the transpose of the eigenvectors. It may also be necessary to re-scale and re-center the data using the original column means and standard deviations.
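To make the idea concrete, here is a minimal sketch (plain illustrative Python, not the framework's actual Revert method): with an orthonormal transformation matrix W, projecting computes z = Wᵀx, and reverting multiplies by W again, x = Wz, since WWᵀ = I.

```python
# Projection and reversion with an orthonormal basis.
# Here W is a simple 2-D rotation, standing in for the eigenvector matrix.
import math

theta = 0.3
W = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [[M[j][i] for j in range(len(M))] for i in range(len(M[0]))]

x = [1.5, -2.0]
z = matvec(transpose(W), x)   # forward projection: z = W^T x
x_back = matvec(W, z)         # reversion: x = W z, since W W^T = I
```

With a truncated basis (fewer columns than dimensions), the same multiplication gives only an approximate reconstruction, which is the lossy compression Mike asked about.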

Now that you mention it, I guess I haven't implemented a Revert method for LDA yet. But it should work the same way.

Best regards,

César


César,

How could I use LDA to separate, from a set of images of cars and pedestrians, the cars from the pedestrians? Specifically, which variables should I use? For example, would the gradient of each pattern (assuming that pedestrians and cars differ in this respect) work as input to the algorithm?

Regards,

Sergio Penedo

Hi

Just wondering if anyone else is having the same problem as me. At the moment I am trying to make a face recognition program. I am only at the detection stage, but the program can't detect my colleague's face. (He is Zimbabwean.)

Any suggestions?

Hi Anonymous,

The classification method has been corrected on the latest release of the framework. However, I have not yet updated the sample application available on this blog post. Sorry about that.

You can always download the latest version of the framework, with the latest enhancements and bugfixes at the project page.

Best regards,

César

Thanks for the source code… it works, man!


Hi

I am a computer engineering student and I have a LDA face recognition project. I would like to examine your source code but source code and sample applications’ download links are not available anymore. Could you please update or send me codes. Thanks.

Hi Hakan,

I have corrected the links. The source code is available at the framework’s page.

Hope it helps!

Best regards,

Cesar

Thank you very much for your great interest.

Hi César, from my understanding of Fisher LDA, if there are more dimensions than classes then there should only be at most c − 1 discriminants for c classes. So would we want to do something like “eigs = eigs.Submatrix(0, Math.Min(Classes.Count – 1, dimensions), indices);” to get the discriminants when we have more dimensions than classes? Let me know what you think.

Thanks,

Eric

Hi Eric!

Yes, it would definitely be possible! But when using the framework, it should also be possible to pass the desired number of discriminants to the Compute method while creating the analysis. So if you wish, you could limit the number of discriminants right at the beginning, and it should hopefully work without having to change the source code. Hope it helps!

Best regards,

Cesar