Principal Component Analysis (PCA) is an exploratory tool designed by Karl Pearson in 1901 to identify unknown trends in a multidimensional data set. It involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components.
- Download source code.
- Download sample application.
- Browse source code online.
Foreword
Before you read this article, please keep in mind that it was written before the Accord.NET Framework was created and became popular. As such, if you would like to do Principal Component Analysis in your projects, download the accord-net framework from NuGet and either follow the starting guide or download the PCA sample application from the sample gallery in order to get up and running quickly with the framework.
Introduction
PCA essentially rotates the set of points around their mean so that the coordinate axes align with the principal components. This moves as much of the variance as possible (using a linear transformation) into the first few dimensions. The values in the remaining dimensions therefore tend to be small and may be dropped with minimal loss of information. Please note that the signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA.
For a more complete explanation of PCA, please see Lindsay Smith's excellent Tutorial On Principal Component Analysis (2002).
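In matrix terms, the rotation described above can be summarized as follows. This is only a brief sketch in standard notation: the symbols X (the n-by-p data matrix with column means already subtracted), C, V, Λ and Y are introduced here for illustration and do not appear in the code.

```latex
\begin{aligned}
C &= \frac{1}{n-1}\, X^{\top} X = V \Lambda V^{\top}, \\
Y &= X V, \\
\operatorname{Cov}(Y) &= \frac{1}{n-1}\, Y^{\top} Y = V^{\top} C V = \Lambda .
\end{aligned}
```

Here the columns of V are the orthonormal eigenvectors of the covariance matrix C, and Λ is diagonal with the eigenvalues sorted in decreasing order. Because Λ is diagonal, the columns of Y (the projected data) are uncorrelated, and the variance of the i-th column is the i-th eigenvalue.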
Accord.NET Framework
This new library, which I called Accord.NET, was initially intended to extend the AForge.NET Framework through the addition of new features such as Principal Component Analysis, numerical decompositions, and a few other mathematical transformations and tools. However, the library I created grew larger than the original framework I was trying to extend. In a few months, both libraries will merge under Accord.NET. (Update April 2015)
Design decisions
As people who want to use PCA in their projects usually already have their own Matrix class definitions, I decided to avoid using custom Matrix and Vector classes in order to make the code more flexible. I also tried to avoid dependencies on other methods whenever possible, to keep the code self-contained. I think this also made the code simpler to understand.
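To illustrate what working directly on plain double[,] arrays looks like, here is a minimal sketch of the column statistics that a PCA routine needs before it can center or standardize the data. The class name ColumnStatistics and its method names are hypothetical and are not taken from the framework. The use of the sample (n - 1) denominator is also an assumption.

```csharp
using System;

// Hypothetical helper, not part of Accord.NET: column statistics on a plain 2-D array.
public static class ColumnStatistics
{
    // Mean of each column of a rows-by-cols matrix.
    public static double[] Mean(double[,] m)
    {
        int rows = m.GetLength(0), cols = m.GetLength(1);
        double[] mean = new double[cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                mean[j] += m[i, j];
        for (int j = 0; j < cols; j++)
            mean[j] /= rows;
        return mean;
    }

    // Sample standard deviation of each column (assumes an n - 1 denominator
    // and at least two rows).
    public static double[] StandardDeviation(double[,] m, double[] mean)
    {
        int rows = m.GetLength(0), cols = m.GetLength(1);
        double[] sd = new double[cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
            {
                double d = m[i, j] - mean[j];
                sd[j] += d * d;
            }
        for (int j = 0; j < cols; j++)
            sd[j] = Math.Sqrt(sd[j] / (rows - 1));
        return sd;
    }
}
```

Since the helpers take and return only built-in array types, they can be called from code that uses any matrix library, as long as the data can be copied into a double[,].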
The code is divided into two projects:
- Accord.Math, which provides mathematical tools, decompositions and transformations, and
- Accord.Statistics, which provides the statistical analysis, statistical tools and visualizations.
Both of them depend on the AForge.NET core. Also, their internal structure and organization try to mimic AForge’s wherever possible.
The given source code doesn’t include the full source of the Accord Framework, which remains as a test bed for new features I’d like to see in AForge.NET. Rather, it includes only limited portions of the code to support PCA. It also contains code for Kernel Principal Component Analysis, as both share the same framework. Please be sure to look for the correct project when testing.
Code overview
Below is the main code behind PCA.
```csharp
/// <summary>Computes the Principal Component Analysis algorithm.</summary>
public void Compute()
{
    int rows = sourceMatrix.GetLength(0);
    int cols = sourceMatrix.GetLength(1);

    // Create a new matrix to work upon
    double[,] matrix = new double[rows, cols];

    // Prepare the data, storing it in the new matrix.
    if (this.analysisMethod == AnalysisMethod.Correlation)
    {
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                // subtract mean and divide by standard deviation (convert to Z Scores)
                matrix[i, j] = (sourceMatrix[i, j] - columnMeans[j]) / columnStdDev[j];
    }
    else
    {
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                // Just center the data around the mean. Will have no effect if the
                // data is already centered (the mean will be zero).
                matrix[i, j] = (sourceMatrix[i, j] - columnMeans[j]);
    }

    // Perform the Singular Value Decomposition (SVD) of the matrix
    SingularValueDecomposition singularDecomposition = new SingularValueDecomposition(matrix);
    singularValues = singularDecomposition.Diagonal;

    // Eigen values are the square of the singular values
    for (int i = 0; i < singularValues.Length; i++)
    {
        eigenValues[i] = singularValues[i] * singularValues[i];
    }

    // The principal components of 'Source' are the eigenvectors of Cov(Source). Thus if we
    // calculate the SVD of 'matrix' (which is Source standardized), the columns of matrix V
    // (right side of SVD) will be the principal components of Source.

    // The right singular vectors contains the principal components of the data matrix
    this.eigenVectors = singularDecomposition.RightSingularVectors;

    // The left singular vectors contains the scores of the principal components
    this.resultMatrix = singularDecomposition.LeftSingularVectors;

    // Calculate proportions
    double sum = 0;
    for (int i = 0; i < eigenValues.Length; i++)
        sum += eigenValues[i];
    sum = (1.0 / sum);

    for (int i = 0; i < eigenValues.Length; i++)
        componentProportions[i] = eigenValues[i] * sum;

    // Calculate cumulative proportions
    this.componentCumulative[0] = this.componentProportions[0];
    for (int i = 1; i < this.componentCumulative.Length; i++)
    {
        this.componentCumulative[i] = this.componentCumulative[i - 1] + this.componentProportions[i];
    }

    // Creates the object-oriented structure to hold the principal components
    PrincipalComponent[] components = new PrincipalComponent[singularValues.Length];
    for (int i = 0; i < components.Length; i++)
    {
        components[i] = new PrincipalComponent(this, i);
    }
    this.componentCollection = new PrincipalComponentCollection(components);
}
```
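The proportion calculation at the end of Compute can be looked at in isolation. The sketch below is a standalone restatement of that step, assuming only an array of singular values as input. The class ExplainedVariance and its method names are hypothetical. Note that the squared singular values of the centered matrix differ from the eigenvalues of the covariance matrix by a constant factor of 1/(n - 1), which cancels out when proportions are computed.

```csharp
using System;

// Hypothetical helper, not part of Accord.NET.
public static class ExplainedVariance
{
    // Proportion of the total variance carried by each component,
    // computed from the singular values of the centered data matrix.
    public static double[] Proportions(double[] singularValues)
    {
        double[] p = new double[singularValues.Length];
        double sum = 0;
        for (int i = 0; i < p.Length; i++)
        {
            p[i] = singularValues[i] * singularValues[i];
            sum += p[i];
        }
        // Assumes at least one non-zero singular value, so sum > 0.
        for (int i = 0; i < p.Length; i++)
            p[i] /= sum;
        return p;
    }

    // Running total of the proportions; the last entry is 1 up to rounding.
    public static double[] Cumulative(double[] proportions)
    {
        double[] c = new double[proportions.Length];
        double running = 0;
        for (int i = 0; i < c.Length; i++)
        {
            running += proportions[i];
            c[i] = running;
        }
        return c;
    }
}
```

For singular values 4 and 3, for instance, the squares are 16 and 9, so the first component accounts for 16/25 of the variance and the second for 9/25.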
Using the code
To perform a simple analysis, you can simply instantiate a new PrincipalComponentAnalysis object, passing your data, and call its Compute method to compute the model. Then call the Transform method to project the data into the principal component space.
Sample code demonstrating its usage is presented below.
```csharp
// Creates the Principal Component Analysis of the given source
PrincipalComponentAnalysis pca = new PrincipalComponentAnalysis(sourceMatrix,
    PrincipalComponentAnalysis.AnalysisMethod.Correlation);

// Compute the Principal Component Analysis
pca.Compute();

// Creates a projection considering 80% of the information
double[,] components = pca.Transform(sourceMatrix, 0.8f, true);
```
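The implementation of Transform is not shown in this article, so how the 0.8f argument is turned into a number of components is not confirmed here. One plausible reading, sketched below as an assumption, is that the method keeps the smallest number of leading components whose cumulative proportion reaches the threshold. The class ComponentSelection and the method CountForThreshold are hypothetical names.

```csharp
// Hypothetical helper, not part of Accord.NET.
public static class ComponentSelection
{
    // Returns the smallest number of leading components whose cumulative
    // proportion of variance reaches the threshold (for example 0.8 for 80%).
    // If the threshold is never reached, all components are kept.
    public static int CountForThreshold(double[] cumulative, double threshold)
    {
        for (int i = 0; i < cumulative.Length; i++)
            if (cumulative[i] >= threshold)
                return i + 1;
        return cumulative.Length;
    }
}
```

With cumulative proportions of 0.64, 0.90 and 1.00, a threshold of 0.8 would keep two components under this reading.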
Example application
To demonstrate the use of PCA, I created a simple Windows Forms Application which performs simple statistical analysis and PCA transformations.



Note: The principal components are not unique because the Singular Value Decomposition is not unique. Also, the signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA.
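When comparing results against another program, one common workaround for the sign ambiguity is to apply a fixed sign convention to both sets of eigenvectors before comparing them. The sketch below flips each column so that its largest-magnitude entry is positive. This particular convention, and the names SignConvention and Normalize, are illustrative choices and are not something the framework is known to do.

```csharp
using System;

// Hypothetical helper, not part of Accord.NET.
public static class SignConvention
{
    // Flips each column in place so that its largest-magnitude entry is positive.
    public static void Normalize(double[,] vectors)
    {
        int rows = vectors.GetLength(0), cols = vectors.GetLength(1);
        if (rows == 0) return;

        for (int j = 0; j < cols; j++)
        {
            // Find the row holding the entry with the largest absolute value.
            int pivot = 0;
            for (int i = 1; i < rows; i++)
                if (Math.Abs(vectors[i, j]) > Math.Abs(vectors[pivot, j]))
                    pivot = i;

            // Negate the whole column if that entry is negative.
            if (vectors[pivot, j] < 0)
                for (int i = 0; i < rows; i++)
                    vectors[i, j] = -vectors[i, j];
        }
    }
}
```

If a column of the rotation matrix is flipped this way, the corresponding column of the projected data has to be negated as well for the two to stay consistent.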
Together with the demo application comes an Excel spreadsheet containing several data examples. The first example is the same one used by Lindsay Smith in his Tutorial on Principal Component Analysis. The others include Gaussian data, uncorrelated data and linear combinations of Gaussian data to further exemplify the analysis.
I hope this code and example can be useful! If you have any comments about the code or the article, please let me know.
See also
A Tutorial On Principal Component Analysis with the Accord.NET Framework
This is a tutorial for those who are interested in learning how PCA works and how each step of Lindsay’s tutorial can be computed in the Accord.NET Framework, in C#.
Kernel Principal Component Analysis in C#
This is the non-linear extension of Principal Component Analysis. While linear PCA is restricted to rotating or scaling the data, kernel PCA can do arbitrary transformations (such as folding and twisting the data and the space that contains the data).