Generalized Eigenvalue Decomposition in C#

5f7de24799f03b87862a0cdad7b0ed9e

For quite some time I have been trying to find a Java or C# implementation of the Generalized Eigenvalue Decomposition. I just wanted something simple, that didn’t depend on any external libraries such as BLAS or LAPACK. As I could not find any simple and understandable implementation of this decomposition, I decided to implement one myself.

Introduction

As Wikipedia says, a generalized Eigen value problem is the problem of finding a vector v that obeys

 Amathbf{v} = lambda B mathbf{v} quad quad

where A and B are matrices. If v obeys this equation, with some λ, then we call v the generalized eigenvector of A and B, and λ is called the generalized eigenvalue of A and B which corresponds to the generalized eigenvector v. The possible values of λ must obey the following equation

det(A - lambda B)=0.,

An interesting feature of the generalized eigenvalue decomposition is that it finds the eigenvectors of the matrix B-1A even if B is singular and does not have an inverse. If B is nonsingular, the problem could be solved by reducing it to a standard eigenvalue problem of the form B-1Ax=λx. However, because B can be singular, an alternative algorithm, called the QZ method, is necessary.

Solving the problem using the QZ method is equivalent to computing the Eigen decomposition of B-1A without the need of inverting B, which could be impossible or ill-conditioned if B is singular or near-singular. It is also computationally simpler as it saves the time and memory needed for inverting and storing the inverse of B.

EISPACK

EISPACK is a software library for numerical computation of eigenvalues and eigenvectors of matrices, written in FORTRAN. It was originally written around 1972–1973, based heavily on algorithms originally implemented in ALGOL. Because those algorithms are much more human-readable than their LAPACK counterparts, I decided to port the related functions from EISPACK.

The functions which implement the Generalized Eigenvalue Decomposition in EISPACK are called QZHES, QZIT, QZVAL and QZVEC. As their name implies, they use the QZ method for finding the generalized eigenvalues of a matrix pair (A,B).

Source code

The source code is available in the download link in the upper part of this article. It can also be found in the latest versions of the Accord.NET Framework. The algorithm presented in the code is equivalent to the eig(A,B) command of MATLAB.

Because the code has been ported from FORTRAN, many sections of the algorithms have stayed as they appear in their original functions. Please be aware that it may contain lots of labels and goto’s, as most structured programming features we take for granted today were not commonplace at the time.

Using the code

Using the code is much simpler than dealing with LAPACK functions. Because the functions have been implemented directly in C#, the code is somewhat more human readable and has also been wrapped in the same nice object-oriented approach as the other matrix decompositions that can be found in MAPACK and JAMA.

Using the given sources, the following MATLAB code:

Becomes equivalent to the C# code:

in the sense that both satisfy the identity A*V = B*V*D .

References

RANdom Sample Consensus (RANSAC) in C#

RANSAC is an iterative method to build robust estimates for parameters of a mathematical model from a set of observed data which is known to contain outliers. The RANSAC algorithm is often used in computer vision, e.g., to simultaneously solve the correspondence problem and estimate the fundamental matrix related to a pair of stereo cameras.

The code presented here is part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications in .NET. To use the framework in your projects, install it by typing Install-Package Accord.MachineLearning in your IDE’s NuGet package manager.

Introduction

RANSAC is an abbreviation for “RANdom SAmple Consensus“. It is an iterative method to estimate parameters of a mathematical model from a set of observed data which may contains outliers. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles in 1981.

The basic assumption is that the data consists of “inliers”, i.e., data whose distribution can be explained by some mathematical model, and “outliers” which are data that do not fit the model. Outliers could be considered points which come from noise, erroneous measurements or simply incorrect data. RANSAC also assumes that, given a set of inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data.

Example: Fitting a simple linear regression

We can use RANSAC to robustly fit a linear regression model using noisy data. Consider the example below, in which we have a cloud of points that seems to belong to a line. These are the inliers of the data. The other points, which can be seem as measurement errors or extreme noise values, are points expected to be considered outliers.

ransac-7

Linear structure contained in noisy data.

RANSAC is able to automatically distinguish the inliers from the outliers through the evaluation of the linear regression model. To do so, it randomly selects subsets from the data and attempts to fit linear regression models using them. The model which best explains most of the data will then be returned as the most probably correct model fit.

The image below shows the result of fitting a linear regression directly (as shown by the red line) and using RANSAC (as shown by the blue line). We can see that the red line represents poorly the data structure because it considers all points in order to fit the regression model. The blue line seems to be a much better representation of the linear relationship hidden inside the overall noisy data.

ransac-8

Hidden linear structure inside noisy data. The red line shows the fitting of a linear regression model directly considering all data points. The blue line shows the same result using RANSAC.

Source code

The code below implements RANSAC using a generic approach. Models are considered to be of the reference type TModel and the type of data manipulated by this model is considered to be of the type TData. This approach allows for the creation of a general purpose RANSAC algorithm which can be used in very different contexts, be it the fitting of linear regression models or the estimation of homography matrices from pair of points in different images.

This code is available in the class RANSAC of the Accord.NET Framework (source).

Besides the generic parameters, the class utilizes three delegated functions during execution.

  • The Fitting function, which should accept a subset of the data and use it to fit a model of the chosen type, which should be returned by the function;
  • The Degenerate function, which should check if a subset of the training data is already known to result in a poor model, to avoid unnecessary computations; and
  • The Distance function, which should accept a model and a subset of the training data to compute the distance between the model prediction and the expected value for a given point. It should return the indices of the points only whose predicted and expected values are within a given threshold of tolerance apart.

Using the code

In the following example, we will fit a simple linear regression of the form x→y using RANSAC. The first step is to create a RANSAC algorithm passing the generic type parameters of the model to be build, i.e. SimpleLinearRegression and of the data to be fitted, i.e. a double array.

In this case we will be using a double array because the first position will hold the values for the input variable x. The second position will be holding the values for the output variables y. If you are already using .NET 4 it is possible to use the Tuple type instead.

After the creation of the RANSAC algorithm, we should set the delegate functions which will tell RANSAC how to fit a model, how to tell if a set of samples is degenerate and how to check for inliers in data.

Finally, all we have to do is call the Compute method passing the data. The best model found will be returned by the function, while the given set of inliers indices for this model will be returned as an out parameter.

Sample application

The accompanying source application demonstrates the fitting of the simple linear regression model with and without using RANSAC. The application accepts Excel worksheets containing the independent values in the first column and the dependent variables in the second column.

ransac-9

References