# OxyPlot.Axes.LinearAxis is obsolete

Warning 461 ‘OxyPlot.Axes.LinearAxis.LinearAxis(OxyPlot.Axes.AxisPosition, double, double, string)’ is obsolete

If you try to use OxyPlot.Axes.LinearAxis‘ constructor with parameters, the compiler will complain telling you the method has been deprecated and shouldn’t be used. I couldn’t find on the web which was the alternative solution to resolve this issue, but then it occurred to me that, what really is being deprecated, is the passage of arguments through the constructor’s parameters.

As such, the solution is simply to rewrite your code and call the axis’ default constructor instead, using C# object initialization syntax to configure your object:

Hope it can be of help if you were getting those warnings like me.

# Screencast Capture

I’ve recently started to record videos to demonstrate some capabilities of the Accord.NET Framework. Surprisingly, there were only a few, free, open source applications to achieve this goal – and none of them had all the features I needed.

It is, until I decided to roll my own.

Screencast Capture Lite is a tool for recording the desktop screen and saving it to a video file, preserving quality as much as possible. However, this does not mean it produces gigantic files which take a long time to be uploaded to the web. The application encodes everything using solely H624 in an almost lossless setting.

As a demonstration, please take a look on the Youtube video sample shown below. However, note that Youtube actually reduced the quality of the video, even if you watch it in HD. The local copy produced by Screencast Capture has an even higher quality than what is being shown, while the generated video file occupied less than 2 megabytes on disk.

And by the way what would be a better approach to demonstrate the capabilities of the Accord.NET frameworks other than writing this application using them?

Well, actually this application has been created specifically for two things:

• to aid in the recording of instructional videos for the Accord.NET Framework, and;
• to serve itself as a demonstration of the use and capabilities of the Accord.NET Framework.

This means the application is written entirely in C# making extensive use of both aforementioned frameworks. The application is completely open source and free, distributed under the terms of the GPL, and a suitable project page is already being served on GitHub.

Hope you will find it interesting!

# Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) for Unconstrained Optimization in C#

The Limited-memory Broyden-Fletcher-Goldfarb-Shanno method is an optimization method belonging to the family of quasi-Newton methods for unconstrained non-linear optimization. In short terms, it is an off-the-shelf optimizer for seeking either minimum or maximum points of a any differentiable and possibly non-linear function, requiring only an expression of the function and its gradient. The goal of this article is to provide and demonstrate a C# port of the L-BFGS method originally written by Jorge Nocedal in Fortran.

## Introduction

Function optimization is a common problem found in many numerical applications. Suppose we have a differentiable function f : Rn → R and we would like to obtain its minimal or maximum value while traversing its space of possible input values. Those kind of problems arise often in parameter optimization of machine learning models and other statistic related applications (and in a myriad of other applications too – however, we will pay special attention to cases related to machine learning).

The problem of maximizing or minimizing a function can be simply stated as max f(x) or min f(x). When there aren’t any constraints limiting possible values for x, it is widely known from calculus that the maximum or a minimum of such a function would occur when the first derivatives f’(x) of the function f(x) in respect to x are zero. Newton’s method is a common method for finding the roots of a differentiable function and is indeed a suitable method for finding those values such that first derivatives of a function are zero.

Example of a convex, but non-linear function f(x,y) = exp{-(x-1)²} + exp{-(y-2)²/2}. Images have been created using Wolfram Alpha.

However, while Newton’s method may work very well with functions of a single variable, the generalization to higher dimensions will require the replacement of the first derivative f’(x) with the function’s gradient vector g of partial derivatives, and the second derivative f’’(x) with the inverse Hessian matrix H-1. The problem is that the Hessian is a square matrix with as many rows and columns as parameters of our function f, and, besides being very costly and challenging to compute, even its size may be infeasible to accommodate in main memory when dealing with very high dimensional problems.

To overcome those limitations, several methods attempt to replace the Hessian matrix with an approximation extracted from the first partial derivatives alone. Those methods are called quasi-Newton methods, as they replace H in Newton’s method by an approximation instead of using the full Hessian. In case the function being optimized has any particular form or special characteristic which could be exploited, it is also possible to use more specialized methods such as the Gauss-Newton, Levenberg-Marquardt or even lower-bound approximation methods to avoid computing the full Hessian.

The quasi-Newton BFGS method for non-linear optimization builds an approximation for the Hessian matrix based on estimates extracted solely from first order information. However, even being cheaper to compute, the memory requirements are still high as the method still needs to accommodate a dense N x N matrix for the inverse Hessian approximation (where N is the number of variables for the function). To overcome this problem, the L-BFGS method, or the limited-memory version of the BFGS method, never forms or stores the approximation matrix, but only a few vectors which can be used to represent it implicitly.

## Source code

The code presented here is based upon a direct translation of the original code by Jorge Nocedal. His original code was released under the permissive BSD license, and honoring the original license and the author’s considerations, this code is released under the BSD as well.

The L-BFGS method is implemented within the BroydenFletcherGoldfarbShanno class. Upon object creation, it is possible to specify a function and its gradient through either the use of Lambda expressions or by specifying handles for named functions. After the object has been initialized, a simple call to Minimize runs the standard L-BFGS method until convergence. The details for the Minimize method is given below.

The version of the code detailed here only supports Minimization problems. However, it is possible to note that any minimization problem can be converted into a maximization problem and vice-versa by taking the opposite of the function and its gradient.

## Using the code

Code usage is rather simple. Suppose we would like to maximize the function g(x,y) = exp{-(x-1)²} + exp{-(y-2)²/2}:

Since we would like to perform a maximization, we first have to convert it to a minimization problem. The minimization version of the function is simply given by taking f(x,y) = –g(x,y):

Which is a convex function, as can be seen by plotting its surface. As it can be seen, the minimum of the function lies on the point (x,y) = (1,2). As expected, this point coincides with the roots of the partial derivative functions, as shown in the line plots below:

The minimization problem min f(x,y) = -(exp(-(x-1)²) + exp(-(y-2)²/2)), computed by Wolfram Alpha.

The roots of the partial derivatives in respective to x (left) and y (right). It is possible to see that the roots occur at 1 and 2, respectively. Images computed using Wolfram Alpha.

The first step in solving this optimization problem automatically is first to specify the target function. The target function f, in this case, can be specified through a lambda function in the form:

The next step is to specify the gradient g for the function f. In this particular case, we can manually compute the partial derivatives to be df/dx = -2 e^(-(x-1)^2) (x-1) and df/dy = -e^(-1/2 (y-2)^2) (y-2), in respect to x and y, respectively. Writing a lambda function to compute the gradient vector g, we have:

We can also note that this example was rather simple, so the gradient vector was easy to calculate. However, the gradient could also have been computed automatically using Accord.NET‘s FiniteDifferences class. In either case, all we have to do now is to create our L-BFGS solver and call its Minimize() method to begin optimization:

After the optimization is complete, the solution parameter will be available in the Solution property of the lbfgs object, and will be equal to { 1.0, 2.0 }. And also in case one is interested in progress reports (such as in the case of optimizing very large functions), it is also possible to register a listener to the Progress event handler. The complete version of the sample application accompanying the source code is given below:

## Conclusion

In this post, we showed how to use a reduced set of the Accord.NET Framework‘s to perform non-linear optimization. This routine is also used by the Conditional Random Fields and Hidden Conditional Random Fields trainers to optimize parameters for such models. The solver routines have been adapted from the original Fortran’s source code from Nocedal, which, tracing back to a 2001 message from the author, also have been reported to be available under the public domain. Nevertheless, the reduced set of the framework available for download within this post is available under a BSD license, as an alternative to the version available within the Accord.NET Framework which is available only under a LGPL license.

As always, I hope someone will find it useful 🙂

# Gaussian Mixture Models and Expectation-Maximization

Like K-Means, Gaussian Mixture Models (GMM) can be regarded as a type of unsupervised learning or clustering methods. They are among the most statistically mature methods for clustering. But unlike K-Means, GMMs are able to build soft clustering boundaries, i.e., points in space can belong to any class with a given probability.

The code presented here is part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications in .NET. To use the framework in your projects, install it by typing Install-Package Accord.MachineLearning in your IDE’s NuGet package manager.

## Introduction

In statistics, a mixture model is a probabilistic model which assumes the underlying data to belong to a mixture distribution. In a mixture distribution, its density function is just a convex combination (a linear combination in which all coefficients or weights sum to one) of other probability density functions:

$p(x) = w_1 p_1(x) + w_2 p_2(x) + ... + w_n p_n(x)$

The individual pi(x) density functions that are combined to make the mixture density p(x) are called the mixture components, and the weights w1, w2, …, wn associated with each component are called the mixture weights or mixture coefficients.

The most common mixture distribution is the Gaussian (Normal) density function, in which each of the mixture components are Gaussian distributions, each with their own mean and variance parameters.

$p(x) = w_1 N(x \mid \mu_1,\Sigma_1) + w_2 N(x \mid \mu_2,\Sigma_2) + ... + w_n N(x \mid \mu_n,\Sigma_n)$

To give a more solid and concrete understanding of the Gaussian mixture models, we will be jumping directly on how to represent those abstractions in source code through the use of class diagrams. Hopefully class diagrams may give a better and direct overview on the subject than a lengthy theoretical discussion.

## Source code

First we will discuss about the characteristics of any probability distribution. The IDistribution interface (depicted below) says any probability distribution may have a probability function and a distribution function. It also says that probability distributions can also be fitted to sets of observations through a parameter estimation method.

Since distribution functions can be either univariate or multivariate, the methods above accept any number of scalar values as input through the use of the params keyword. The Fit method for parameter estimation also accepts a general System.Array because we could be given either an array of univariate observations in the form of a double[] or an array of multivariate observations in the form of a jagged double[][] array. Since each observation may have a different weight in the estimation process, the weights parameter can be used to inform those weights to the fitting method.

Probability distributions may also have some associated measures, mainly the Expectation and the Variance of the modeled random variable. Depending if the distribution is univariate or multivariate, the expected values and variances can be either a single scalar or a vector of scalars. For multivariate distributions the variance can be also represented by a symmetric, positive-definite matrix known as the variance-covariance matrix.

Moreover, probability distributions can also be continuous or discrete. In continuous distributions the probability function is known as the Probability Density Function (pdf), and in discrete distributions the probability function is known as the Probability Mass Function (pmf). However, since our distribution of interest, the Gaussian, is a continuous distribution, be focusing only on continuous multivariate distributions:

The Normal (Gaussian) distribution and the mixture distributions fall under the multivariate continuous distributions category and are implemented as such.

### Normal distributions

The Normal (or Gaussian) multivariate distribution is a multivariate distribution whose parameters are the mean vector μ and a variance-covariance matrix Σ. Its probability density function is given by:

$p(x) = \frac{1}{ (2\pi)^{k/2}|\Sigma|^{1/2} } \exp\!\Big( {-\tfrac{1}{2}}(x-\mu)' \Sigma^{-1} (x-\mu) \Big)$

The more detailed class diagram for the normal distribution shows the multivariate characteristics of most of its methods, in particular of the Fit method for estimating Gaussian parameters from a multivariate set of data. Each sample, given in the form of a double[], may also have different associated weights. In this case, the weight associated with each of the samples can be passed through the weights parameter. The source code for the Gaussian distribution can be found in the NormalDistribution.cs class.

To estimate the parameters of a Gaussian distribution all we have to do is compute the mean and the variance-covariance matrix out of the given data. This is a very straightforward procedure and can be done by using the standard methods of the Tools class in the Accord.Statistics namespace. The Fit method updates the distribution parameters, overwriting the current distribution.

### Mixture distributions

Mixture distributions are just convex combinations of other probability distributions. Thus said, mixture distributions have an array of component distributions and a coefficient array which contains the weights of the each of the component probability distributions.

In the generic implementation above, T is the type of the distributions used in the mixture. The most common mixture distribution, the Gaussian Mixture Distribution, could then be created by instantiating a Mixture object passing the initial Normal distributions as constructor parameters.

Alternatively, the parameters of the mixture distributions could be estimated from a set of observations by calling the Fit method. To estimate the parameters of a mixture distribution we will be using a common technique known as the Expectation-Maximization algorithm.

### Expectation Maximization

Expectation-Maximization (EM) is a well established maximum likelihood algorithm for fitting a mixture model to a set of training data. It should be noted that EM requires an a priori selection of model order, namely, the number of M components to be incorporated into the model. In the Accord.NET Framework, it can be called implicitly by using the Mixture.Fit method or explicitly using the ExpectationMaximization class. The source code for Expectation Maximization can be found in the ExpectationMaximization.cs file.

The general E-M algorithm is comprised of the following simple steps:

1. #### Initialization

Initialize the distribution parameters, such as the means, covariances and mixing coefficients and evaluate the initial value of the log-likelihood (the goodness of fit of the current distribution against the observation dataset)’;

2. #### Expectation

Evaluate the responsibilities (i.e. weight factors of each sample) using the current parameter values;

3. #### Maximization

Re-estimate the parameters using the responsibilities found in the previous step;

4. #### Repeat

Re-evaluate the log-likelihood and check if it has changed; if it has changed less than a given threshold, the algorithm has converged.

Below is the actual realization of the algorithm as implemented in the Fit method of the Mixture<T> class.

### Gaussian Mixture Models

Gaussian mixture models are among the most commonly used examples of mixture distributions. The GaussianMixtureModel class encompasses a Mixture<NormalDistribution> object and provides methods to learn from data and to perform actual classification through a simplified interface.

Moreover, a common problem which rises in mixture model fitting through E-M is the proper initialization of the mixture parameters. One popular approach is to initialize the mixture parameters using the centroids detected by the K-Means algorithm. The code below shows how the mixture parameters are computed by first creating a KMeans object and then passing the clusters to the mixture model for further processing.

## Using the code

Code usage is rather simple. First, instantiate a new GaussianMixtureModel object passing the desired number of components in the Gaussian mixture. Then, call the Compute(double[][], double) method passing the learning samples and a desired convergence threshold.

After training has been completed, a new sample can be classified by calling the Classify(double[]) method.

## Sample application

The accompanying sample application demonstrates how Gaussian Mixture Models can be used to cluster normally-distributed samples of data. By clicking the “Create random data” button, a given number of Normal distributions will be created in the two-dimensional space using random mean and covariance matrices.

After the data has been created, you can use the “Fit a Gaussian Mixture Model” button to fit a mixture of Gaussians to the data. Each of the Gaussians will receive a random color and the samples which have the greatest probability of belonging to any of the Gaussians will be colored accordingly.

Left: a random dataset containing three Gaussian clusters. Right: the clusters as identified by the Gaussian Mixture Model.

Left: a random dataset containing 10 Gaussian clusters. Right: the clusters as identified by the Gaussian Mixture Model.

## Conclusion

In this article we found how Gaussian Mixture Models can be successfully used to create soft clustering boundaries around data. Those soft boundaries are possible because in a mixture model each sample is said to belong to a cluster only within certain probability.

A main drawback of GMMs is that the number of Gaussian mixture components, as in K-Means, is assumed known as prior, so it cannot be considered as totally unsupervised clustering method. To aid in this situation, one could use additional algorithms such as the Minimum Message Length (MML) criteria.

Another problem arises with the correct initialization of the mixture components. A poor choice of initial parameters will invariably lead to poor results, as the E-M algorithm converges only to a local optimization point. A commonly used solution is initialization by randomly sampling in the mixture data. Other common solution, covered by the implementation presented here, is to use K-Means to perform initialization. However, K-Means itself may also converge to a poor solution and impact negatively in the mixture model fitting.

## References

• Raja, Y., Shaogang, G. Gaussian Mixture Models. Department of Computer Science, Queen Mary and Westfield College, England.
• Bishop, C. M. (2007), Pattern Recognition and Machine Learning (Information Science and Statistics), Springer.
• Wikipedia contributors, “Normal distribution,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Normal_distribution (accessed October 18, 2010).
• Wikipedia contributors, “Probability density function,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Probability_density_function (accessed October 18, 2010).
• Wikipedia contributors, “Mixture density,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Mixture_density (accessed October 18, 2010).

# Matrix manipulation using Accord.NET

Matrix manipulation in Accord.NET
is very straightforward: Just add a new using directive on top of your class to
have (literally) about a hundred new extension methods that operate directly on
.NET multi-dimensional arrays.

## Introduction

Accord.NET uses a bit different approach for matrix manipulation in contrast to
other libraries. By using C# 3.0 extension methods, Accord adds several of the standard
methods you would expect from a Matrix library,
such as linear system solving, matrix algebra and numerical decompositions directly
to the standard double[,] (or more generally T[,]) matrices of the framework
[^].

This approach offers some advantages since it avoids mismatches when one is using
multiple libraries, each with their own and often incompatible Matrix implementations.
Extending multi-dimensional arrays makes the use of matrices much more natural in
the .NET world, as no specialized Matrix classes have to be used and no code has
to be rewritten just to use a particular Matrix implementation.

Please note, however, that most methods implemented by Accord are not equivalent
to the heavily optimized versions of more specialized numerical packages, such as

BLAS
, from a performance view. Their real power comes when prototyping or realizing algorithms into code. Once an algorithm is written, several
unit tests can be created to test the correctness of the code. After that, because
we have an early working prototype, it will be much easier to perform optimizations
as needed. The key point is to have a working version early which can (if
actually necessary) be optimized later.

## Using extension methods

The first step is to include a new using directive on the top of

This is all that is necessary after you have referenced the Accord.Math library
in your project. Now you can use all of the methods and operations described below.
The following list is nowhere complete, but shows only some basics and highlights
from the current version of Accord.

## Declaring matrices

### Using standard .NET declarations

To declare matrices, no special code is required. Just create multi-dimensional
arrays as usual.

### Using Accord.NET extensions

Additionally, one may wish to use one of the convenience methods of Accord.NET to
create specialized matrices such as the Identity matrix, multiples of the Identity
matrix or Diagonal matrices.

Accord.NET (C#) MATLAB®

### Using Accord.NET extensions with implicit typing

By using implicit type variable declaration, the code acquires a certain lightweight
feeling and gets closer of its MATLAB®/Octave counterpart (MATLAB is a registered
.

Accord.NET (C#) MATLAB®

A much more closer alternative will be possible by using Algorithm Environments,
a upcoming feature in Accord. When available, this feature will allow construction
of mathematical code using

directly. The code will be based on current Octave syntax. Octave is a high-level
language, primarily intended for
numerical computations
, that is mostly compatible with
MATLAB
.

## Matrix operations

All standard matrix operations such as transpose, inverse, column and row manipulations
are available in the extension methods. In the example below, A is the same standard
double[,] matrix declared in the first section of this
article. All methods return a new double[,] matrix
as a result, leaving the original matrix A untouched.

Operation Accord.NET (C#) MATLAB®
Transpose
Inverse
Pseudo-Inverse

### Matrix Algebra

All common algebraic operations for matrices are also implemented. Those are the
common operations such as addition, subtraction and multiplication. Here, division
is actually a shortcut for multiplying by the inverse.

Operation Accord.NET (C#) MATLAB®
Subtraction
Multiplication
Division

The above also works with vectors and scalars.

Operations Accord.NET (C#) MATLAB®
Multiplying by a scalar
Multiplying by a column vector

### Special element-wise operations

Element-wise operations are operations performed on each element of the matrix or
vector. In Matlab, they are some times known as the dot operations, since they are
usually denoted by prefixing a dot on the common operators.

Operation Accord.NET (C#) MATLAB®
Multiplication
Division
Power

## Vector operations

Accord.NET can also perform many vector operations. Among them are the many flavors
of products between vectors, such as the inner, the outer and the Cartesian.

Operation Accord.NET (C#) MATLAB®
Inner product (a.k.a. the scalar product)
Outer product (a.k.a. the matrix product) w = u’*v
Cartesian product
Euclidean Norm
Sum
Product

## Matrix characteristics

Some common matrix characteristics, such as the determinant and trace, are readily
available.

Operation Accord.NET (C#) MATLAB®
Determinant
Trace

### Other characteristics

Other available characteristics are the Summation and Product of vector and matrix
elements.

Operation Accord.NET (C#) MATLAB®
Sum vector
Sum of elements
Sum along columns
Sum along rows

## Linear Algebra

Linear algebra
is certainly one of the most important fields of mathematics, if not the most important
one. Accord includes many methods for numerical linear algebra such as matrix inversion
and matrix decompositions. Most of them were originally based on JAMA and MAPACK, but today Accord has some additions from
routines translated from EISPACK (mainly the
Generalized Eigenvalue Decomposition
[^],
which is currently absent from Jama).

 Operation Accord.NET (C#) MATLAB® Solve a linear system x = A.Solve(B) x = A B

### Eigenvalue Decomposition

Operation Accord.NET (C#) MATLAB®
Standard decomposition
Generalized decomposition

### Singular Value Decomposition

Operation Accord.NET (C#) MATLAB®
Economy decomposition

### QR Decomposition

Operation Accord.NET (C#) MATLAB®
Standard decomposition

### Cholesky decomposition

Operation Accord.NET (C#) MATLAB®
Standard decomposition

### LU Decomposition

Operation Accord.NET (C#) MATLAB®
Standard decomposition

## Special operators

There are some other useful operators available in Accord.NET. There are facilities
to create index vectors (very common in Matlab for accessing sub-portions of a matrix),
select elements, find elements based on a selection criteria, and so on.

Operation Accord.NET (C#) MATLAB®
Create a vector of indices idx = 1:9
Selecting elements B = A(idx)
Finding elements matching a certain criteria (For example, finding x ∈ v / x > 2). v = [ 5 2 2 7 1 0];
idx = find(v > 2)
Reshaping a vector to a matrix. m = [1 2 3 4];
reshape(m,2,2)

## More than just matrices

Accord.NET also offers many other standard mathematical functions such as the Gamma
function, log-Gamma, Digamma, Bessel functions, Incomplete beta integrals, and so
on.

For a complete listing of the framework features, please check the project page at Origo. In particular, don’t forget to check
out the the class diagram for the Accord.Math namespace.

Cheers!

Legal notice: MATLAB is a registered trademark of The MathWorks, Inc. All
other trademarks are the property of their respective owners.

# Automatic Image Stitching with Accord.NET

I have just posted a new article on CodeProject, entitled Automatic Image Stitching with Accord.NET. It is a demonstration of automatic image stitching by interest point matching using the Accord and AForge.NET Frameworks.

The method used is pretty classic in the computer vision literature. There are many other variations of the method with their own advantages and disadvantages, but most of them are built around the same ideas – feature detection, matching and blending. This is one of the most straightforward and free implementations available, since some of those most sophisticated methods are patented.

# List of Windows Messages

Here is a list of all Windows Messages available in the form of a C# enumeration, just in case someone finds it useful.