# Harris Corners Detector in C#

Corner detection, or the more general terminology interest point detection, is an approach used within computer vision systems to extract certain kinds of features from an image. Corner detection is frequently used in motion detection, image matching, tracking, image mosaicing, panorama stitching, 3D modelling and object recognition.

This code has also been incorporated in Accord.NET Framework, which includes the latest version of this code plus many other statistics and machine learning tools. In order to install the Accord.NET Framework in your projects, use NuGet. Type in the package manager: Install-Package Accord.Imaging

## Introduction

One of the firsts operators for interest point detection was developed by Hans P. Moravec in 1977 for his research involving the automatic navigation of a robot through a clustered environment. It was also Moravec who defined the concept of “points of interest” in a image and concluded these interest points could be used to find matching regions in different images.

The Moravec operator is considered to be a corner detector because it defines interest points as points where there are large intensity variations in all directions. This often is the case at corners. It is interesting to note, however, that Moravec was not specifically interested in finding corners, just distinct regions in an image that could be used to register consecutive image frames.

### The Harris Operator

This operator was developed by Chris Harris and Mike Stephens in 1988 as a processing step to build interpretations of a robot’s environment based on image sequences. Like Moravec, they needed a method to match corresponding points in consecutive image frames, but were interested in tracking both corners and edges between frames.

Harris and Stephens improved upon Moravec’s corner detector by considering the differential of the corner score with respect to direction directly. The Harris corner detector computes the locally averaged moment matrix computed from the image gradients, and then combines the Eigenvalues of the moment matrix to compute a corner measure, from which maximum values indicate corners positions.

## Source Code

Below is the source code for the Harris Corners Detector algorithm. This is mostly the same code implemented in the Accord.NET Framework.

## Using the code

Code usage is very simple. A Corners Marker filter from the AForge Framework can be used to directly draw the detected interest points in the original image as it would be usual within the framework.

## Sample application

The accompanying sample application is pretty much self-explanatory. It performs the corners detection in the famous Lena Söderberg‘s picture, but can be adapted to work with other pictures as well. Just add another image to the project settings’ resources and change the line of code which loads the bitmap.

It is also very straightforward to adapt the application to load arbitrary images from the hard disk. I have opted to leave it this way for simplicity.

# Hidden Markov Model -Based Sequence Classifiers in C#

Distinct Hidden Markov Models can be trained individually on data obtained from each class of a classification problem. If we create a set of those models and specialize each model to recognize each of the separate classes, we can then use the HMMs ability to calculate the likelihood that a given sequence belongs to the model to determine the most likely class of an unknown sequence.

The code presented here is also part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications. It is based on the already excellent AForge.NET Framework. Please see the starting guide for mode details. The latest version of the framework includes the latest version of this code plus many other statistics and machine learning tools.

## Introduction

### Hidden Markov Models

An introduction about Hidden Markov Models (HMM) was presented in a previous article, entitled Hidden Markov Models in C#. HMMs are models are stochastic methods to model temporal and sequence data.

They allow, among other things, (1) to infer the most likely sequence of states that produced a given output sequence, to (2) infer which will be the most likely next state (and thus predicting the next output) and (3) calculate the probability that a given sequence of outputs originated from the system (allowing the use of hidden Markov models for sequence classification). In the context of this article, we will be more interested in results from ability (3).

Hidden Markov Models can be seem as finite state machines where for each sequence unit observation there is a state transition and, for each state, there is a output symbol emission. The picture on the left summarizes the overall definition of a HMM given in the previous article.

### Hidden Markov Model -Based Sequence Classifier

Hidden Markov Models can be trained individually on data obtained from each class of a classification problem. If we create a set of models and specialize each model to recognize each of the separated classes, then we will be able to explore the HMMs ability to calculate the likelihood that a given sequence belongs to itself to perform classification of discrete sequences.

After all models have been trained, the probability of the unknown-class sequence can be computed for each model. As each model specialized in a given class, the one which outputs the highest probability can be used to determine the most likely class for the new sequence, as shown in the picture below.

Schematic representation of the sequence classification procedure.

## Source Code

The implementation is very straightforward, as shown in the picture below. The Tag property of the HiddenMarkovModel class can be used to store additional information about which class the model will be representing.

Class diagram for the HMM-based sequence classifier.

### Computing the most likely class for a given sequence

Computing the most likely class for a given sequence is as simple as iterating through all models in the set and selecting the one which has produced the highest likelihood value.

### Training each model to recognize each of the output classes

To train each model to recognize each of the output classes we have to split the training data into subsets whose outputs corresponds to each of the classes from the classification problem.

## Using the Code

Code usage is very simple. Once one has the input data available in the form of a sequence array and output data in form of a integer array, create a new HiddenMarkovClassifier object with the appropriate parameters.

Then, just call Learn() passing the input and output data and the convergence limit for the learning algorithm. After that, call Compute() passing a new integer sequence to be classified by the system.

## Sample Application

The accompanying sample application can read Excel spreadsheets containing integer sequences and labels for those sequences. An example spreadsheet contained inside the Resources folder in the compressed archive file demonstrates the expected format for this data.

Left: Input sequences data. Right: Hidden Markov Models contained in the Sequence Classifier with initial configurations.

Left: Trained models. Right: Performance test for the Hidden Markov Model Sequence Classifier.

# Hidden Markov Models in C#

Hidden Markov Models (HMM) are stochastic methods to model temporal and sequence data. They are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.

The code presented here is also part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications. It is based on the already excellent AForge.NET Framework. Please see the starting guide for mode details. The latest version of the framework includes the latest version of this code plus many other statistics and machine learning tools.

## Introduction

Hidden Markov Models were first described in a series of statistical papers by Leonard E. Baum and other authors in the second half of the 1960s. One of the first applications of HMMs was speech recognition, starting in the mid-1970s. Indeed, one of the most comprehensive explanations on the topic was published in “A Tutorial On Hidden Markov Models And Selected Applications in Speech Recognition”, by Lawrence R. Rabiner in 1989. In the second half of the 1980s, HMMs began to be applied to the analysis of biological sequences, in particular DNA. Since then, they have become ubiquitous in the field of bioinformatics.

Dynamical systems of discrete nature assumed to be governed by a Markov chain emits a sequence of observable outputs. Under the Markov assumption, it is also assumed that the latest output depends only on the current state of the system. Such states are often not known from the observer when only the output values are observable.

Hidden Markov Models attempt to model such systems and allow, among other things, (1) to infer the most likely sequence of states that produced a given output sequence, to (2) infer which will be the most likely next state (and thus predicting the next output) and (3) calculate the probability that a given sequence of outputs originated from the system (allowing the use of hidden Markov models for sequence classification).

The “hidden” in Hidden Markov Models comes from the fact that the observer does not know in which state the system may be in, but has only a probabilistic insight on where it should be.

## Definition

Hidden Markov Models can be seem as finite state machines where for each sequence unit observation there is a state transition and, for each state, there is a output symbol emission.

### Notation

Traditionally, HMMs have been defined by the following quintuple:

$\lambda = (N, M, A, B, \pi)$

where

• N is the number of states for the model
• M is the number of distinct observations symbols per state, i.e. the discrete alphabet size.
• A is the NxN state transition probability distribution given in the form of a matrix A = {aij}
• B is the NxM observation symbol probability distribution given in the form of a matrix B = {bj(k)}
• π is the initial state distribution vector π = {πi}

Note that, if we opt out the structure parameters M and N we have the more often used compact notation

$\lambda = (A, B, \pi)$

### Canonical problems

There are three canonical problems associated with hidden Markov models, which I’ll quote from Wikipedia:

1. Given the parameters of the model, compute the probability of a particular output sequence. This requires summation over all possible state sequences, but can be done efficiently using the Forward algorithm, which is a form of dynamic programming.
2. Given the parameters of the model and a particular output sequence, find the state sequence that is most likely to have generated that output sequence. This requires finding a maximum over all possible state sequences, but can similarly be solved efficiently by the Viterbi algorithm.
3. Given an output sequence or a set of such sequences, find the most likely set of state transition and output probabilities. In other words, derive the maximum likelihood estimate of the parameters of the HMM given a dataset of output sequences. No tractable algorithm is known for solving this problem exactly, but a local maximum likelihood can be derived efficiently using the Baum-Welch algorithm or the Baldi-Chauvin algorithm. The Baum-Welch algorithm is an example of a forward-backward algorithm, and is a special case of the Expectation-maximization algorithm.

The solution for those problems are exactly what makes Hidden Markov Models useful. The ability to learn from the data (using the solution of problem 3) and then become able to make predictions (solution to problem 2) and able to classify sequences (solution of problem 2) is nothing but applied machine learning. From this perspective, HMMs can just be seem as supervisioned sequence classifiers and sequence predictors with some other useful interesting properties.

### Choosing the structure

Choosing the structure for a hidden Markov model is not always obvious. The number of states depend on the application and to what interpretation one is willing to give to the hidden states. Some domain knowledge is required to build a suitable model and also to choose the initial parameters that an HMM can take. There is also some trial and error involved, and there are sometimes complex tradeoffs that have to be made between model complexity and difficulty of learning, just as is the case with most machine learning techniques.

Additional information can be found on http://www.cse.unsw.edu.au/~waleed/phd/tr9806/node12.html.

## Algorithms

The solution to the three canonical problems are the algorithms that makes HMMs useful. Each of the three problems are described in the three subsections below.

### Evaluation

The first canonical problem is the evaluation of the probability of a particular output sequence. It can be efficiently computed using either the Viterbi-forward or the Forward algorithms, both of which are forms of dynamic programming.

The Viterbi algorithm originally computes the most likely sequence of states which has originated a sequence of observations. In doing so, it is also able to return the probability of traversing this particular sequence of states. So to obtain Viterbi probabilities, please refer to the Decoding problem referred below.

The Forward algorithm, unlike the Viterbi algorithm, does not find a particular sequence of states; instead it computes the probability that any sequence of states has produced the sequence of observations. In both algorithms, a matrix is used to store computations about the possible state sequence paths that the model can assume. The forward algorithm also plays a key role in the Learning problem, and is thus implemented as a separate method.

### Decoding

The second canonical problem is the discovery of the most likely sequence of states that generated a given output sequence. This can be computed efficiently using the Viterbi algorithm. A trackback is used to detect the maximum probability path travelled by the algorithm. The probability of travelling such sequence is also computed in the process.

### Learning

The third and last problem is the problem of learning the most likely parameters that best models a system given a set of sequences originated from this system. Most implementations I’ve seem did not consider the problem of learning from a set of sequences, but only from a single sequence at a time. The algorithm below, however, is fully suitable to learn from a set of sequences and also uses scaling, which is another thing I have not seem in other implementations.

The source code follows the original algorithm by Rabiner (1989). There are, however, some known issues with the algorithms detailed in Rabiner’s paper. More information about those issues is available in a next section of this article entitled “Remarks”.

## Using the code

Lets suppose we have gathered some sequences from a system we wish to model. The sequences are expressed as a integer array such as:

For us, it can be obvious to see that the system is outputting sequences that always start with a zero and have one or more ones at the end. But lets try to fit a Hidden Markov Model to predict those sequences.

Once the model is trained, lets test to see if it recognizes some sequences:

Of course the model performs well as this a rather simple example. A more useful test case would consist of allowing for some errors in the input sequences in the hope that the model will become more tolerant to measurement errors.

We can see that, despite having a very low probability, the likelihood values for the sequences containing a simulated measurement error are greater than the likelihoods for the sequences which do not follow the sequence structure at all.

In a subsequent article, we will see that those low values for the likelihoods will not be a problem because HMMs are often used in sets to form sequence classifiers. When used in such configurations, what really matters is which HMM returns the highest probability among others in the set.

## Remarks

A practical issue in the use of Hidden Markov Models to model long sequences is the numerical scaling of conditional probabilities. The probability of observing a long sequence given most models is extremely small, and the use of these extremely small numbers in computations often leads to numerical instability, making application of HMMs to genome length sequences quite challenging.

There are two common approaches to dealing with small conditional probabilities. One approach is to rescale the conditional probabilities using carefully designed scaling factors, and the other approach is to work with the logarithms of the conditional probabilities. For more information on using logarithms please see the work entitled “Numerically Stable Hidden Markov Model Implementation”, by Tobias P. Mann.

### Known issues

The code on this article is based on the Tutorial by Rabiner. There are, however, some problems with the scaling and other algorithms. An errata depicting all issues is available in the website “An Erratum for ‘A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition’” and is maintained by Ali Rahimi. I have not yet verified if the implementation presented here also suffers from the same mistakes explained there. This code has worked well under many situations, but I cannot guarantee its perfectness. Please use at your own risk.

## Acknowledgements

Thanks to Guilherme C. Pedroso, for the help with the Baum-Welch generalization for multiple input sequences. He has also co-written a very interesting article using hidden Markov models for gesture recognition, entitled “Automatic Recognition of Finger Spelling for LIBRAS based on a Two-Layer Architecture” published in the 25th Symposium On Applied Computing (ACM SAC 2010).

# List of Windows Messages

Here is a list of all Windows Messages available in the form of a C# enumeration, just in case someone finds it useful.

It seems there is a minor bug in Google Chrome. Apparently, you can’t click links that are in the same horizontal line as the status bar that pops up from the bottom left corner.

For example, clicking the link marked with a red circle in the image below does nothing.

Well, this obviously isn’t a show stopper, but I think its a bug nevertheless.

# Logistic Regression in C#

Logistic regression (sometimes called the logistic model or logit model) is used for prediction of the probability of occurrence of an event by fitting data to a logistic curve. Logistic regression is used extensively in the medical and social sciences as well as marketing applications such as prediction of a customer’s propensity to purchase a product or cease a subscription.

The code presented here is part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications in .NET. To use the framework in your projects, install it by typing Install-Package Accord.MachineLearning in your IDE’s NuGet package manager.

## Overview

The Logistic regression is a generalized linear model used for binomial regression. Like many forms of regression analysis, it makes use of several predictor variables that may be either numerical or categorical.

### Logistic Sigmoid Function

The logistic sigmoid function is given by

g(z) = 1 / (1 + Exp(-z))

where in the context of logistical regression z is called the logit.

### Logistic Regression Model

The logistic regression model is a generalized linear model. This means that it is just a linear regression model taken as input for a non-linear link function. The linear model has the form

z = c1x1 + c2x2 + … cnxn + i = ct x + i

where c is the coefficient vector, i is the intercept value and x is the observation vector for n variables and in the context of logistic regression is called the logit. The logit is then applied as input for the nonlinear logistic sigmoid function g(z), giving as result a probability.

So in a binomial problem where we are trying to determine whether a observation belongs to class C1 or class C2, the logistic model tell us that:

p(C1|x) = g(ct x + i)

p(C2|x) = 1 – p(C1|x)

where p(C1|x) denotes the probability of C1 being true when x is true.
In other words, denotes the probability of x belonging to class C1.

### Coefficients

The coefficients for the logistic regression are the values which multiply each observation variable from a sample input vector. The intercept is analogous to the independent term in a polynomial linear regression model or the threshold or bias value for a neuron in a neural network model.

### Odds Ratio

After computation of the logistic regression model, every coefficient will have an associated measure called the odds ratio. The odds ratio is a measure of effect size, describing the strength of association or non-independence between two binary data values and can be approximated by raising the coefficient value to the euler’s number.

Odds ratioc = ec

### Standard Error

The standard error for the coefficients can be retrieved from the inverse Hessian matrix calculated during the model fitting phase and can be used to give confidence intervals for the odds ratio. The standard error for the i-th coefficient of the regression can be obtained as:

SEi = sqrt(diag(H-1)i)

### Confidence Intervals

The confidence interval around the logistic regression coefficient is plus or minus 1.96*SEi, where SEi is the standard error for coefficient i. We can then define:

95% C.I.i = <lower, upper> = <coefficienti – 1.96 * SEi,  coefficienti + 1.96 * SEi>

### Wald Statistic and Wald’s Test

The Wald statistic is the ratio of the logistic coefficient to its standard error. A Wald test is used to test the statistical significance of each coefficient in the model. A Wald test calculates a Z statistic, which is:

z = coefficient / standard error

This z value can be squared, yielding a Wald statistic with a chi-square distribution, or, al