If you are attempting to solve a sequence classification problem, you might be interested in learning about Hidden Conditional Random Fields (HCRFs). These are the discriminative counterpart of classifiers based on a set of hidden Markov models (HMMs).

## Hidden what?

Hidden Markov models are simple models that can be trained to recognize whether a sequence of observations is similar to the sequences the model has seen before. If we create one HMM for each type of sequence that we are trying to distinguish, and then individually ask each model whether it recognizes a given sequence, we have just created a **hidden Markov model classifier**.
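The decision rule of such a classifier is simple: score the sequence under every per-class model and pick the best. Here is a minimal, language-agnostic sketch in Python of that argmax-over-models idea; the scoring functions are toy stand-ins, not real HMMs:

```python
# Toy stand-ins for trained per-class HMMs: each returns a
# (fake) log-likelihood of a sequence under that class's model.
def score_class_a(seq):
    # "likes" sequences that start and end with 0
    return 0.0 if (seq[0] == 0 and seq[-1] == 0) else -10.0

def score_class_b(seq):
    # "likes" sequences that start with 2
    return 0.0 if seq[0] == 2 else -10.0

def classify(seq, models):
    # Generative classification: ask every model how well it
    # explains the sequence, return the class whose model wins.
    scores = [m(seq) for m in models]
    return scores.index(max(scores))

models = [score_class_a, score_class_b]
print(classify([0, 1, 1, 0], models))  # prints 0
print(classify([2, 2, 1, 1], models))  # prints 1
```

With real HMMs, each scoring function would be the forward-algorithm log-likelihood of the sequence under that class's model.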

However, we might have a slightly better way of classifying those sequences. The method above (creating individual models for each sequence type, then asking each model how strongly it recognizes a new sequence) is known as **generative learning**. But we could also have created, from the ground up, a single model focused only on distinguishing between sequence types, without modeling each of them first. This is known as **discriminative learning**.

And as mentioned in the tagline for this article, HCRFs are the discriminative doppelganger of the HMM classifier. Let's see, then, how we can use them.

### Creating HCRF sequence classifiers in C#/.NET

If you would like to explore them in your projects, the Accord.NET Framework provides Hidden Markov Models, Hidden Markov Model Classifiers, Conditional Random Fields, and Hidden Conditional Random Fields.

Because HCRFs are discriminative models, they can be optimized directly using gradient methods that closely resemble what is normally done with neural networks. One of the best algorithms among those is Resilient Backpropagation (Rprop) [1]. Interestingly, this is also one of the best algorithms for training neural networks.
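To give an idea of why Rprop works so well, here is a minimal Python sketch of its core update rule applied to a toy one-dimensional objective. The constants and the objective are illustrative only; the real algorithm operates on the full HCRF gradient. Only the *sign* of each partial derivative is used; each parameter keeps its own step size, which grows while the gradient sign is stable and shrinks when it flips:

```python
# Minimal Rprop sketch, minimizing f(w) = (w - 3)^2.
def rprop_minimize(grad, w, steps=100, delta=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   delta_max=1.0, delta_min=1e-6):
    prev_g = 0.0
    for _ in range(steps):
        g = grad(w)
        if g * prev_g > 0:            # same sign as before: accelerate
            delta = min(delta * eta_plus, delta_max)
        elif g * prev_g < 0:          # sign flipped: we overshot, slow down
            delta = max(delta * eta_minus, delta_min)
        if g > 0:                     # move against the gradient sign
            w -= delta
        elif g < 0:
            w += delta
        prev_g = g
    return w

w = rprop_minimize(lambda w: 2 * (w - 3), w=0.0)
print(w)  # converges close to 3
```

Because it ignores the gradient's magnitude, Rprop is insensitive to badly scaled objectives, which is one reason it performs well on both neural networks and HCRFs.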

## Real-world example

For a real-world example of how HCRFs can be applied, I would kindly invite you to look at my master's defense presentation. It details exactly how to transform HMMs into linear-chain HCRFs through nice animations, and then applies them to sign language recognition. Use **mouse clicks** to advance the slides so you can see the animations!

*Sign language recognition with Support Vector Machines and Hidden Conditional Random Fields*

Incidentally, the presentation above shows exactly two kinds of sequence recognition: discrete sequence recognition and multivariate/continuous sequence recognition.

The **discrete case** was fingerspelling: a frame classifier translated images into symbols (i.e. the numbers 0, 1, 2, …), and a sequence of those numbers was then classified into a finite set of words. The **continuous case** was natural word recognition: a feature extractor produced real-valued features, including the (x, y) positions of the user's hands and face, and the sequence of those feature vectors was then classified into a finite set of words.

Let's consider concrete code examples of how those could have been implemented using the Accord.NET Framework, in C# on .NET.

## Simple discrete example

In this simple example we will consider only discrete data. This is the simplest case for a hidden Markov model (with discrete HMMs we don't have to worry about probability distribution assumptions for the model states). This example is also available on the Hidden Resilient Backpropagation learning algorithm documentation page, and in the article Sequence Classifiers Part II: Hidden Conditional Random Fields at CodeProject.

```csharp
// Suppose we would like to learn how to classify the
// following set of sequences among three class labels:
int[][] inputs =
{
    // First class of sequences: starts and
    // ends with zeros, ones in the middle:
    new[] { 0, 1, 1, 1, 0 },
    new[] { 0, 0, 1, 1, 0, 0 },
    new[] { 0, 1, 1, 1, 1, 0 },

    // Second class of sequences: starts with
    // twos and switches to ones until the end.
    new[] { 2, 2, 2, 2, 1, 1, 1, 1, 1 },
    new[] { 2, 2, 1, 2, 1, 1, 1, 1, 1 },
    new[] { 2, 2, 2, 2, 2, 1, 1, 1, 1 },

    // Third class of sequences: can start
    // with any symbols, but ends with three.
    new[] { 0, 0, 1, 1, 3, 3, 3, 3 },
    new[] { 0, 0, 0, 3, 3, 3, 3 },
    new[] { 1, 0, 1, 2, 2, 2, 3, 3 },
    new[] { 1, 1, 2, 3, 3, 3, 3 },
    new[] { 0, 0, 1, 1, 3, 3, 3, 3 },
    new[] { 2, 2, 0, 3, 3, 3, 3 },
    new[] { 1, 0, 1, 2, 3, 3, 3, 3 },
    new[] { 1, 1, 2, 3, 3, 3, 3 },
};

// Now consider their respective class labels
int[] outputs =
{
    /* Sequences 1-3  are from class 0: */ 0, 0, 0,
    /* Sequences 4-6  are from class 1: */ 1, 1, 1,
    /* Sequences 7-14 are from class 2: */ 2, 2, 2, 2, 2, 2, 2, 2
};
```

In this case, we can try to create an HCRF directly using a **MarkovDiscreteFunction**. There are many ways to create an HCRF and to define its feature vectors. The framework provides some functions that are already specialized for particular problems, such as the same set of problems that would be solvable by a discrete HMM classifier.

```csharp
// Create the Hidden Conditional Random Field using a set of discrete features
var function = new MarkovDiscreteFunction(states: 3, symbols: 4, outputClasses: 3);
var classifier = new HiddenConditionalRandomField<int>(function);
```

Now that the model has been created, we can start training the model to recognize the sequences and their labels shown above.

```csharp
// Create a learning algorithm
var teacher = new HiddenResilientGradientLearning<int>(classifier)
{
    Iterations = 50
};

// Run the algorithm and learn the models
teacher.Run(inputs, outputs);
```

After training has finished, we can ask the model which would be the most likely class labels for new sequences that we are about to show it. For example:

```csharp
int y1 = classifier.Compute(new[] { 0, 1, 1, 1, 0 });    // output is y1 = 0
int y2 = classifier.Compute(new[] { 0, 0, 1, 1, 0, 0 }); // output is y2 = 0
int y3 = classifier.Compute(new[] { 2, 2, 2, 2, 1, 1 }); // output is y3 = 1
int y4 = classifier.Compute(new[] { 2, 2, 1, 1 });       // output is y4 = 1
int y5 = classifier.Compute(new[] { 0, 0, 1, 3, 3, 3 }); // output is y5 = 2
int y6 = classifier.Compute(new[] { 2, 0, 2, 2, 3, 3 }); // output is y6 = 2
```

As you can see, we created a classification algorithm that knows how to differentiate between the different kinds of sequences we fed it in the beginning.

## Complex continuous multivariate example

In this more complex example we will consider multivariate, real-valued data, which is closer to real-world problems. One particular problem where sequence recognition is important is 3D gesture recognition. A 3D gesture is simply a sequence of 3D coordinates measured over time: if you move your hand from the mouse up to your head, and we measured this movement using a sensor, we would have a sequence of 3D points saying where your hand was at each step of the capturing process.

Furthermore, if you took 5 seconds to perform this movement, and the sensor captures one coordinate a second, we would have a sequence of 5 points describing your movement.

So, let's say we have one such sensor and we would like to distinguish between sequences belonging to different movements. Say we have the movements "**hands-in-the-air**", "**typing on a keyboard**", and "**waving goodbye**".

Let’s say we decided to represent our frames as:

```csharp
double[] frame = { x, y, z };
```

Each movement would then be described as a sequence of those frames:

```csharp
double[][] movement = { frame1, frame2, frame3 };
```

Let's see a more concrete example. Suppose we have these sequences of 3D coordinates:

```csharp
double[][] hands_in_air =
{
    new double[] { 1.0, 0.1, 0.0 }, // this movement
    new double[] { 0.0, 1.0, 0.1 }, // took 6 frames
    new double[] { 0.0, 1.0, 0.1 }, // to be recorded.
    new double[] { 0.0, 0.0, 1.0 },
    new double[] { 0.0, 0.0, 1.0 },
    new double[] { 0.0, 0.0, 0.1 },
};

double[][] typing =
{
    new double[] { 0.0, 0.0, 0.0 }, // the typing
    new double[] { 0.1, 0.0, 1.0 }, // took only 4.
    new double[] { 0.0, 0.0, 0.1 },
    new double[] { 1.0, 0.0, 0.0 },
};

double[][] waving =
{
    new double[] { 0.0, 0.0, 1.0 }, // same for the
    new double[] { 0.1, 0.0, 1.0 }, // waving goodbye.
    new double[] { 0.0, 0.1, 1.0 },
    new double[] { 0.1, 0.0, 1.0 },
};
```

In this example, we are assuming we have only one sample of each gesture in our database. However, **this is a big no-no**: in practice, we should have *hundreds* of those samples for the magic to work.

Now let's create our learning database. We have to create a table with all of our sequences (movements) on the left side and their corresponding output labels on the right.

```csharp
// These are the movements we want to distinguish:
double[][][] movements = { hands_in_air, typing, waving };
```

```csharp
// These are their labels
int[] labels = { 0, 1, 2 };
```

Now, because we are dealing with multivariate samples (i.e. 3D points), we have to assume a shape for the probability distributions of the states in our hidden Markov models. One common choice is to assume they are Gaussian. To make things even simpler, let's assume they are *independent Gaussians*, that is, that the **X** coordinates follow one univariate Gaussian, the **Y** another, and the **Z** yet another, all uncorrelated with each other.

```csharp
var initial = new Independent<NormalDistribution>
(
    new NormalDistribution(0, 1), // X coordinate
    new NormalDistribution(0, 1), // Y coordinate
    new NormalDistribution(0, 1)  // Z coordinate
    // PS: You can also mix a GeneralDiscreteDistribution here
);
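To make the independence assumption concrete, here is a small language-agnostic Python sketch (not Accord.NET code) showing that the joint log-density of a frame under independent Gaussians is simply the sum of the per-coordinate univariate log-densities:

```python
import math

def normal_logpdf(x, mean, std):
    # log-density of a univariate Gaussian
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean)**2 / (2 * std**2)

def independent_logpdf(point, means, stds):
    # Under the independence assumption, the joint log-density of a
    # 3D point is just the sum of the per-coordinate log-densities.
    return sum(normal_logpdf(x, m, s) for x, m, s in zip(point, means, stds))

frame = [0.5, -0.2, 1.0]
print(independent_logpdf(frame, means=[0, 0, 0], stds=[1, 1, 1]))
```

This is equivalent to a multivariate Gaussian with a diagonal covariance matrix, which is why it is both cheap to evaluate and cheap to estimate.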

Ok, we are almost there! Let's create our classifier using those base distributions:

```csharp
int numberOfWords = 3;  // we are trying to distinguish between 3 movement types
int numberOfStates = 5; // this value can be found by trial-and-error, but 5 is magic

var hmm = new HiddenMarkovClassifier<Independent<NormalDistribution>>
(
    classes: numberOfWords,
    topology: new Forward(numberOfStates), // important for time
    initial: initial
);
```

Now, one interesting detail is the choice of the Forward topology above. Topologies are different ways to organize the states in our model; in other words, a topology says which state sequences are allowed and which are impossible. Choosing a Forward topology is equivalent to saying: *no movement may go back in time*. Which is true, by the way. So we go with it.
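To visualize what a Forward topology implies, here is a small Python sketch that builds the transition structure of a left-to-right model. The uniform probabilities are illustrative only; the point is the shape: every entry below the diagonal is zero, so no transition ever goes back to an earlier state.

```python
def forward_transitions(states, deg=1):
    # Transition structure of a forward (left-to-right) topology:
    # from state i you may stay, or advance up to `deg` states ahead,
    # but never move backwards. Probability mass is spread uniformly
    # over the allowed moves, just for illustration.
    A = [[0.0] * states for _ in range(states)]
    for i in range(states):
        allowed = range(i, min(i + deg + 1, states))
        for j in allowed:
            A[i][j] = 1.0 / len(allowed)
    return A

for row in forward_transitions(4):
    print(row)
# Everything below the diagonal stays 0.0: no going back in time.
```

A fully ergodic topology, in contrast, would allow any state to reach any other, which rarely matches the forward-flowing nature of gestures.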

Great! Now we have our model. We then need a way to teach it to recognize the sequences we defined in the beginning. Thankfully, the framework is flexible enough to support **any** kind of distribution for the emission states. Most implementations can only handle Gaussians, Gaussian mixtures, and the like. But we can use anything that supports the IFittableDistribution interface: Bernoullis, multinomials, Poissons…

```csharp
// Create a new learning algorithm to train the sequence classifier
var teacher = new HiddenMarkovClassifierLearning<Independent<NormalDistribution>>(hmm,

    // Train each model until the log-likelihood changes less than 0.001
    modelIndex => new BaumWelchLearning<Independent<NormalDistribution>>(hmm.Models[modelIndex])
    {
        Tolerance = 0.001,
        Iterations = 100,

        // This is necessary so the code doesn't blow up when it realizes
        // there is only one sample per word class. But it could also be
        // needed in normal situations as well.
        FittingOptions = new IndependentOptions()
        {
            InnerOption = new NormalOptions() { Regularization = 1e-5 }
        }
    });
```

Now, since we are estimating things from data, at some point one of our Normal distributions might be asked to fit a set of numbers that are all the same. This would lead to a Normal distribution with **zero variance**, which is clearly not allowed under the current rules of the universe. To overcome this problem, we can tell the fitting algorithm that some exceptions may apply by specifying a FittingOptions object during the learning phase.
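To see why the zero-variance problem appears, and how a regularization floor fixes it, consider this small Python sketch; the fitting function is a toy stand-in for a maximum-likelihood Gaussian fit, not Accord.NET code:

```python
def fit_normal(samples, regularization=0.0):
    # Maximum-likelihood fit of a univariate Gaussian, with an
    # optional variance floor to survive degenerate samples.
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, max(var, regularization)

# All samples identical: the plain ML estimate collapses to variance 0,
# which makes every later density evaluation divide by zero.
print(fit_normal([2.0, 2.0, 2.0]))        # prints (2.0, 0.0)
print(fit_normal([2.0, 2.0, 2.0], 1e-5))  # prints (2.0, 1e-05)
```

The tiny floor barely perturbs healthy estimates but keeps degenerate ones usable, which is exactly the role of the Regularization option above.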

```csharp
// Finally, we can run the learning algorithm!
double logLikelihood = teacher.Run(movements, labels);
```

After a while, this method should return with a working model for us.

```csharp
// At this point, the classifier should be successfully
// able to distinguish between our three word classes:
//
int c1 = hmm.Compute(hands_in_air); // should output 0
int c2 = hmm.Compute(typing);       // should output 1
int c3 = hmm.Compute(waving);       // should output 2
```

Which is very nice. In this particular example, we got a perfect model that was able to correctly predict the database we threw at it. But what if that hadn't been the case?

If it hadn't, there would still be room for improvement. Let's digivolve our hidden Markov classifier into a Hidden Conditional Random Field using a MarkovMultivariateFunction, and see what we can do with it:

```csharp
// Now, we can use the Markov classifier to initialize a HCRF
var function = new MarkovMultivariateFunction(hmm);
var hcrf = new HiddenConditionalRandomField<double[]>(function);
```

Although hidden Markov models can operate with any choice of distribution function, for HCRFs things unfortunately aren't that simple. HCRFs are very general models, and as such they place very few restrictions on which formulation we would like to use. While this is great from a theoretical point of view, it means the framework cannot be as helpful as it was before and offer generic support for every possible distribution in the world.

In this case, we can either create our own feature functions specifying how our HCRF should be built, or use one of a few built-in functions that are specialized for key model formulations such as Gaussian HMMs, Gaussian mixtures, multivariate Gaussians, and so on.

Now, let's check that we really didn't lose anything by transforming one model into the other.

```csharp
// We can check that both are equivalent
for (int i = 0; i < movements.Length; i++)
{
    // Should be the same
    int expected = hmm.Compute(movements[i]);
    int actual = hcrf.Compute(movements[i]);

    // Should be the same
    double h0 = hmm.LogLikelihood(movements[i], 0);
    double c0 = hcrf.LogLikelihood(movements[i], 0);

    double h1 = hmm.LogLikelihood(movements[i], 1);
    double c1 = hcrf.LogLikelihood(movements[i], 1);

    double h2 = hmm.LogLikelihood(movements[i], 2);
    double c2 = hcrf.LogLikelihood(movements[i], 2);
}
```

However, here is where we will finally be able to apply discriminative learning to our previously generatively-learned models. One of the best learning algorithms for HCRFs happens to be one of the best algorithms available for neural networks as well. It is one of the fastest gradient algorithms restricted to first-order information only.

So let’s use it and see what happens:

```csharp
// Now we can learn the HCRF using one of the best learning
// algorithms available, Resilient Backpropagation learning:

// Create a learning algorithm
var rprop = new HiddenResilientGradientLearning<double[]>(hcrf)
{
    Iterations = 50,
    Tolerance = 1e-5
};

// Run the algorithm and learn the models
double error = rprop.Run(movements, labels);
```

At this point, the HCRF should have improved over the previous log-likelihood for the generative model, and will still be able to successfully distinguish between our three movement types:

```csharp
int hc1 = hcrf.Compute(hands_in_air); // should output 0
int hc2 = hcrf.Compute(typing);       // should output 1
int hc3 = hcrf.Compute(waving);       // should output 2
```

As we can see, in this example we created an HCRF classifier that is able to distinguish between sequences of multivariate observations (i.e. vectors).

## See also

- Accord.NET Framework for Machine Learning and Statistics
- Sequence Classifiers in C# – Part I: Hidden Markov Models
- Sequence Classifiers in C# – Part II: Hidden Conditional Random Fields

## References

- Milind Mahajan, Asela Gunawardana, and Alex Acero. Training algorithms for hidden conditional random fields. International Conference on Acoustics, Speech, and Signal Processing, 2006.

Dear Cesar,

I'd like to thank you for your amazing tutorials 🙂

I have a question: is the implementation of HCRF and CRF complete in the Accord Framework, especially CRF?

I've read some papers that use CRFs for continuous gesture recognition, where a CRF is applied to a stream containing gestures to segment the known gestures out from the unknown ones, after which an HMM or HCRF classifier can deal with the segmented gestures.

Do you know how to go through implementing this?

Thanks in advance

Hi Omar!

Unfortunately, I have never implemented the stream HMM or CRF to segment time sequences. I studied the topic some years ago and was heading in this direction, but in the end I decided to focus on other aspects of the gesture recognition problem and didn't finish implementing the time segmentation part.

In any case, the method I was going to implement was a gesture segmentation network based on HMMs and threshold models. This method was described in the 1999 paper "An HMM-based threshold model approach for gesture recognition" by Lee and Kim. I have implemented most of the machinery necessary to make those models work, including the threshold models and some online implementations of HMMs that you can find in the Accord.Statistics.Running namespace. I remember being able to combine them in the way described in that paper, but I never finished that code so it could be added to the framework.

I suppose the technique described in that paper would also work for CRFs and HCRFs.

Best regards,

Cesar

In the first example what is the difference between states:3 and outputClasses:3?

MarkovDiscreteFunction(states: 3, symbols: 4, outputClasses: 3)

It seems both the states and outputClasses are the same.

Thanks