Hidden Conditional Random Fields

If you are attempting to solve a sequence classification problem, you might be interested in learning about Hidden Conditional Random Fields (HCRFs). They are the discriminative counterpart of classifiers based on a set of hidden Markov models (HMMs).

Image credit: theaucitron, wheat fields.

Hidden what?

Hidden Markov models are simple models that can be created to recognize whether a sequence of observations is similar to the previous sequences that the model has seen before. Now, if we create one HMM for each type of sequence that we are trying to distinguish, and then individually ask each model whether it recognizes the given sequence, we have just created a hidden Markov model classifier.

However, we might have a slightly better way of classifying those sequences. This method for creating a classifier (i.e. creating individual models for each sequence type, then asking each model how strongly it recognizes a new sequence) is known as generative learning. But we could also have created a model from the ground up that was just focused on distinguishing between sequence types without modeling them first. This would be known as discriminative learning.

And as mentioned in the tagline for this article, HCRFs are the discriminative doppelganger of the HMM classifier. Let’s see then how we can use them.

Creating HCRFs sequence classifiers in C#/.NET

If you would like to explore them in your projects, the Accord.NET Framework provides Hidden Markov Models, Hidden Markov Model Classifiers, Conditional Random Fields and Hidden Conditional Random Fields.

Because HCRFs are discriminative models, they can be optimized directly using gradient methods that closely resemble what is normally done with Neural Networks: plain gradient descent, conjugate gradient, quasi-Newton methods, and so on. One of the best algorithms among those is Resilient Backpropagation [1]. The interesting part is that this is also one of the best algorithms for training Neural Networks.

Real-world example

For a real-world example of how HCRFs can be applied, I would kindly invite you to look at my master’s defense presentation. It details exactly how to transform HMMs into linear-chain HCRFs through nice animations, and then applies them to sign language recognition. Use mouse clicks to advance the slides so you can see the animations!


Sign language recognition with Support Vector Machines and Hidden Conditional Random Fields

Incidentally, the presentation above shows exactly two kinds of sequence recognition: discrete sequence recognition and multivariate/continuous sequence recognition.

The discrete case was the case of fingerspelling: a frame classifier translated images into symbols (i.e. numbers 0, 1, 2, …), and then a sequence of those numbers was classified into a finite set of words. The continuous case was the case of natural word recognition: a feature extractor produced real-valued features, including the (x, y) positions of the user’s hands and face, and then the sequence of those feature vectors was classified into a finite set of words.

Let’s consider concrete code examples of how those could have been implemented using the Accord.NET Framework, in C# on .NET.

Simple discrete example

In this simple example we will consider only discrete data. Note that this is the simplest case for a Hidden Markov Model (with discrete hidden Markov models we don’t have to worry about probability distribution assumptions for the model states). This example is also available on the Hidden Resilient Backpropagation Learning algorithm documentation page and in the article Sequence Classifiers Part II: Hidden Conditional Random Fields at CodeProject.

In this case, we can create an HCRF directly using a MarkovDiscreteFunction. There are many ways to create an HCRF and to define its feature functions. The framework provides some functions that are already specialized in handling particular problems, such as, for example, the same set of problems that would be solvable by a discrete HMM classifier.
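As a minimal sketch, suppose we have a few made-up training sequences drawn from an alphabet of four symbols and belonging to three classes (the data below is purely illustrative):

```csharp
// Hypothetical training sequences over the symbols 0, 1, 2 and 3
int[][] sequences =
{
    new[] { 0, 1, 1, 0 },       // class 0
    new[] { 0, 0, 1, 0 },       // class 0
    new[] { 1, 2, 2, 1 },       // class 1
    new[] { 1, 1, 2, 2 },       // class 1
    new[] { 2, 3, 3, 2 },       // class 2
    new[] { 2, 2, 3, 3 },       // class 2
};

// The class label of each sequence above
int[] labels = { 0, 0, 1, 1, 2, 2 };

// Create the HCRF from a feature function that mimics a discrete HMM
// classifier: 3 hidden states per class, 4 symbols, 3 output classes
var function = new MarkovDiscreteFunction(states: 3, symbols: 4, outputClasses: 3);
var classifier = new HiddenConditionalRandomField<int>(function);
```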

Now that the model has been created, we can start training the model to recognize the sequences and their labels shown above.
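A sketch of the training step, using the HiddenResilientGradientLearning class (the snippet follows the older Run-based API; more recent releases of the framework expose an equivalent Learn method):

```csharp
// Create the Resilient Backpropagation learning algorithm for the model above
var teacher = new HiddenResilientGradientLearning<int>(classifier)
{
    Iterations = 50   // an arbitrary number of epochs for this sketch
};

// Run the learning algorithm on the sequences and labels shown above
teacher.Run(sequences, labels);
```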

After training has finished, we can ask the model which would be the most likely class labels for new sequences that we are about to show it. For example:
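Continuing the sketch, with made-up query sequences (Compute is the older API name; newer versions also offer Decide):

```csharp
// Ask the model for the most likely class label of each new sequence
int c1 = classifier.Compute(new[] { 0, 1, 1, 0 });      // expected: 0
int c2 = classifier.Compute(new[] { 1, 1, 2, 2, 1 });   // expected: 1
int c3 = classifier.Compute(new[] { 2, 3, 3, 3, 2 });   // expected: 2
```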

As you can see, we created a classification algorithm that knows how to differentiate between the different kinds of sequences we fed it in the beginning.

Complex continuous multivariate example

In this complex example we will consider multivariate real-valued data. This would be a better example for real-world problems. One particular problem where sequence recognition might be important is in the context of 3D gesture recognition. A 3D gesture is simply a sequence of 3D coordinates measured over time: if you move your hand from the mouse up to your head, and we measure this process using a sensor, we will have a sequence of 3D points saying where your hand was at each step of the capturing process.

Furthermore, if you took 5 seconds to perform this movement, and the sensor captures one coordinate a second, we would have a sequence of 5 points describing your movement.

So, let’s say that we have one such sensor and would like to distinguish between sequences belonging to different movements. Let’s say we have the movements “hands-in-the-air”, “typing on a keyboard”, and “waving goodbye”.

Let’s say we decided to represent our frames as:
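A minimal sketch, with made-up coordinate values:

```csharp
// A frame is simply the 3D position of the hand at a given instant
double[] frame = { 1.2, 0.9, 0.3 };   // x, y, z (hypothetical values)
```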

Each movement would then be described as a sequence of those frames:
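For instance, still with made-up numbers:

```csharp
// A movement is a sequence of frames: a jagged array of 3D points,
// e.g. five frames for a five-second movement captured at one frame per second
double[][] movement =
{
    new double[] { 1.2, 0.9, 0.3 },
    new double[] { 1.3, 1.1, 0.3 },
    new double[] { 1.3, 1.4, 0.4 },
    new double[] { 1.4, 1.8, 0.4 },
    new double[] { 1.4, 2.1, 0.5 },
};
```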

Let’s see a more concrete example. Suppose we have these sequences of 3D coordinates:
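Here, each gesture gets one made-up sample (in a real application the coordinates would of course come from the sensor):

```csharp
// One hypothetical example sequence for each of the three movements
double[][] handsInTheAir =
{
    new double[] { 0.0, 1.0, 0.2 },
    new double[] { 0.1, 1.6, 0.2 },
    new double[] { 0.1, 2.1, 0.3 },
    new double[] { 0.2, 2.5, 0.3 },
};

double[][] typingOnKeyboard =
{
    new double[] { 0.5, 0.4, 0.6 },
    new double[] { 0.6, 0.4, 0.5 },
    new double[] { 0.5, 0.3, 0.6 },
    new double[] { 0.6, 0.3, 0.5 },
};

double[][] wavingGoodbye =
{
    new double[] { 1.0, 1.5, 0.4 },
    new double[] { 1.3, 1.6, 0.4 },
    new double[] { 1.0, 1.5, 0.5 },
    new double[] { 1.3, 1.6, 0.5 },
};
```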

In this example, we are considering that we have only one sample of each gesture in our database. However, this is a big no-no. In practice, we should have hundreds of those samples for the magic to work.

Now let’s create our learning database. We have to create a table where we have all of our sequences or movements on the left side, and their corresponding output labels on the right.
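Something like this would do:

```csharp
// The learning database: sequences on the left, class labels on the right
double[][][] inputs =
{
    handsInTheAir,      // class 0
    typingOnKeyboard,   // class 1
    wavingGoodbye,      // class 2
};

int[] outputs = { 0, 1, 2 };
```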

Now, because we are dealing with multivariate samples (i.e. 3D points) we have to assume one shape for the emission distributions of the states in our hidden Markov models. One common choice is to assume they are Gaussian. However, to make it even simpler, let’s assume they are *independent Gaussians*, that is, the X coordinates follow one univariate Gaussian, the Y another, and the Z yet another, and they are all uncorrelated.

Ok, we are almost there! Let’s create our classifier using those base distributions:
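A sketch of how this could look, using the Independent&lt;NormalDistribution&gt; distribution and the generic HiddenMarkovClassifier (the choice of 5 states per model is arbitrary here):

```csharp
// One univariate Normal distribution per coordinate (X, Y and Z),
// assumed independent and uncorrelated from each other
var initialDensity = new Independent<NormalDistribution>(
    new NormalDistribution(),   // X
    new NormalDistribution(),   // Y
    new NormalDistribution());  // Z

// One hidden Markov model per movement class, each with 5 states
// arranged in a Forward topology
var classifier = new HiddenMarkovClassifier<Independent<NormalDistribution>>(
    classes: 3, topology: new Forward(5), initial: initialDensity);
```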

Now, one interesting detail is the choice of the Forward topology above. Topologies are different ways to organize the states in our model. In other words, a topology tells which state transitions are allowed and which ones are impossible. Choosing a Forward topology is equivalent to saying: no movement may go back in time. Which is true, by the way. So we go with it.

Great! Now we have our model. We then need a way to teach it to recognize the sequences we defined in the beginning. Thankfully, the framework is flexible enough to support any kind of distribution you give as the emission densities. Most implementations can only handle Gaussians, Gaussian mixtures, and the like. But here we can use anything that implements the IFittableDistribution interface: Bernoullis, Multinomials, Poissons…

Now, since we are estimating things from data, it might be the case that at some point one of our Normal distributions is asked to fit a set of numbers that are all the same. This would lead to a Normal distribution with zero variance, which is clearly not allowed by the current rules of the universe. To overcome this problem, we can signal the fitting algorithm that some exceptions may apply by specifying a FittingOptions object during the learning phase.
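A sketch of the learning step, assuming the HiddenMarkovClassifierLearning and BaumWelchLearning classes together with the IndependentOptions/NormalOptions fitting options for the regularization mentioned above:

```csharp
// Baum-Welch (generative) learning, configured independently for each
// class-specific model; Regularization adds a small constant to the
// variances so constant data does not produce zero-variance Gaussians
var teacher = new HiddenMarkovClassifierLearning<Independent<NormalDistribution>>(classifier,

    modelIndex => new BaumWelchLearning<Independent<NormalDistribution>>(classifier.Models[modelIndex])
    {
        Tolerance = 0.0001,
        Iterations = 0,   // iterate until the tolerance is reached

        FittingOptions = new IndependentOptions()
        {
            InnerOption = new NormalOptions() { Regularization = 1e-5 }
        }
    });

// Run the learning algorithm on the database defined above
double logLikelihood = teacher.Run(inputs, outputs);
```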

After a while, this method should return with a working model for us.
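We can then ask the classifier about the very sequences it was trained on:

```csharp
// The generatively-trained classifier should reproduce the training labels
int a = classifier.Compute(handsInTheAir);      // expected: 0
int b = classifier.Compute(typingOnKeyboard);   // expected: 1
int c = classifier.Compute(wavingGoodbye);      // expected: 2
```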

Which is very nice. In this particular example, we could have a perfect model that was able to correctly predict the database we threw at it. But what if this hadn’t been the case?

If this wasn’t the case, there is always room for improvement. Let’s digivolve our hidden Markov classifier into a Hidden Conditional Random Field using a Markov Multivariate Function, and see what we can do with it:
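A sketch of the conversion, assuming a MarkovMultivariateFunction overload that accepts this classifier type:

```csharp
// Wrap the trained HMM classifier into an equivalent HCRF
var function = new MarkovMultivariateFunction(classifier);
var hcrf = new HiddenConditionalRandomField<double[]>(function);
```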

Although hidden Markov models can operate with any choice of distribution function, unfortunately for HCRFs things aren’t that simple. HCRFs are very general models, and as such, they place very few restrictions on which formulation we would like to use. While this is great from a theoretical point of view, it means that the framework cannot be as helpful as it was before and offer generic support for all possible distributions in the world.

In this case, we can either create our own feature functions specifying how our HCRF should be created, or use a few built-in functions that are specialized for some key model formulations such as Gaussian HMMs, Gaussian mixtures, multivariate Gaussians, and so on.

Now, let’s check that we really didn’t lose anything by transforming one model into another.
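One way to do so:

```csharp
// The HCRF should give the same answers as the original HMM classifier
for (int i = 0; i < inputs.Length; i++)
{
    int fromHmm  = classifier.Compute(inputs[i]);
    int fromHcrf = hcrf.Compute(inputs[i]);
    // fromHmm and fromHcrf are expected to be equal
}
```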

However, here is where we will finally be able to apply discriminative learning to our previously generatively-learned models. One of the best learning algorithms for HCRFs happens to also be one of the best algorithms available for Neural Networks: Resilient Backpropagation, one of the fastest gradient algorithms that relies only on first-order information.

So let’s use it and see what happens:
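Again a sketch, this time using HiddenResilientGradientLearning over double[] sequences:

```csharp
// Discriminative training of the HCRF with Resilient Backpropagation
var rprop = new HiddenResilientGradientLearning<double[]>(hcrf)
{
    Iterations = 50   // an arbitrary number of epochs for this sketch
};

rprop.Run(inputs, outputs);
```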

At this point, the HCRF should have improved over the previous log-likelihood for the generative model, and will still be able to successfully distinguish between our three movement types:
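For instance:

```csharp
// The discriminatively-trained HCRF should still recognize all three movements
int first  = hcrf.Compute(handsInTheAir);      // expected: 0
int second = hcrf.Compute(typingOnKeyboard);   // expected: 1
int third  = hcrf.Compute(wavingGoodbye);      // expected: 2
```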

As we can see, in this example we have created an HCRF classifier that is able to distinguish between sequences of multivariate observations (i.e. vectors).

References

  1. Milind Mahajan, Asela Gunawardana, and Alex Acero. Training algorithms for hidden conditional random fields. International Conference on Acoustics, Speech, and Signal Processing. 2006.

3 Comments

  1. Dear Cesar,

    I’d like to thank you for your amazing tutorials 🙂

    I’ve a question: is the implementation of HCRF and CRF complete in the Accord Framework, especially CRF?

    I’ve read some papers that use CRFs for continuous gesture recognition, where a CRF is applied to a stream containing gestures to segment the known gestures from the unknown ones, after which we can use an HMM or HCRF classifier to deal with the segmented gestures.

    Do you know how to go through implementing this?

    Thanks in advance

  2. Hi Omar!

    Unfortunately I have never implemented the stream HMM or CRF to segment time sequences. I studied the topic some years ago and was heading in this direction, but in the end I decided to focus on other aspects of the gesture recognition problem and didn’t finish implementing the time segmentation part.

    In any case, the method that I was going to implement was a gesture segmentation network based on HMMs and threshold models. This method was described in the ’99 paper entitled “An HMM-based threshold model approach for gesture recognition” by Lee and Kim. I have implemented most of the machinery necessary for making those models work, including the threshold models and some on-line implementations for HMMs that you can find in the Accord.Statistics.Running namespace. I remember that I was able to combine them together in the way described in that paper, but I never finished that code so it could be added to the framework.

    I suppose the technique described in that paper would also work for CRFs and HCRFs.

    Best regards,
    Cesar

  3. In the first example what is the difference between states:3 and outputClasses:3?
    MarkovDiscreteFunction(states: 3, symbols: 4, outputClasses: 3)

    It seems both the states and outputClasses are the same.

    Thanks
