Hidden Markov Model -Based Sequence Classifiers in C#


Distinct Hidden Markov Models can be trained individually on data obtained from each class of a classification problem. If we create a set of those models and specialize each model to recognize each of the separate classes, we can then use the HMMs ability to calculate the likelihood that a given sequence belongs to the model to determine the most likely class of an unknown sequence. 

This post is a continuation for the previous
article, Hidden Markov Models in C#.


The code presented here is also part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications. It is based on the already excellent AForge.NET Framework. Please see the starting guide for mode details. The latest version of the framework includes the latest version of this code plus many other statistics and machine learning tools.


Hidden Markov Models

An introduction about Hidden Markov Models (HMM) was presented in a previous article, entitled Hidden Markov Models in C#. HMMs are models are stochastic methods to model temporal and sequence data.

They allow, among other things, (1) to infer the most likely sequence of states that produced a given output sequence, to (2) infer which will be the most likely next state (and thus predicting the next output) and (3) calculate the probability that a given sequence of outputs originated from the system (allowing the use of hidden Markov models for sequence classification). In the context of this article, we will be more interested in results from ability (3).

Summary for Hidden Markov Models


Hidden Markov Models can be seem as finite state machines where for each sequence unit observation there is a state transition and, for each state, there is a output symbol emission. The picture on the left summarizes the overall definition of a HMM given in the previous article.



Hidden Markov Model -Based Sequence Classifier

Hidden Markov Models can be trained individually on data obtained from each class of a classification problem. If we create a set of models and specialize each model to recognize each of the separated classes, then we will be able to explore the HMMs ability to calculate the likelihood that a given sequence belongs to itself to perform classification of discrete sequences.

After all models have been trained, the probability of the unknown-class sequence can be computed for each model. As each model specialized in a given class, the one which outputs the highest probability can be used to determine the most likely class for the new sequence, as shown in the picture below.

Evaluation structure for a classifier based on Hidden Markov Models

Schematic representation of the sequence classification procedure.


Source Code

The implementation is very straightforward, as shown in the picture below. The Tag property of the HiddenMarkovModel class can be used to store additional information about which class the model will be representing.

Class diagram for HMM-based sequence classifierClass diagram for the HMM-based sequence classifier.

Computing the most likely class for a given sequence

Computing the most likely class for a given sequence is as simple as iterating through all models in the set and selecting the one which has produced the highest likelihood value.


Training each model to recognize each of the output classes

To train each model to recognize each of the output classes we have to split the training data into subsets whose outputs corresponds to each of the classes from the classification problem.


Using the Code

Code usage is very simple. Once one has the input data available in the form of a sequence array and output data in form of a integer array, create a new HiddenMarkovClassifier object with the appropriate parameters.

Then, just call Learn() passing the input and output data and the convergence limit for the learning algorithm. After that, call Compute() passing a new integer sequence to be classified by the system.


Sample Application

The accompanying sample application can read Excel spreadsheets containing integer sequences and labels for those sequences. An example spreadsheet contained inside the Resources folder in the compressed archive file demonstrates the expected format for this data.

Hidden Markov Model - Accord.NET Hidden Markov Model - Accord.NET

Left: Input sequences data. Right: Hidden Markov Models contained in the Sequence Classifier with initial configurations.

Hidden Markov Model - Accord.NET Hidden Markov Model - Accord.NET

Left: Trained models. Right: Performance test for the Hidden Markov Model Sequence Classifier.


See also




  1. The classifier is very useful. Could you please implement “Load” and “Save” methods for HiddenMarkovClassifier class? Sometimes you need to save your train model.

  2. Hi, thanks for the positive feedback. I have used Word 2010 to draw the figures.

    About saving and loading the model, you can use .NET serialization for that. Just mark the class as Serializable and then use a BinaryFormatter to serialize the class to a stream.


  3. It would be nice if the HMM engine could be extended to cover the more general case with non-discrete observations (e.g. gaussians) in order to avoid the quantization error. Also multivariate inputs would be useful too (e.g. when you have inputs from multiple sensors, not just one).

  4. Hi Antonino,

    Excellent suggestion! It is so nice that actually I was already working on that 🙂 Most of the code for Continuous density Hidden Markov Models is already done and probably will be available in the next version of the framework. I will first publish a post about K-Means (probably tomorrow), then one about Gaussian Mixture Models and finally one about CHMMs using (multivariate) Mixture Models. If you wish I can send you the implementation as it is for testing.


  5. I am also interested in the non-discrete version.

    How to deal with a scaled sequence? This classifier can recognize the sequence of symbols 0-1-1-2-1, but a similar scaled sequence of 0-2-2-5-2 (except for the 5), should also be classified as the same sequence. How to approach this?

  6. @FreshCode:

    in the discrete version the symbols are to be interpreted as taken from an alphabet (“A”, “B”, “C”,…) so there is no concept of “scale” or any other relation between symbols.

    But I guess you can treat your inputs so to feed “normalized” sequences (e.g 0-mean, 1-variance) or something like that.

  7. Thanks for your great work, however i have some confusing points here:

    1. As the model you drawn for the HMM model, i think it should have the relationship with all the states not only from RIGHT to LEFT as you depicted. Ex: from state 1 can go to state 3 and 4 also, in your illustration, state 1 can only go to state 1 and state 2 only.
    2. In your C# program, is the label A, B, C mean that class A, class B and class C?
    3. In your sample Application picture 2 and 3:
    Is the pic 2 the initial number for the model? and you got it by divided 1 by the number of states of number of symbols in a state to get the probabilities?
    4. I supposed this is also your post http://www.codeproject.com/Articles/69647/Hidden-Markov-Models-in-Csharp.aspx

    In this post in the USING THE CODE part, with this code:

    int[][] sequences = new int[][]
    new int[] { 0,1,1,1,1,1,1 },
    new int[] { 0,1,1,1 },
    new int[] { 0,1,1,1,1 },
    new int[] { 0,1, },
    new int[] { 0,1,1 },

    you said that: “For us, it can be obvious to see that the system is outputting sequences that always start with a zero and have one or more zeros at the end.”
    If i understand correctly, you mean that those sequences should be (by adding some 0 at the end):
    int[][] sequences = new int[][]
    new int[] { 0,1,1,1,1,1,1 },
    new int[] { 0,1,1,1,0,0,0 },
    new int[] { 0,1,1,1,1,0,0 },
    new int[] { 0,1,0,0,0,0,0 },
    new int[] { 0,1,1,0,0,0,0 },

    But i do not think that adding 0 in here is correct.

    I have above those confusing points. Please let me know about that.
    I really applicate.

    Best regard,
    Dinh Quoc Hung

  8. Hi Dinh,

    Sorry, for some reason the comment had been filtered as spam and I couldn’t reply it here. I really apologize for that.

    Regarding your questions:

    1) You are talking about this one in the introduction, right?

    Then in this case it really depends on how you initialize your transmission, emission and initial state probabilities. Any zero in those matrices will effectively “cut” one of the connections from the model. Usually the decision to cut connections from a model can be seen as making assumptions about the underlying system.

    2) Yes, each row corresponds to a HMM. Each HMM has an associated class.

    3) Yes, the picture 2 contains the initial HMM models, prior to any learning. They have been initialized with equal probabilities.

    4) It is a typo. It should really have read ” and have one or more **ones** at the end”. The post in CodeProject is just a repost from the original post Hidden Markov Models in C# which I published previously here.

    Hope I have answerer all your questions! If still have any. please ask.

    Best regards,

  9. Fantastic Article! I’m eager to see the Gaussian version – and if it could handle multiple mixtures that would be great too!

    The world needs a *quality* and open HMM library – so many exist but they’re poor implementations – some commercial systems are great but closed source!

    I’d love to help! 🙂

  10. Hi Peter,

    Thanks for the positive feedback!

    Continuous-density Hidden Markov Models are already available in the Accord.NET Framework, albeit some significant reworking of the HMM package is scheduled for the next release. It is already possible to use any continuous distribution as the underlying emission probabilities, including mixtures of Gaussians. The new architecture aims to provide extensibility and allow the implementation of other training/learning algorithm beyond the Baum-Welch algorithm.

    Well, since you mentioned you would like to help, would you mind taking a look on the new design I am proposing for the next release? 🙂 It is mostly finished, but I am looking for suggestions. Please send me an email if you wish to!

    Best regards,

  11. Hi César,

    My compliments on your work!
    I have a question. I am doing my final thesis. My project is to do gesture recognition with the microsoft kinect. Therefore i use the openni framework which automatically tracks my hand for free and provides me with x, y, z coordinates of my hand in realtime.

    Now i want to recognize figures like making an ” L ” movement.

    It is possible to learn the model passing the x,y,z coordinates directly?

    I really don’t know were to start.

    Can you give me some pointers?

    Thank you in advance.



  12. Hi César,

    I don’t know what went wrong but my post was deleted.

    I will shorten my question till this:

    I have the input data available but it is a continuous stream of xyz. Can i pass this to the learn() function?

  13. Hi Michael!

    Sorry, it seems Google had filtered your first comment as spam. I really apologize for that.

    About your question: Yes, it is very possible. In fact, this was the exact reason I implemented Markov models in the Accord.NET Framework. The Accord.NET Framework also implements the continuous-density version of the Markov model presented on this page, which can accept and learn from spatial coordinates directly. It also features a sample application for mouse gesture recognition which can be perfectly adapted for hand gesture recognition as you mentioned.

    If you wish, please contact me by email (cesarsouza at gmail dot com). I am also researching gesture recognition, and I would love to help (and to know how it would work with the Kinect).

    Best regards,

  14. Hi,

    the sample application for mouse gesture recognition zip file was broken. I just can’t unzip the file. Could you check that? Thanks!

    Best regards,

  15. Hi Anonymous,

    This is a known problem on Origo, it happens from time to time. The file per se is not broken, but it may have been corrupted during download. Please try re-downloading it later.

    Best regards,

  16. Hi, Cesar,

    Sorry for my poor english.

    I got it.
    I found that after first unzip it, the file name is “accord-mouse-gestures-demo”. It does not have extension. So I unzip again.

    But, I found that it’s just project demo. So, it have no source code, does it right?

    Can you release source code?
    Thanks you!

    By the way, did you study Conditional Random Field (CRF)?

    Best regards,

  17. Hi Roni,

    Well, this is weird. I think your download got corrupted. Try opening it with WinRAR (it opens correctly here). The accord-hmm-demo.zip should contain the executables and binaries of the sample application. The accord-hmm-source contains source code. Both links are available in the top of the article.

    Yes, I have also studied Conditional Random Fields for a while. I have also made a simple implementation for linear-chain fields which will be available soon.

    Best regards,

  18. Sir, Fantastic article regarding HMM.
    Could you please tell me, how can i use this technique in creditcard fraud detection

  19. Hi César,

    Thank you for your great work.

    I have 2 questions:
    1.I could not find the Continuous-density Hidden Markov Models in Accord.NET Framework. Can you give me advice on this.
    2. In the HMM sample application in this article, there are 3 different HMM models and all those 3 models used same number of symbols (it seems 3) in observation sequence. So in reality it can be different? For example, when number of output symbols in observation sequence of first HMM is 3, number of output symbols in observation sequence of next HMM is 2. Or the variable M (number of symbols on observation sequence) belongs to all HMM?

    Sorry if the way to express my question was not clear.


  20. Hi Su,

    The Accord.NET Framework presents hidden Markov models as either discrete or generic. The discrete version can be obtained by creating a HiddenMarkovModel, and a generic one with any arbitrary distribution can be obtained by creating a HiddenMarkovModel. The IDistribution could be a NormalDistribution, for example. For more details, please see the example on the documentation. I hope it will make things clearer.

    Now about your second question. Indeed, the number of symbols can be set individually for each HMM. However, there is little point on doing that, because you will need to query all HMMs on a given sequence to determine which gives the higher probability. So if an HMM is expecting at most 2 symbols and you give a sequence with 3, it will produce an error. You could present a sequence with lesser symbols, though.

    Best regards,

  21. Thank you.

    I’m eager to see your article about Continuous density Hidden Markov Models. Hopefully that it would be very useful for people who are fresh same as me.

    Thanks and regards,

  22. By the way my question a little bit different. Sorry for my question was not easy to make sense.

    As demonstrated in the article there are class A, B and class C. Their observation sequences used as training data and the number of symbol in Observation sequences of each class was 3 (M=3). If class A and C used 3 symbols in their Observation sequences(for Class A M=3; for class C M=3). Is it ok that class B can have 2 symbols in its observation sequence (for class B M=2)?

    Or can i assume that each class used for training should have same number of symbols in their observation sequence?
    (for Class A M=3;
    for class B M=2;
    for Class C M=3)
    I mean: here all those A,B and C classes used for training, not testing.

    Thanks again

  23. Hi there,

    You must assume all classes have the same number of symbols. Actually, the actual sequences could have less symbols, but at the moment of creation you must specify the maximum number of symbols among all possible classes.

    So if in your sequences you have Class A with 3 symbols, B with 2 and C with 3, you should create all models with 3 symbols. This will ensure all models are able to interpret the three possible symbols, even if for class B there will never be a third symbol.

    Best regards,

  24. Hi Cesar,

    Brilliant job on the library that you’ve created, it’s absolutely amazing.
    I noticed in the comments that you were interested in gesture recognition using Kinect. The comment has been made more than a year ago. I was wondering if you had any luck with Kinect and specifically Kinect for Windows SDK (Version 1.5) can you please let me know about it?

  25. Hi Cesar,

    Can we use this model to predict missing values in the large data set? I’d like to back-fill missing data for financial application based on the sequence of data.


  26. Hi César,

    Thank you for your valuable library.

    I’m looking for way/tool that can convert my bivariate data into univariate with minimal loss of information, since discrete HMM needs univaraiate data. I’m using PCA as feature extraction method and having bivaraite data after PCA.

    I guess that some distance based method can help me to convert the bivariate data into univariate. But not sure which method i can use. Is there anything in your library to help me to convert bivariate/multivariate data into univariate?

    Thank you in advance.

  27. Hi César.

    Thank you for this great tutorial. I have a problem to classify account documents(such as invoices, orders, etc). (This documents has a strong structure). I want to use HMM sequence classifier and I have some questions:
    0. Does HHM is suitable model for this?
    1. What alphabet (M) should I use? Should it will be all letters?
    2. How many states should it be?

    Thanks and regards,

  28. Hi Artem!

    For document classification, I would suggest using a Bag-of-Words model and a standard classifier, such as Support Vector Machines. I am going to add a few examples on how to create bag-of-words soon; but there are some examples available in the unit tests at


    Basically, you can pass all documents as inputs for a BagOfWords constructor. Afterwards, you can retrieve fixed length representations for each document using its GetFeatureVector method. Since the feature vectors will have fixed length, you can pass them along to any other classifier, such as the SVMs I mentioned above.

    Hope it helps!

    Best regards,

  29. Hi Cesar, excellent work!!!
    I have 2 question for you:
    1) My aim is to classify simple action starting from sequence of postures, I have only 3 postures available: SITTING, STANDING, LYING DOWN….and 3 action to recognize: GOING TO BED, SLEEP, WAKE UP.
    Do you think that I can solve my problem with this methodology?

    2) In your example, I don’t understand these rows:
    0-0-0-2-2-2-1-2-2-2-2-2-2-2 A 2
    0-1-1-1-1-2-2-2-2-2-2-2-1-2 B 2

    you have sequences with 3 states but in the column STATES you report the number 2….why?

    Thanks in advance

  30. Hi Andrea,

    Thanks, I hope the tutorial can be useful to you! Please note that the values in the sequences are not the states, but actually the observed values. The sequences represent sequences of observations. As the observations are fed into the Markov models, its inner state may change. For example, given the observations 0, 0, 0, the Markov model may stay at its first hidden state. However, when there is a transition from 0 to 2, it may change to another state. As you can see, the number of different possible observation symbols and the number of states are independent.

    In your case, the postures available would be your observation symbols, and the actions would be class labels. The number of hidden states may be guessed, or you can try some values and see which work best.

    Hope it helps!

    Best regards,

  31. Ok, now I understand, thank you for the support!
    I try with your code and later my purpose will be to develop a methodology for action recognition using unclustering techniques, because in other context the number of actions could be not defined a priori (automatic human behaviour understanding)

  32. Hi Cesar, in your code I find:

    // Train the ensemble
    for (int i = 0; i < 10; i++)
    hmmc.Learn(sequences, labels, iterations, limit);

    It’is correct to replace the number 10 with N (where N is the number of training rows?)


  33. Hi there,

    Well, no. The hidden Markov learning algorithms can’t be called sequentially like in the referred code. Please, can you let me know where did you find this excerpt, so I can take a look and see if it is not a bug?

    Best regards,

  34. Hi César,
    I’m trying to use your application with the same example and I’m not getting the same results as posted in the images. I’m getting negative probabilities in the matrices- Do you know what is causing this?

  35. Hi there,

    It is because the matrices now contain log-probabilities instead of probabilities. They are kept on this form to avoid numerical innacuracies. In order to retrieve the probability values, just apply a Math.Exp function to those values.

    Hope it helps!

    Best regards,

  36. Hi César,
    I have another question. When we try to classify a sequence, we compute the likelihood of each HMM model, and the model that have the highest score is one classified. To calculate the likelihood, we need to use the foward algorithm, and the final step of this algorithm gives us the likelihood.This corresponds to the sum of the last score obtained of each state. I try to make the calculations (excel) in a small sequence but I am getting much smaller numbers compared with the likelihood obtained with your program. Do you know what is causing this?

    1. Hi Cesar,

      I have a quick question to you. I am a novice programmer but I have got fairly good understanding about HMM and the reason for choice of this model in my application.

      My question is straight forward. I have my input temporal data in the form of CSV or SQL files, and therefore how can I use your existing code to train or estimate parameters of the model?

      Each of these temporal data is per second granular timestamped data from multiple sensors and for few days. And I have 4 or 5 such files of individual components and 1 file which forms the aggregate of all the sensor values at the same instance of time. My objective is to decipher the aggregate value into 4 or 5 individual components using HMM.

      Your advice to tweak and use your model as base would be deeply appreciated.

      Thanks in anticipation


    2. Hi Balaji!

      If you have your data available in CSV files, you can use an CsvReader (such as the one provided by the LumenWorks CSV Reader’s package) to read the CSV columns into a double[][] array. You should load your array more or less like this:

      double[][] data = new double[lines_in_file][]
      for (int i = 0; i < lines_in_file; i++)
      double[] sensor_data = // … read data from the current line
      data[i] = sensor_data;

      Afterwards, you will have one big array where each element contains the sensors observations for a particular second. Then, you can create an Independent-Gaussian HMM with 5 states, for example, and use the Baum-Welch learning algorithm to teach this HMM.

      I have posted an example on how to do this on this gist. Please let me know if it helps!


      Best regards,

    3. Hi Balaji,

      I am sorry, it seems the comments section of this website is not working properly. If you wish, you can contact me directly at my email at cesarsouza at gmail dot com.

      Best regards,

  37. Hi Balaji!

    If you have your data available in CSV files, you can use an CsvReader (such as the one provided by the LumenWorks CSV Reader’s package) to read the CSV columns into a double[][] array. You should load your array more or less like this:

    double[][] data = new double[lines_in_file][];

    for (int i = 0; i < lines_in_file; i++)
    double[] sensor_data =3D // … read data from the current line
    data[i] =3D sensor_data;

    Afterwards, you will have one big array where each element contains the sensors observations for a particular second. Then, you can create an Independent-Gaussian HMM with 5 states, for example, and use the Baum-Welch learning algorithm to teach this HMM. I have posted an example on how to do this on this gist. Please let me know if it helps!


    Best regards,

  38. Hi Cesar, thank you very much for your efforts
    when i used your code for long series of numbers, it gives me value that point at there is no similarity
    this also happens when using the same sequence as for the trained data(trained data = sequence)
    help me plz

  39. Hi Elsayed, I hope the work has been useful to you!

    So when you say they have no similarity, what exactly do you mean? The point is that the HMM doesn’t produce a value between 0 and 1 where 1 is the most similar and 0 not similar. Instead, it produces a value than can be used to compare, among several models, to which model a sequence would be closer to.

    For example, let’s say you have two HMM models, one that was learned using sequences of the type A, and another model that was learned using sequences of the type B. If you have a sequence that resembles the sequences from type A, and ask the models to give the probability of this sequence, the model that was trained on sequences of type A will be expected to give a numeric answer that is greater than the model that was trained using sequences of type B.

    However, note that both model A and B are allowed to give extremely tiny answers. What would matter is how they compare against each other. For example, HMM A can give 0.00001, but HMM B gives 0.000000000001. This is an indication that model A recognizes the sequence better than model B.

  40. Hi Cesar, Thank you so much for your work on Accord.NET
    I have a question on sequence classification, as I’ve tried to apply HMM to my own sequence, but it doesn’t seem to work the way I expected it to work. Basically, if I train HMM with Baum-Welch over some particular sequence of 18 elements, and then I feed exactly the same sequence back to it to evaluate, I get a very low probability. Something in the range of 0.000001. How that could be? I’ve tried to vary the number of states, but keep getting the same number. However, if I use just 3 numbers in the middle, I get a probability of 0.3, which then rapidly decreasing the more numbers I add. I expected, the more numbers go in the same sequence, the higher probability should be, which seems to be opposite to what I observe with Accord.NET.

Leave a Reply

Your email address will not be published. Required fields are marked *