Decision Trees in C#

[Image: a decision tree]

Decision trees are simple predictive models that map input attributes to a target value using conditional rules. Trees are commonly used in problems whose solutions must be readily understandable or explainable by humans, such as in computer-aided diagnostics and credit analysis.

Introduction

Decision trees give a direct and intuitive way to obtain the classification of a new instance from a set of simple rules. Because of this characteristic, decision trees find wide use in situations in which the interpretation of the obtained results and of the reasoning process is crucial, such as in computer-aided diagnostics (CAD) and in financial credit analysis. Consumer credit analysis is an interesting example because, in many countries, one cannot simply deny credit without giving a justification, and such a justification is trivial to extract from a decision tree.

The tree shown above was generated using the Context Free software, based on the grammar shared by davdrn.

Learning decision trees

Decision trees can simply be drawn by hand based on any prior knowledge the author may have. However, their real power becomes apparent when trees are learned automatically, through some learning algorithm.

The most notable and classic examples of decision tree learning are the ID3 (Quinlan, 1986) and C4.5 (Quinlan, 1993) algorithms. Both are examples of greedy algorithms, performing locally optimal decisions in the hope of producing the most general tree. Such algorithms are based on the principle of Occam's razor, favoring simpler or smaller trees in the hope that such smaller trees retain more generalization power. This preference is formalized through the specification of a hypothesis-ordering criterion such as the information gain. The information gain measures, as the name implies, the gain of information obtained by using each of the attributes as the next factor to consider during a decision. It can be defined as:

$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$$

where the entropy of a set S containing instances of c different classes is given by

$$Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$

in which p_i is the proportion of instances in S belonging to class i.

However, the information gain has a bias towards attributes with a high number of possible values (Mitchell, 1997). A way to overcome this bias is to select attributes based on alternative criteria, such as the gain ratio (Quinlan, 1986), defined as:

$$GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)}$$

$$SplitInformation(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$

where S_1 through S_c are the c subsets of instances resulting from partitioning S by the c-valued attribute A.

In the GainRatio, the SplitInformation term attenuates the importance given to attributes with many, uniformly distributed, possible values.
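To make these definitions concrete, the sketch below (not part of Accord.NET; the helper names Entropy and InfoGain are just for illustration, and the methods are assumed to live inside some static class) computes the entropy of a set of integer labels and the information gain of a discrete attribute:

using System;
using System.Linq;

static double Entropy(int[] labels)
{
    // Entropy(S) = sum_i  -p_i log2(p_i), where p_i is the proportion
    // of instances in S belonging to class i
    return labels.GroupBy(label => label)
                 .Select(g => (double)g.Count() / labels.Length)
                 .Sum(p => -p * Math.Log(p, 2));
}

static double InfoGain(int[][] inputs, int[] labels, int attribute)
{
    // Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) Entropy(S_v)
    double gain = Entropy(labels);
    var partitions = Enumerable.Range(0, inputs.Length)
                               .GroupBy(i => inputs[i][attribute]);
    foreach (var subset in partitions)
    {
        int[] subsetLabels = subset.Select(i => labels[i]).ToArray();
        gain -= (double)subsetLabels.Length / labels.Length * Entropy(subsetLabels);
    }
    return gain;
}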

Iterative Dichotomizer 3 (ID3)

The algorithm presented below is a slightly different version of the original ID3 algorithm as presented by Quinlan; the modifications support multiple output labels. In each recursion of the algorithm, the attribute which best classifies the set of instances (or examples, or input-output pairs, or data) is selected according to some selection criterion, such as the InfoGain or the GainRatio.

  • ID3(instances, target_attribute, attributes)
    • Create a new root node for the tree.
    • If all instances have the target_attribute belonging to the same class c,
      • Return the tree with a single root node labeled c.
    • If attributes is empty, then
      • Return the tree with a single root node labeled with the most common value of the target_attribute in instances.
    • Else
      • A ← the attribute in attributes which best classifies instances
      • root's decision attribute ← A
      • For each possible value vi of A,
        • Add a new ramification below root, corresponding to the test A = vi
        • Let instances_vi be the subset of instances with the value vi for A
        • If instances_vi is empty, then
          • Below this ramification, add a new leaf node with the most common value of the target_attribute in instances.
        • Else, below this ramification, add the subtree given by the recursion:
          ID3(instances_vi, target_attribute, attributes − { A })
    • End
  • Return root
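For illustration only, the pseudocode above could be translated into C# along the following lines. This is a simplified sketch, not the Accord.NET implementation; it reuses the hypothetical InfoGain helper from the earlier sketch, and since branches are created only for observed values, the empty-subset case never arises here:

using System.Collections.Generic;
using System.Linq;

class Node
{
    public int? Label;              // set only on leaf nodes
    public int AttributeIndex = -1; // decision attribute tested at this node
    public Dictionary<int, Node> Children = new Dictionary<int, Node>();
}

static Node Id3(int[][] instances, int[] labels, List<int> attributes)
{
    var root = new Node();

    // If all instances belong to the same class c, return a leaf labeled c
    if (labels.Distinct().Count() == 1)
    {
        root.Label = labels[0];
        return root;
    }

    // If attributes is empty, return a leaf with the most common label
    if (attributes.Count == 0)
    {
        root.Label = labels.GroupBy(l => l)
            .OrderByDescending(g => g.Count()).First().Key;
        return root;
    }

    // A <- the attribute in attributes which best classifies instances
    int best = attributes.OrderByDescending(
        a => InfoGain(instances, labels, a)).First();
    root.AttributeIndex = best;

    // Add one ramification per observed value vi of A, recursing on
    // the subset of instances having the value vi for A
    foreach (var branch in Enumerable.Range(0, instances.Length)
                                     .GroupBy(i => instances[i][best]))
    {
        int[][] subset = branch.Select(i => instances[i]).ToArray();
        int[] subsetLabels = branch.Select(i => labels[i]).ToArray();
        List<int> remaining = attributes.Where(a => a != best).ToList();
        root.Children[branch.Key] = Id3(subset, subsetLabels, remaining);
    }

    return root;
}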

Difficulties and disadvantages of decision tree learning

Despite relying on Occam's razor to guide the learning, neither the ID3 nor the C4.5 algorithm is guaranteed to produce the smallest or most general tree possible. This happens because their learning is based solely on heuristics, and not on true optimality criteria. The following example, from (Esmeir & Markovitch, 2007), illustrates the learning of the concept xor (exclusive or) by the ID3 algorithm. In this example, A3 and A4 are irrelevant attributes, having absolutely no influence on the target answer. Yet, despite their irrelevance, ID3 will select one of them to belong to the tree; in fact, ID3 will select the irrelevant attribute A4 to be the root of the learned tree.

A1  A2  A3  A4  label
 1   0   0   1    +
 0   1   0   0    +
 0   0   0   0    −
 1   1   0   0    −
 0   1   1   1    +
 0   0   1   1    −
 1   0   1   1    +
[Figure: the tree learned by ID3 for the xor example, with the irrelevant attribute A4 at its root]

One complication of decision tree learning is that the problem of finding the smallest tree consistent with the training samples (in the sense of classifying all of them correctly) is known to be NP-complete (Hyafil & Rivest, 1976). Moreover, the separating decision surfaces generated by such trees are always formed by cuts parallel to the attribute-space axes, which can be a severely suboptimal solution (Bishop, 2007, p. 666). The example given by Bishop illustrates this problem well: to separate classes whose optimal decision margin lies at 45 degrees to one of the axes, a tree needs a high number of parallel cuts, whereas a single diagonal cut, such as could be given by any linear decision machine, would suffice. Another disadvantage of traditional decision tree learning algorithms is that most methods run in a fixed amount of training time and, as such, do not allow extra training time to be traded for better solutions. The work of (Esmeir & Markovitch, 2007) is dedicated to overcoming this problem.

The following picture shows an example of how decision tree learning is limited to cuts parallel to the axes, as described by Bishop. The Ying-Yang classification problem is a classical example of a non-linearly separable decision problem. Decision trees, albeit not linear classifiers, have difficulty classifying this set with simple thresholds.


Top-Left: Ying-Yang non-linear classification problem. Top-Right: Decision surface extracted by a decision tree. Bottom: Decision threshold rules extracted from the tree.

However, despite all of those shortcomings, decision trees play major roles as the basis of many successful algorithms. One interesting and notably successful application is in the field of object detection in images. The first real-time face detection algorithm (Viola & Jones, 2001) is based on a degenerate decision tree (also known as a cascade). The body recognition and segmentation algorithm used by the Xbox 360 gaming platform to process depth images generated by its companion Kinect sensor is equally based on the use of decision trees (Shotton, et al., 2011), as is the FAST algorithm for interest point detection in images (Rosten & Drummond, 2006).

I should also note that both the Viola-Jones and the FAST algorithms are available in the Accord.NET Framework for immediate use (the Viola-Jones detector, HaarObjectDetector, has also recently been updated to support multiple threads, so feel free to take a look and experiment with it!).

In sum, it is possible to say that a great part of the applicability of decision trees comes from the simple fact that they are extremely fast to evaluate. One of the reasons for this is that they are easily translated into sets of conditional instructions in a conventional computer program. The decision trees now available in the Accord.NET Framework make full use of this fact and enable the user to compile the decision trees to native code on-the-fly, improving their classification performance even further.

Source code

The code presented in this section is actually part of the Accord.NET Framework. The Accord.NET Framework is an extension to the already very popular AForge.NET Framework, providing new algorithms and techniques for machine learning, computer vision and more.

The Accord.MachineLearning.DecisionTrees namespace comprises the following classes:

  • DecisionTree, the main class representing a decision tree, with methods such as Compute to compute the tree's classification given an input vector;
  • DecisionNode, the node class for the decision tree, which may or may not have child nodes contained in a collection represented by a DecisionBranchNodeCollection;
  • DecisionVariable, a class to specify the nature of each variable processable by the tree, such as whether the variable is continuous or discrete and what its expected or valid ranges are;
  • DecisionBranchNodeCollection, a class to contain child nodes together with information about which attribute of the data should be compared with the child nodes during reasoning.

 

The relationships among these classes can be seen in the following class diagram:

[Class diagram: DecisionTree, DecisionNode, DecisionVariable and DecisionBranchNodeCollection]

The available learning algorithms are the ID3 algorithm discussed above and its improved version, C4.5 (which can handle continuous variables but, at this time, does not yet support pruning), both proposed and published by Ross Quinlan.

[Class diagram: the ID3Learning and C45Learning algorithm classes]
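For data with continuous attributes, C45Learning takes the place of ID3Learning. A minimal sketch, following the API current at the time of writing (the Run method is assumed here to return the training error; check the latest documentation if it has changed):

// Continuous inputs are given as double[][] instead of int[][]
C45Learning c45learning = new C45Learning(tree);
double error = c45learning.Run(inputs, outputs);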

Despite the somewhat complicated look, usage is rather simple, as will be shown in the next section.

Using the code

Consider, for example, the famous Play Tennis example by Tom Mitchell (1997):
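The dataset can be entered as a DataTable; the sketch below uses the same data as reproduced in one of the comments at the end of this post:

DataTable data = new DataTable("Mitchell's Tennis Example");

data.Columns.Add("Day");
data.Columns.Add("Outlook");
data.Columns.Add("Temperature");
data.Columns.Add("Humidity");
data.Columns.Add("Wind");
data.Columns.Add("PlayTennis");

data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");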

In the aforementioned example, one would like to infer whether a person would play tennis or not based solely on four input variables. Those variables are all categorical, meaning that there is no order between the possible values for the variable (i.e. there is no order relationship between Sunny and Rain; one is not bigger or smaller than the other, they are just distinct). Moreover, the rows, or instances, presented above represent days on which the person's behavior has been registered and annotated, essentially building our set of observation instances for learning.

In order to try to learn a decision tree, we will first convert this problem to a simpler representation. Since all variables are categorical, it does not matter if they are represented as strings or numbers, since both are just symbols for the events they represent. Since numbers are more easily representable than text strings, we will convert the problem to use a discrete alphabet through the use of a codebook.

A codebook effectively transforms any distinct possible value for a variable into an integer symbol. For example, “Sunny” could as well be represented by the integer label 0, “Overcast” by 1, “Rain” by 2, and the same goes for the other variables. So:
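A sketch of creating such a codebook with the framework's Codification class, and of applying it to the data table (the same calls appear in the comment snippets below):

// Create a new codification codebook to convert strings into integer symbols
Codification codebook = new Codification(data);

// Translate the training data into integer symbols using the codebook
DataTable symbols = codebook.Apply(data);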

Now we should specify our decision tree. We will be trying to build a tree to predict the last column, entitled “PlayTennis”. For this, we will be using “Outlook”, “Temperature”, “Humidity” and “Wind” as predictors (the variables which we will use to make our decision). Since those are categorical, we must specify, at the moment of creation of our tree, the characteristics of each of those variables. So:
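A sketch of this specification, mirroring the sample code in the comments below (the number passed to each DecisionVariable is the number of possible values of that variable):

DecisionVariable[] attributes =
{
    new DecisionVariable("Outlook",     3), // 3 possible values (Sunny, Overcast, Rain)
    new DecisionVariable("Temperature", 3), // 3 possible values (Hot, Mild, Cool)
    new DecisionVariable("Humidity",    2), // 2 possible values (High, Normal)
    new DecisionVariable("Wind",        2)  // 2 possible values (Weak, Strong)
};

int classCount = 2; // 2 possible output values for playing tennis: Yes or No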

Let’s now proceed and create our DecisionTree:
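A sketch, using the variables and class count specified above:

DecisionTree tree = new DecisionTree(attributes, classCount);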

Now we have created our decision tree. Unfortunately, it is not really very useful, since we haven’t taught it the problem we are trying to predict. So now we must instantiate a learning algorithm to make it useful. For this task, in which we have only categorical variables, the simplest choice is to use the ID3 algorithm by Quinlan. Let’s do it:
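A sketch; these calls match the comment snippets below (note that the ToIntArray and GetColumn extensions require a using Accord.Math; directive, as pointed out in the replies):

// Create a new instance of the ID3 algorithm for the tree
ID3Learning id3learning = new ID3Learning(tree);

// Extract inputs and outputs from the codified table
int[][] inputs = symbols.ToIntArray("Outlook", "Temperature", "Humidity", "Wind");
int[] outputs = symbols.ToIntArray("PlayTennis").GetColumn(0);

// Learn the training instances!
id3learning.Run(inputs, outputs);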

At this point, the tree has been created. In order to ask the tree to classify new samples, we can use:
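A sketch; Translate converts string values into their integer symbols and Compute walks the tree (depending on the framework version, Translate may expect values for all columns in their original order, as discussed in the comments below):

// Translate a new sample into its symbolic representation
int[] query = codebook.Translate("Sunny", "Hot", "High", "Strong");

// Compute the tree's decision for the sample
int output = tree.Compute(query);

// Translate the numeric answer back into a string; for this
// sample, the learned tree should answer "No"
string answer = codebook.Translate("PlayTennis", output);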

Please note that, in case any of the steps in this post doesn't work out, it might be because the most up-to-date version of the Accord.NET Framework has undergone some changes. In this case, please refer to the official documentation at the Accord.NET website.

Going further, a suitable representation of the learned tree could be given by the following diagram:

[Diagram of the learned decision tree]

However, more than just a diagram, we can also generate a .NET Expression Tree describing our decision tree. Expression trees represent code in the form of a tree of expressions, which can then be read, modified or compiled. By compiling the DecisionTree's expression, we are able to generate code on-the-fly and let the JIT compile it down to native code at runtime, greatly improving the performance of our decision tree:
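A sketch (ToExpression appears in the comment snippets below; the Compile step is the standard one for LINQ expression trees, and the exact return types may differ across framework versions):

// Convert the decision tree into a .NET expression tree
var expression = tree.ToExpression();

// Compile the expression into a lambda function; when first invoked,
// the JIT will translate it down to native code
var func = expression.Compile();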

And here is the resulting C# code, obtained by compiling the expression into a lambda function, dumping the function into a dynamic assembly, and opening and inspecting this assembly using ILSpy:

Conclusion

Decision trees are useful tools when the problem to be solved needs to be quickly interpreted and understood by humans. Another suitable use is when the decision needs to be fast. However, decision trees, at least those trained by simple algorithms such as ID3 and C4.5, can perform quite poorly depending on the problem. As happens with all machine learning techniques, it is futile to believe there is one true classifier that would act perfectly on all imaginable scenarios. As always, it is important to know our tools and in which situations each technique works best.

References

  • Bishop, C. M., 2007. Pattern Recognition and Machine Learning (Information Science and Statistics). 1st ed. 2006, corr. 2nd printing. s.l.:Springer.
  • Fayyad, U. M. & Irani, K. B., 1992. On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning, January, 8(1), pp. 87-102.
  • Quinlan, J. R., 1986. Induction of Decision Trees. Machine Learning, 1(1), pp. 81-106.
  • Quinlan, J. R., 1993. C4.5: Programs for Machine Learning (Morgan Kaufmann Series in Machine Learning). 1st ed. s.l.:Morgan Kaufmann.
  • Rosten, E. & Drummond, T., 2006. Machine Learning for High-Speed Corner Detection. In: European Conference on Computer Vision (ECCV). s.l.:Springer.
  • Shotton, J. et al., 2011. Real-Time Human Pose Recognition in Parts from Single Depth Images. s.l., s.n.
  • Viola, P. & Jones, M., 2001. Robust Real-time Object Detection. s.l., s.n.
  • Mitchell, T. M., 1997. Decision Tree Learning. In: Machine Learning (McGraw-Hill Series in Computer Science). s.l.:McGraw-Hill.
  • Mitchell, T. M., 1997. Machine Learning (McGraw-Hill Series in Computer Science). Boston (MA): WCB/McGraw-Hill.
  • Esmeir, S. & Markovitch, S., 2007. Anytime Learning of Decision Trees. Journal of Machine Learning Research, May, Volume 8, pp. 891-933.
  • Hyafil, L. & Rivest, R. L., 1976. Constructing Optimal Binary Decision Trees is NP-complete. Information Processing Letters, 5(1), pp. 15-17.

Comments

  1. Your work is 'fantastic', congratulations!
    I need a tutorial on how to use a SOM network to cluster a time series with 15 dimensions. The colors example did not help me much. Can you help me?
    Felipe

  2. Just get

    var exp = tree.ToExpression();

    'System.InvalidOperationException' occurred in mscorlib.dll
    Nullable object must have a value.

    while using some large data which includes all possibilities.

    A bug? Or am I missing something?

  3. I think there’s a bug around

    DecisionTree.cs line 112

    if (current.IsLeaf)
    {
    // This is a leaf node. The decision
    // proccess thus should stop here.
    if (current.Output == null) return -1; //add this line temporarily solve my problem.
    return current.Output.Value;
    }

    which occurs when the output is null.
    I use a data set from here:
    http://archive.ics.uci.edu/ml/datasets/Nursery

    That is the only situation I've cared about.
    Thanks for replying to me so fast (that exceeded my expectations).

  4. Hello, it worked pretty well. I have just one question, though: did you create the diagram by hand, or did you use a component to generate it? Thanks

  5. What I want to do is to show the tree in a human-readable way, instead of showing just numbers from the Codification class. What do you recommend?

  6. Hmm… The codification class can also perform the inverse, retrieve the variable name or value given a number. On the sample application available on the top of the post I had shown the tree using the DecisionTreeView control available on Accord.NET. But it is a simple control based on the .NET tree view, and may not be very readable by humans.

    Perhaps you could use Nevron’s diagrams for .NET, but I suspect they are paid components… If you know of any charting or diagram library for .NET it should not be too difficult to create a view control using the DecisionTree and Codification structures.

    If you know of a free library for creating diagrams in .NET, please let me know.

    Best regards,
    Cesar

  7. Something is not working in the tennis example:

    First, adding the columns is not possible like that in .NET 4.0.
    Instead of doing:

    data.Columns.Add("Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    You need to do:

    data.Columns.Add("Day");
    data.Columns.Add("Outlook");
    data.Columns.Add("Temperature");
    data.Columns.Add("Humidity");
    data.Columns.Add("Wind");
    data.Columns.Add("PlayTennis");

    And at the end when you do:

    int[][] inputs = symbols.ToIntArray("Outlook", "Temperature", "Humidity", "Wind");
    int[] outputs = symbols.ToIntArray("PlayTennis").GetColumn(0);

    I don’t know what version of .NET you are using but my symbols object (DataTable) doesn’t have a ToIntArray function and I don’t know what to do instead…

    Any suggestion César?

    Thanks!

    Jean-François

  8. Hi Jean,

    Please include the following using directive on top of your .cs file, and it should do the trick for both cases:

    using Accord.Math;

    Hope it helps!

    Best regards,
    Cesar

    1. I’m including that directive on top of my .cs file and including a reference to Accord.Math.dll, but I am having the same problem:

      DataTable does not have a .ToIntArray method and I am only able to add 1 column at a time, like Jean-Francois is.

  9. Hi Cesar,

    Would you know where I can find an example for building an Accord tree with multiple outputs? You mentioned that the ID3 algorithm has been modified to support this, but it seems we can only have one output vector.

    Any help would be greatly appreciated.

    Thanks
    Mustapha

  11. Hi Mustapha,

    For multiple outputs, did you mean your outputs are actually vectors, such as a single instance can have more than a single label? In this case, I guess it would be possible to create one tree for each of the possible outputs in the output vector.

    PS: Sorry for not noticing your comment earlier. I am not sure why I didn’t get a notification 🙁

    Best regards,
    Cesar

  12. Hi César,
    Any chance on adding Random Forests to the already great Accord.NET? Also Pruning is misspelled in the DecisionTrees namespace, I realize English may not be your first language so I thought I’d try and point that out. I’m not trying to be critical, just helpful.

  13. Oh! My bad, not sure how that went unnoticed. Thanks for pointing it out!

    Regarding random forests, I suppose it would be a nice addition. I will check out. In the meantime, perhaps you could register an issue report at the project’s issue tracker as an enhancement request.

    Regards,
    Cesar

  14. Hello.
    I use your code for generate a tree using c45.
    After it’s generated it’s possible to get a list of results? pick on you result diagram.. it’s possible to retrieve a List of true values?

  15. Hi, I am getting error “Cannot apply indexing with [] to an expression of type AForge.DoubleRange” at Matrix.Interval(ranges[0], 0.05), and Matrix.Interval(ranges[1], 0.05)

    // Draw the separating surface
    var ranges = Matrix.Range(sourceMatrix);
    double[][] map = Matrix.CartesianProduct(
        Matrix.Interval(ranges[0], 0.05),
        Matrix.Interval(ranges[1], 0.05));
    I am not getting what the problem is. Need help!

  16. Hi Cesar,

    In the Accord Framework, is there a way to compute a single input into the tree one step at a time? For example, in the Play Tennis example, can the system ask the user “is it sunny?” and then using this binary answer, follow the appropriate branch?

    Thanks a lot

  17. When I run this application and click on "create tree", a message box comes up with the message "load some data". I am not getting where to load the data from?

  18. @Gunjan Rani, please enter your data through the top left of the UI, i.e. File -> Open. An Excel sheet gets opened; double-click on the file needed. You can also enter your own data in the Excel sheet, but make sure that the code behind the UI is proper, i.e. that input[] and output[] are correct.

  19. So, if the input data are numbers (e.g. accelerometer data), I don't need to use the codebook? Also, how should I set the ranges for the DecisionVariable?

    1. Hi Natan,

      Thanks for the correction. Indeed, it should be int rather than double. However, I didn’t get the exception that you mention. Please, would you like to try it using the latest version of the code available at the project’s page?

      It is available on http://accord-framework.net.

      Hope it helps!

      Best regards,
      Cesar

  20. While running the data with the two following lines :
    data.Rows.Add("D1", "Sunny", "Hot", "High", "Strong", "Yes");
    data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");

    (i.e. similar input, different output) I got an exception from
    id3learning.Run(inputs, outputs);

    regarding the length of the arrays (it seems that it does a distinct or something similar).

    Should it work like this?

  21. I worked around the problems I mentioned above.
    Now when I run C4.5 I am getting the following exception many times:
    "The tree is degenerated"
    Anything I can do?
    Natan

  22. Hi Cezar,
    can you help me with the following code.
    After running this code I don’t get the correct decision tree.

    Thanks

    Best Regards
    Attila

    private void btnFormiraj_Click(object sender, EventArgs e)
    {
        DataTable data = new DataTable();
        data.Columns.Add("Weekend");
        data.Columns.Add("Weather");
        data.Columns.Add("Parents");
        data.Columns.Add("Money");
        data.Columns.Add("Decision");

        data.Rows.Add("W1", "Sunny", "Yes", "Rich", "Cinema");
        data.Rows.Add("W2", "Sunny", "No", "Rich", "Tennis");
        data.Rows.Add("W3", "Windy", "Yes", "Rich", "Cinema");
        data.Rows.Add("W4", "Rainy", "Yes", "Poor", "Cinema");
        data.Rows.Add("W5", "Rainy", "No", "Rich", "Stay in");
        data.Rows.Add("W6", "Rainy", "Yes", "Poor", "Cinema");
        data.Rows.Add("W7", "Windy", "No", "Poor", "Cinema");
        data.Rows.Add("W8", "Windy", "No", "Rich", "Shopping");
        data.Rows.Add("W9", "Windy", "Yes", "Rich", "Cinema");
        data.Rows.Add("W10", "Sunny", "No", "Rich", "Tennis");

        Codification codebook = new Codification(data);

        DecisionVariable[] attributes =
        {
            new DecisionVariable("Weather", 3),
            new DecisionVariable("Parents", 2),
            new DecisionVariable("Money", 2),
        };
        int classCount = 4;

        DecisionTree tree = new DecisionTree(attributes, classCount);

        C45Learning c45learning = new C45Learning(tree);

        DataTable symbols = codebook.Apply(data);
        double[][] inputs = symbols.ToArray("Weather", "Parents", "Money");
        int[] outputs = symbols.ToIntArray("Decision").GetColumn(0);
        c45learning.Run(inputs, outputs);

        decisionTreeView1.TreeSource = tree;
    }

    1. To clarify, this is for a Continuous Attribute… it appears only the GAIN is used to select the best threshold…

    2. Hi DK,

      Indeed, this is a good question… It has been some time since I implemented those functions, but I remember following most of the literature to code it, as well as our class notes at the time (I was doing a course on machine learning and data mining back then). I don’t know exactly why the gain ratio is not used, but perhaps indeed it could be used in place of the gain. The C4.5 algorithm implementation in Accord.NET has also been recently updated with a little fix on how the thresholds are selected in some situations, and now it is also possible for the algorithm to select many different thresholds for the same continuous variable when creating a decision tree.

      You posted a very interesting question in Stackoverflow, I will be following it. But I would also recommend you to post on crossvalidated – it is like Stackoverflow but for statistics-related problems. Perhaps you could get better answers there!

      Best regards,
      Cesar

  23. Thanks for the great information, but I cannot get this solution to compile. I get the following error: "Error 1 This project references NuGet package(s) that are missing on this computer. Enable NuGet Package Restore to download them. For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is (…)accord-machinelearning-decision-trees\sources\.nuget\NuGet.targets. (…)accord-machinelearning-decision-trees\sources\Decision Trees.csproj 149 5 Decision Trees", and I have NuGet Package Restore enabled. I'm new to C#; am I missing something?

  24. Hi Brent,

    I am sorry, there is indeed a problem with the package restore for the framework samples right now. I am working to restore them as soon as possible. For the time being, just remove the packages through NuGet (and possibly manually), then add them again by searching for Accord.MachineLearning in Nuget.

    Hopefully it should work after that!

    Best regards,
    Cesar

  25. Hello Shalin,

    I simply used powerpoint 🙂 Unfortunately I don’t have a tool that could create the diagrams automatically from the trees at this time.

    Best regards,
    Cesar

  26. Hello,

    I tried the code given above but was unsuccessful. It throws a "KeyNotFound" exception in the file TwoWayDictionary.cs. Any idea?

    1. Hello,
      You got that exception because Codification.Translate(string[] values) expects the values in the original column order; to continue, you need to iterate over all your rows one by one:

      DataTable getData()
      {
          DataTable data = new DataTable("Mitchell's Tennis Example");

          data.Columns.Add("Day");
          data.Columns.Add("Outlook");
          data.Columns.Add("Temperature");
          data.Columns.Add("Humidity");
          data.Columns.Add("Wind");
          data.Columns.Add("PlayTennis");

          data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
          data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
          data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
          data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
          data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
          data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
          data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
          data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
          data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
          data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
          data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
          data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
          data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
          data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");
          return data;
      }

      void MakeDecision()
      {
          DataTable data = getData();
          Codification codebook = new Codification(data);

          DecisionVariable[] attributes =
          {
              new DecisionVariable("Outlook", 3),     // 3 possible values (Sunny, Overcast, Rain)
              new DecisionVariable("Temperature", 3), // 3 possible values (Hot, Mild, Cool)
              new DecisionVariable("Humidity", 2),    // 2 possible values (High, Normal)
              new DecisionVariable("Wind", 2)         // 2 possible values (Weak, Strong)
          };

          int classCount = 2; // 2 possible output values for playing tennis: yes or no
          Accord.MachineLearning.DecisionTrees.DecisionTree tree = new Accord.MachineLearning.DecisionTrees.DecisionTree(attributes, classCount);
          ID3Learning id3learning = new ID3Learning(tree);

          // Translate our training data into integer symbols using our codebook:
          DataTable symbols = codebook.Apply(data);
          int[][] inputs = symbols.ToIntArray("Outlook", "Temperature", "Humidity", "Wind");
          int[] outputs = symbols.ToIntArray("PlayTennis").GetColumn(0);

          // Learn the training instances!
          id3learning.Run(inputs, outputs);

          // Values from one row, in column order
          int[] query = codebook.Translate("D3", "Sunny", "Hot", "High", "Strong", "Yes");

          int output = tree.Compute(query);

          string answer = codebook.Translate("PlayTennis", output); // answer will be "Yes".
      }

  27. Dear Cesar
    How do I retrieve a previously saved model and retrain it with more training data?
    Btw, great blog and very useful!

  28. Hi, can you help with this code?
    I don't know how to enter input values to get an answer! 🙁

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.IO;
    using System.Threading.Tasks;
    using System.Globalization;
    namespace Decision_Tree
    {
    class Program
    {
    static void Main(string[] args)
    {

    TextReader sr = Console.In;

    TextWriter sw = Console.Out;

    Console.WriteLine("enter INT N = ");
    int N = int.Parse(sr.ReadLine());
    for (int caseNo = 1; caseNo <= N; caseNo++)
    {
    sw.WriteLine("Case #" + caseNo + ":");
    Solve(sr, sw);
    }
    sw.Close();
    }

    private class Node
    {
    public double Weight;
    public string Feature;
    public Node Left, Right;

    public double Eval(HashSet<string> features)
    {
    double p = Weight;
    if (Feature == null)
    return p;
    if (features.Contains(Feature))
    p *= Left.Eval(features);
    else
    p *= Right.Eval(features);
    return p;
    }
    }

    private static void Solve(TextReader sr, TextWriter sw)
    {

    int L = int.Parse(sr.ReadLine());
    var sb = new StringBuilder();
    for (int i = 0; i < L; i++)
    {

    sb.Append(sr.ReadLine());
    sb.Append(' ');

    }
    Console.WriteLine("tree = sb.ToString() ");
    string tree = sb.ToString();
    int pos = 0;

    Node root = Parse(tree, ref pos);
    Console.WriteLine("Enter INT A = ");
    int A = int.Parse(sr.ReadLine());
    for (int i = 0; i < A; i++)
    {
    string[] strings = sr.ReadLine().Split(' ');
    var features = new HashSet<string>();
    for (int j = 2; j < strings.Length; j++)
    {
    features.Add(strings[j]);
    }
    double prob = root.Eval(features);
    sw.WriteLine(prob.ToString("0.0000000", CultureInfo.InvariantCulture));
    }
    }

    private static Node Parse(string tree, ref int pos)
    {
    while (char.IsWhiteSpace(tree[pos]))
    pos++;

    if (tree[pos] != '(') throw new Exception();

    pos++;

    while (char.IsWhiteSpace(tree[pos])) pos++;

    int noStart = pos;

    while (!char.IsWhiteSpace(tree[pos]) && tree[pos] != ')')
    pos++;

    double w = double.Parse(tree.Substring(noStart, pos - noStart), CultureInfo.InvariantCulture);

    while (char.IsWhiteSpace(tree[pos]))
    pos++;

    Node node = new Node
    {
    Weight = w
    }
    ;
    if (tree[pos] == ')')
    {
    pos++;
    return node;
    }
    int featureStart = pos;

    while (char.IsLetter(tree[pos])) pos++;

    node.Feature = tree.Substring(featureStart, pos - featureStart);
    node.Left = Parse(tree, ref pos);
    node.Right = Parse(tree, ref pos);

    while (char.IsWhiteSpace(tree[pos])) pos++;

    if (tree[pos] != ')') throw new Exception();

    pos++;

    return node;
    }
    }
    }
