Liblinear algorithms in C#

The Accord.NET Framework is not only an image processing and computer vision framework, but also a machine learning framework for .NET. One of its features is to encompass the exact same algorithms that can be found in other libraries, such as LIBLINEAR, but to offer them in .NET within a common interface, ready to be incorporated into your application.

 

What is LIBLINEAR?

As its authors put it, LIBLINEAR is a library for large linear classification. It is intended to tackle classification and regression problems with millions of instances and features, although it can only produce linear classifiers, i.e. linear support vector machines.

The framework now offers almost all of the LIBLINEAR algorithms in C#, except for one; a minimal usage sketch is shown below.

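The sketch below assumes the LinearDualCoordinateDescent solver from the Accord.MachineLearning assembly; exact class names and signatures may vary between framework versions.

```csharp
// A minimal sketch, assuming the Accord.MachineLearning namespaces below;
// exact class names and signatures may vary between framework versions.
using Accord.MachineLearning.VectorMachines;
using Accord.MachineLearning.VectorMachines.Learning;

class LinearSvmExample
{
    static void Main()
    {
        // A toy, linearly separable problem with two inputs and labels -1/+1
        double[][] inputs =
        {
            new double[] { 0, 0 },
            new double[] { 0, 1 },
            new double[] { 1, 0 },
            new double[] { 1, 1 },
        };
        int[] outputs = { -1, -1, -1, 1 };

        // A linear support vector machine with two inputs
        var machine = new SupportVectorMachine(2);

        // A LIBLINEAR-style solver: dual coordinate descent
        var teacher = new LinearDualCoordinateDescent(machine, inputs, outputs)
        {
            Complexity = 1.0 // the cost parameter, analogous to LIBLINEAR's -c option
        };

        double error = teacher.Run(); // trains the machine in place

        // The sign of the decision function gives the predicted class
        bool isPositive = machine.Compute(new double[] { 1, 1 }) > 0;
    }
}
```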

A Tutorial On Principal Component Analysis with the Accord.NET Framework

Principal Component Analysis (PCA) is a technique for exploratory data analysis with many successful applications in several research fields. It is often used in image processing, data analysis, data pre-processing and visualization, and it is one of the most basic building blocks of many complex algorithms.

One of the most popular resources for learning about PCA is the excellent tutorial by Lindsay I. Smith. In her tutorial, Lindsay gives an example application of PCA, presenting and discussing the steps involved in the analysis.

 

Souza, C. R. “A Tutorial on Principal Component Analysis with the Accord.NET Framework“. Department of Computing, Federal University of São Carlos, Technical Report, 2012.

That said, the above technical report aims to show, discuss and otherwise introduce the reader to Principal Component Analysis while also reproducing all of Lindsay's example calculations using the Accord.NET Framework. The report comes with complete C# source code listings and also has a companion Visual Studio solution file containing all sample source code, ready to be tinkered with inside a debug application.

While the text leaves the more detailed discussion of the underlying concepts to Lindsay and does not address them in full detail, it presents practical examples that reproduce most of the calculations given by Lindsay in her tutorial using solely Accord.NET.
If you would like a practical example of how to perform matrix operations in C#, this tutorial may help get you started.
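As a small taste of what the report walks through, here is a minimal sketch of running a PCA with the framework. The data is an illustrative toy set (not necessarily the one used in the tutorial), and the API shown follows the PrincipalComponentAnalysis class as I understand it; details may vary between framework versions.

```csharp
// A minimal sketch using Accord.Statistics.Analysis; the data below is
// purely illustrative, and the exact API may vary between framework versions.
using Accord.Statistics.Analysis;

class PcaExample
{
    static void Main()
    {
        // A small illustrative data matrix: 5 observations of 2 variables
        double[,] data =
        {
            { 2.5, 2.4 },
            { 0.5, 0.7 },
            { 2.2, 2.9 },
            { 1.9, 2.2 },
            { 3.1, 3.0 },
        };

        // Create the analysis and compute it (the data is mean-centered internally)
        var pca = new PrincipalComponentAnalysis(data);
        pca.Compute();

        // Proportion of the total variance explained by the first component
        double proportion = pca.Components[0].Proportion;

        // Project the original data onto the first principal component
        double[,] projection = pca.Transform(data, 1);
    }
}
```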

Sequence Classifiers in C#: Hidden Conditional Random Fields

After a preliminary article on hidden Markov models, some months ago I finally posted the article on Hidden Conditional Random Fields (HCRF) on CodeProject. The HCRF is a discriminative model, forming the generative-discriminative pair with the hidden Markov model classifiers.

This CodeProject article is the second in a series of articles about sequence classification, the first being about hidden Markov models. I used this opportunity to write a little about generative versus discriminative models, and also to provide a brief discussion of how Vapnik's ideas apply to these learning paradigms.

All the code presented in those articles is also available within the Accord.NET Framework. The articles provide good examples of how to use the framework and can be regarded as practical guides on how to use those models with it.
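As a taste of what the article covers, below is a minimal sketch of creating and training an HCRF sequence classifier over discrete observations. The class names follow the Accord.Statistics.Models.Fields namespaces as I understand them; constructor arguments and signatures may differ between framework versions.

```csharp
// A minimal sketch, assuming the Accord.Statistics.Models.Fields namespaces;
// exact class names, argument order and signatures may vary between versions.
using Accord.Statistics.Models.Fields;
using Accord.Statistics.Models.Fields.Functions;
using Accord.Statistics.Models.Fields.Learning;

class HcrfExample
{
    static void Main()
    {
        // Toy discrete sequences over a binary alphabet, two output classes
        int[][] sequences =
        {
            new[] { 0, 0, 0, 1 },   // class 0
            new[] { 0, 0, 1, 1 },   // class 0
            new[] { 1, 1, 1, 0 },   // class 1
            new[] { 1, 1, 0, 0 },   // class 1
        };
        int[] labels = { 0, 0, 1, 1 };

        // Potential function with 2 hidden states, 2 symbols and 2 output classes
        var function = new MarkovDiscreteFunction(2, 2, 2);

        // The hidden conditional random field built on top of this function
        var hcrf = new HiddenConditionalRandomField<int>(function);

        // Train it discriminatively using resilient gradient learning
        var teacher = new HiddenResilientGradientLearning<int>(hcrf);
        teacher.Run(sequences, labels);

        // Classify a new sequence
        int predicted = hcrf.Compute(new[] { 0, 0, 1, 0 });
    }
}
```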

Complete framework documentation can be found live at the project's website, as well as in the framework's GitHub wiki. The framework has now been referenced in 30+ publications over the years, and several more are already in the works, by me and by users around the world.

Academic publications

Speaking of publications, the framework has been used within my own research on computer vision. If you need help understanding the inner workings of the HCRF, a more visual explanation of the HCRF derivation can also be found in the presentation I gave at Iberamia 2012 about Fingerspelling Recognition with Support Vector Machines and Hidden Conditional Random Fields.

An application to a more interesting problem, namely natural words drawn from Sign Languages using a Microsoft Kinect, has also been accepted for publication at the 9th International Conference on Machine Learning and Data Mining, MLDM 2013, and will be available soon. Update: it is available at

As usual, hope you find it interesting!

Point and Call and the Windows Phone Ecosystem

A few days ago I bought a Windows Phone device. So far, I am impressed with the Windows Phone ecosystem. The nicest thing is that I was finally able to test an app I had been eager to try for months: Point and Call.

Point-and-call in everyday life.

 

Now, the news: this app uses the Accord.NET Framework to do its magic 🙂

The app's author, Antti Savolainen, was kind enough to share some details about his app. It uses part of the SVM framelet from Accord.NET to do the digit recognition, mostly based on one of the earlier CodeProject articles I have posted in the past. Needless to say, Antti did an awesome job, as the SVM part was surely just a tiny fraction of all the work in preprocessing, adjusting, locating, and doing the right things at the right times that I would never be able to figure out alone. Surely, he and his company, Sadiga, deserve all the credit for this neat app!

If you would like to find more interesting uses of the Accord.NET Framework, don't forget to check the framework's publications page for details!

 

 

Sequence Classifiers in C#: Hidden Markov Models


A few days ago I published a new article on CodeProject about classifiers based on banks of hidden Markov models for sequence classification.

While I have written about this subject in the past on this very blog, this time the article deals with Gaussian, continuous-density hidden Markov models rather than discrete ones. Moreover, the Accord.NET Framework has evolved considerably since that 2010 post, and the new article reflects most of the improvements and additions made in the last two years.
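For readers who just want to see what sequence classification with the framework looks like, here is a minimal sketch of the simpler, discrete case; the continuous (Gaussian) case from the article follows the same pattern with the generic HiddenMarkovClassifier. Class names follow the Accord.Statistics.Models.Markov namespaces as they were around this time; signatures may have changed since.

```csharp
// A minimal sketch of the discrete case; exact signatures may vary between versions.
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;
using Accord.Statistics.Models.Markov.Topology;

class HmmClassifierExample
{
    static void Main()
    {
        // Toy discrete sequences over an alphabet of 3 symbols, two classes
        int[][] sequences =
        {
            new[] { 0, 1, 2, 2 },   // class 0
            new[] { 0, 0, 1, 2 },   // class 0
            new[] { 2, 2, 1, 0 },   // class 1
            new[] { 2, 1, 0, 0 },   // class 1
        };
        int[] labels = { 0, 0, 1, 1 };

        // A bank of hidden Markov models: one 2-state, 3-symbol model per class
        var classifier = new HiddenMarkovClassifier(2, new Ergodic(2), 3);

        // Train each model with Baum-Welch on the sequences of its own class
        var teacher = new HiddenMarkovClassifierLearning(classifier,
            modelIndex => new BaumWelchLearning(classifier.Models[modelIndex])
            {
                Tolerance = 0.001,
                Iterations = 0 // iterate until the tolerance is reached
            });

        double error = teacher.Run(sequences, labels);

        // Classify a new sequence by the model with the highest likelihood
        int predicted = classifier.Compute(new[] { 0, 1, 1, 2 });
    }
}
```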

In the meantime, this article also serves as a hook for a future article about Hidden Conditional Random Fields (HCRFs). HCRF models can serve the same purpose as HMMs, but can be generalized to arbitrary graph structures and trained discriminatively, which can be an advantage in classification tasks.

As always, I hope readers can find it a good read 🙂

Screencast Capture

I have recently started to record videos to demonstrate some capabilities of the Accord.NET Framework. Surprisingly, there were only a few free, open-source applications to achieve this goal – and none of them had all the features I needed.

That is, until I decided to roll my own.

Screencast Capture Lite is a tool for recording the desktop screen and saving it to a video file, preserving quality as much as possible. However, this does not mean it produces gigantic files which take a long time to be uploaded to the web. The application encodes everything using solely H.264 in an almost lossless setting.

As a demonstration, please take a look at the YouTube video sample shown below. Note, however, that YouTube actually reduced the quality of the video, even if you watch it in HD. The local copy produced by Screencast Capture has an even higher quality than what is being shown, while the generated video file occupied less than 2 megabytes on disk.

And by the way, what better way to demonstrate the capabilities of the Accord.NET frameworks than writing this application using them?

Well, actually this application has been created specifically for two things:

  • to aid in the recording of instructional videos for the Accord.NET Framework; and
  • to serve itself as a demonstration of the use and capabilities of the Accord.NET Framework.

This means the application is written entirely in C#, making extensive use of both aforementioned frameworks. The application is completely open source and free, distributed under the terms of the GPL, and a suitable project page is already being served on GitHub.

Hope you will find it interesting!

Deep Neural Networks and Restricted Boltzmann Machines


The new version of the Accord.NET Framework brings a nice addition for those working with machine learning and pattern recognition: deep neural networks and restricted Boltzmann machines.

Class diagram for Deep Neural Networks in the Accord.Neuro namespace.

Deep neural networks have been listed as a recent breakthrough in signal and image processing applications, such as speech recognition and visual object detection. However, it is not the neural networks themselves that are new here, but rather the learning algorithms. Neural networks have existed for decades, but previous learning algorithms were unsuitable for training networks with more than one or two hidden layers.

But why more layers?

The Universal Approximation Theorem (Cybenko 1989; Hornik 1991) states that a standard multi-layer activation neural network with a single hidden layer is already capable of approximating any arbitrary real function with arbitrary precision. Why then create networks with more than one layer?

To reduce complexity. Networks with a single hidden layer may arbitrarily approximate any function, but they may require an exponential number of neurons to do so. We can borrow a more tactile example from the electronics field. Any boolean function can be expressed using only a single layer of AND, OR and NOT gates (or even only NAND gates). However, one would hardly use only this to fully design, let's say, a computer processor. Rather, specific behaviors would be modeled in logic blocks, and those blocks would then be combined to form more complex blocks until we created an all-encompassing block implementing the entire CPU.

The use of several hidden layers is no different. By allowing more layers we allow the network to model more complex behavior with fewer activation neurons; furthermore, the first layers of the network may specialize in detecting more specific structures to help the later classification. Dimensionality reduction and feature extraction can then be performed directly inside the network, in its first layers, rather than by specific, separate algorithms.

Do computers dream of electric sheep?

The key insight in learning deep networks was to apply a pre-training algorithm that can be used to tune each hidden layer individually. Each layer is learned separately, without supervision, which means the layers are able to learn features without knowing their corresponding output labels. This is known as a pre-training algorithm because, after all layers have been learned unsupervised, a final supervised algorithm is used to fine-tune the network to perform the specific classification task at hand.

As shown in the class diagram at the top of this post, deep networks are simply cascades of Restricted Boltzmann Machines (RBMs). Each layer of the final network is created by connecting the hidden layers of each RBM as if they were the hidden layers of a single activation neural network.
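Below is a sketch of the full recipe: layer-wise unsupervised pre-training with contrastive divergence, followed by supervised fine-tuning with backpropagation. It is based on the Accord.Neuro deep learning classes as I understand them at the time of this release (classic backpropagation comes from the AForge.NET side, on which Accord.Neuro builds); exact names and signatures may vary between versions.

```csharp
// A sketch of unsupervised pre-training + supervised fine-tuning; class names
// and signatures are assumptions about the Accord.Neuro / AForge.Neuro API.
using Accord.Neuro.Learning;
using Accord.Neuro.Networks;
using AForge.Neuro.Learning;

class DeepLearningExample
{
    static void Main()
    {
        // Toy binary data: 4 observations with 6 inputs each, two classes
        double[][] inputs =
        {
            new double[] { 1, 1, 1, 0, 0, 0 },
            new double[] { 1, 0, 1, 0, 0, 0 },
            new double[] { 0, 0, 0, 1, 1, 1 },
            new double[] { 0, 0, 0, 1, 0, 1 },
        };
        double[][] outputs =
        {
            new double[] { 1, 0 },
            new double[] { 1, 0 },
            new double[] { 0, 1 },
            new double[] { 0, 1 },
        };

        // A deep belief network with 6 inputs and two hidden layers (4 and 2 neurons)
        var network = new DeepBeliefNetwork(6, 4, 2);

        // Layer-wise unsupervised pre-training with contrastive divergence
        var preTraining = new DeepBeliefNetworkLearning(network)
        {
            Algorithm = (hidden, visible, index) =>
                new ContrastiveDivergenceLearning(hidden, visible)
        };

        for (int layer = 0; layer < 2; layer++) // one pass per hidden layer created above
        {
            preTraining.LayerIndex = layer;
            double[][] layerData = preTraining.GetLayerInput(inputs);
            for (int epoch = 0; epoch < 200; epoch++)
                preTraining.RunEpoch(layerData);
        }

        // Supervised fine-tuning of the whole network with backpropagation
        var fineTuning = new BackPropagationLearning(network);
        for (int epoch = 0; epoch < 200; epoch++)
            fineTuning.RunEpoch(inputs, outputs);

        network.UpdateVisibleWeights();

        // Use the fine-tuned network to classify a new observation
        double[] answer = network.Compute(new double[] { 1, 1, 0, 0, 0, 0 });
    }
}
```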

Now comes the most interesting part about this approach. It lies in one specific detail of how the RBMs are learned, which in turn allows a very interesting use of the final networks. Since each layer is an RBM learned with an unsupervised algorithm, it can be seen as a standard generative model. And if the layers are generative, they can be used to reconstruct what they have learned. By sequentially alternating computation and reconstruction steps, initialized with a random observation vector, the network may produce patterns created solely from its inner knowledge of the concepts it has learned. This is fantastically close to the concept of a dream.

At this point I would also like to invite you to watch the video linked above. And if you like what you see, I also invite you to download the latest version of the Accord.NET Framework and experiment with those newly added features.

The new release also includes k-dimensional trees, also known as kd-trees, which can be used to speed up nearest-neighbor lookups in algorithms that need them. They are particularly useful in algorithms such as mean shift for data clustering, which has been included as well, and in instance classification algorithms such as k-nearest neighbors.
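A sketch of a nearest-neighbor query with the newly added trees, assuming the KDTree class from the Accord.MachineLearning.Structures namespace (signatures may vary between versions):

```csharp
// A sketch, assuming KDTree from Accord.MachineLearning.Structures.
using Accord.MachineLearning.Structures;

class KdTreeExample
{
    static void Main()
    {
        // A few points in the plane
        double[][] points =
        {
            new double[] { 2, 3 },
            new double[] { 5, 4 },
            new double[] { 9, 6 },
            new double[] { 4, 7 },
            new double[] { 8, 1 },
            new double[] { 7, 2 },
        };

        // Build the tree from the data points
        var tree = KDTree.FromData<int>(points);

        // Retrieve the 2 nearest neighbors of a query position; each result
        // pairs a tree node with its distance to the query point
        var neighbors = tree.Nearest(new double[] { 8, 3 }, 2);
    }
}
```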

Accord.NET and .NET 3.5


Quite some time ago, Accord.NET was upgraded to .NET 4.0. This upgrade was made to take advantage of some of the unique platform features. However, it was a drawback for those users who were still bound to .NET 3.5 and would like to use (or continue using) the framework.

So, some weeks ago I finished creating a compatibility layer providing .NET 3.5 support for the newest version of the framework. While some features had to be left aside, most of the framework is currently functional. The features that had to be removed include the Accord.Audio modules, support for CancellationTokens in processor-intensive learning algorithms, some parallel processing capabilities, and some of the Expression Tree manipulation in a few numeric optimization algorithms and in Decision Trees.

With time, some of those could be re-implemented to work correctly under 3.5 as needed. You can leave a request, if you would like. Stay tuned!

Haar Feature Face Detection in C#


A new article has been published on CodeProject! This article details the Viola-Jones face detection algorithm available in the Accord.NET Framework. The article page also provides a standalone package for face detection which can be reused without requiring the entire framework.
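As a taste of what the article covers, below is a sketch of running the detector, assuming the HaarObjectDetector and FaceHaarCascade classes from Accord.Vision.Detection; the file name and parameter values are illustrative, and constructor overloads may vary between versions.

```csharp
// A sketch; class names and overloads are assumptions about Accord.Vision.Detection.
using System.Drawing;
using Accord.Vision.Detection;
using Accord.Vision.Detection.Cascades;

class FaceDetectionExample
{
    static void Main()
    {
        // Load the image to be scanned (the path is illustrative)
        Bitmap image = (Bitmap)Image.FromFile("people.jpg");

        // Create a detector using the built-in frontal face cascade,
        // searching for non-overlapping objects of at least 50x50 pixels
        var detector = new HaarObjectDetector(new FaceHaarCascade(),
            50, ObjectDetectorSearchMode.NoOverlap);

        // Scan the image; each rectangle is a candidate face region
        Rectangle[] faces = detector.ProcessFrame(image);
    }
}
```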

 

The article is also participating in CodeProject's monthly competition under the categories Best C# Article and Best Overall Article of the month! Feel free to cast your vote if you liked it or otherwise found it useful!
This month other great articles are also taking part in the competition: Marcelo de Oliveira's Social News excels in the Web category, and Roy's Inline MSIL in C# presents a very interesting read in the C# category. The latter may eventually be extremely useful for leveraging performance in managed applications.
For those who don't know, CodeProject is an amazingly useful site which publishes user-created articles and news. Every month, the best articles among all submissions are selected to win prizes and gain popularity as well!

Always test your hypothesis…


… but also always make sure to interpret results correctly! This post presents a quick intro on how to perform statistical hypothesis testing and power analysis with the Accord.NET Framework in C#.

Contents

  1. Hypothesis testing
    1. Statistical hypothesis testing
    2. My test turned insignificant. Should I accept the null hypothesis?
    3. Further criticism
  2. The Accord.NET Framework
    1. Available tests
    2. Example problems
  3. Suggestions
  4. References

 

Hypothesis testing

What does hypothesis testing mean in statistics (and what should it mean everywhere, for that matter)? You may recall from Karl Popper's theory of falsifiability that good theories can rarely be accurately proven, but you may gain considerable confidence in them by constantly challenging and failing to refute them.

This comes from the fact that it is often easier to falsify something than to prove it. Consider, for instance, the white-swan/black-swan example: let's say a theory states that all swans are white. This is a very strong claim; it does not apply to one or a few particular observations of a swan, but to all of them. It would be rather difficult to verify whether all of the swans on Earth are indeed white. It is thus almost impossible to prove this theory directly.

However, the catch is that it takes only a single contrary example to refute it. If we find a single swan that is black, the entire theory should be rejected, so that alternative theories can be raised. It should be fairly easy to attempt to prove a theory wrong. If a theory continuously survives those attempts to be proven wrong, it becomes stronger. This does not necessarily mean it is correct, only that it is very unlikely to be wrong.

This is pretty much how the scientific method works; it also provides a solution to the demarcation problem originally proposed by Kant: the problem of separating the sciences from the pseudo-sciences (e.g. astronomy from astrology). A "good" theory should be easy to attack so we can try to refute it; and by constantly challenging it and failing to prove it wrong, we gain further confidence that this theory may, indeed, be right. In short:

Often the most interesting theories can’t be proven right, they can only be proven wrong. By continuously refuting alternatives, a theory becomes stronger (but most likely never reaching the ‘truth’).

Answering the question in the first sentence of this section: hypothesis testing means verifying whether a theory holds even when confronted with alternative theories. In statistical hypothesis testing, this often means checking whether a hypothesis holds even when confronted with the possibility that it may have just happened to appear true by pure chance or plain luck.

Statistical hypothesis testing

Fisher (1925) also noted that we can't always prove a theory, but we can attempt to refute it. Therefore, statistical hypothesis testing involves stating a hypothesis – the hypothesis we are trying to invalidate – and checking whether we can confidently reject it by confronting it with data from a test or experiment. The hypothesis under attack is often called the null hypothesis (commonly denoted H0). It receives this name as it is usually the hypothesis of no change: there is no difference, nothing changed after the experiment, there is no effect.

The hypotheses verified by statistical hypothesis tests are often theories about whether or not a random sample from a population comes from a given probability distribution. This seems odd, but several problems can be cast in this way. Suppose, for example, we would like to determine whether students from one classroom have significantly different grades than students from another room. Any difference could possibly be attributed to chance, as some students may just perform better on an exam because of luck.

An exam was applied to both classrooms. The exam results (one grade per student) are listed below:

Classroom A (28 students, mean 8.416):
8.12, 8.34, 7.54, 8.98, 8.24, 7.15, 6.60, 7.84, 8.68, 9.44, 8.83, 8.21,
8.83, 10.0, 7.94, 9.58, 9.44, 8.36, 8.48, 8.47, 8.02, 8.20, 10.0, 8.66,
8.48, 9.17, 6.54, 7.50

Classroom B (29 students, mean 7.958):
7.50, 6.70, 8.55, 7.84, 9.23, 6.10, 8.45, 8.27, 7.01, 7.18, 9.05, 8.18,
7.70, 7.93, 8.20, 8.19, 7.65, 9.25, 8.71, 8.34, 7.47, 7.47, 8.24, 7.10,
7.87, 10.0, 8.26, 6.82, 7.53

We have two hypotheses:

  • Results for classroom A are not significantly different from the results for classroom B. Any difference in means could be explained by chance alone.
  • Results are indeed different. The apparent differences are very unlikely to have occurred by chance.

Since we have fewer than 30 samples, we will be using a two-sample t-test to test the hypothesis that the population means of the two samples are not equal. Besides, we will not be assuming equal variances. So let's create our test object:
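A sketch of how this can be done with the TwoSampleTTest class from the Accord.Statistics.Testing namespace (a reconstruction, not the original listing; constructor arguments may vary between framework versions):

```csharp
// A reconstruction, assuming the TwoSampleTTest class from
// Accord.Statistics.Testing; exact signatures may vary between versions.
using Accord.Statistics.Testing;

// Grades from the table above
double[] classroomA =
{
    8.12, 8.34, 7.54, 8.98, 8.24, 7.15, 6.60, 7.84, 8.68, 9.44, 8.83, 8.21,
    8.83, 10.0, 7.94, 9.58, 9.44, 8.36, 8.48, 8.47, 8.02, 8.20, 10.0, 8.66,
    8.48, 9.17, 6.54, 7.50
};

double[] classroomB =
{
    7.50, 6.70, 8.55, 7.84, 9.23, 6.10, 8.45, 8.27, 7.01, 7.18, 9.05, 8.18,
    7.70, 7.93, 8.20, 8.19, 7.65, 9.25, 8.71, 8.34, 7.47, 7.47, 8.24, 7.10,
    7.87, 10.0, 8.26, 6.82, 7.53
};

// Two-sample t-test; the third argument states we do not assume equal variances
var test = new TwoSampleTTest(classroomA, classroomB, false);
```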

And now we can query it:
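Again a sketch under the same assumptions:

```csharp
// Property names as commonly exposed by the framework's hypothesis test classes
double pValue = test.PValue;         // the obtained p-level
bool significant = test.Significant; // true when p falls below the chosen threshold
```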

Which reveals the test is indeed significant. And now we have at least two problems to address…


Problems

Problem 1: Statistical significance does not imply practical significance

So the test was significant. But does this mean the difference itself is significant?
Does it mean there is any serious problem with the school's teaching methods?

No – but it doesn’t mean the contrary either. It is impossible to tell just by looking at the p-level.

The test only said there was a difference, but it cannot tell the importance of this difference. Although the two classes performed differently enough to trigger statistical significance, we do not know whether this difference has any practical significance. A significant statistical test is not a proof; it is just a piece of evidence to be balanced together with other pieces of information in order to draw a conclusion.

Perhaps one of the best examples illustrating this problem is given by Martha K. Smith:

Suppose a large clinical trial is carried out to compare a new medical treatment with a standard one. The statistical analysis shows a statistically significant difference in lifespan when using the new treatment compared to the old one. But the increase in lifespan is at most three days, with average increase less than 24 hours, and with poor quality of life during the period of extended life. Most people would not consider the improvement practically significant.

In our classroom example, the difference in means is about 0.46 points. If principals believe a difference of less than 0.50 on a scale from 0.00 to 10.00 is not that critical, there may be no need to force students from the room with lower grades to start taking extra lessons after school. In other words, statistical hypothesis testing does not lead to automatic decision making. A statistically significant test is just another piece of evidence which should be balanced with other clues in order to take a decision or draw a conclusion.

Problem 2: Powerless tests can be misleading

The p-level reported by the significance test is the probability of obtaining data as extreme as what we found, given that the null hypothesis is correct. Often, however, the null hypothesis is not correct; so we must also know the probability that the test will reject the null hypothesis when the null hypothesis is indeed false. To do so, we must compute the power of our test – and, better yet, we should have used this information to determine how many samples we would need to achieve a more informative test before conducting our experiment. The power is thus the probability of detecting a difference if this difference indeed exists (Smith, 2011). So let's see:
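The listing below is a sketch of how the power can be queried; the Analysis property and the TwoSampleTTestPowerAnalysis class are assumptions about the framework's power analysis support and may be named differently in your version:

```csharp
// Assumption: the two-sample t-test exposes its post-hoc power analysis
// through an Analysis property of type TwoSampleTTestPowerAnalysis.
using Accord.Statistics.Testing.Power;

var analysis = (TwoSampleTTestPowerAnalysis)test.Analysis;

// Probability of correctly rejecting the null hypothesis when it is false
double power = analysis.Power;
```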

The test we performed had astonishingly small power: if the null hypothesis is false (and there actually is a difference between the classrooms), we have only about a 50% chance of correctly rejecting it. Therefore, we also have about a 50% chance of producing a false negative – incorrectly saying there is no difference when actually there is. The table below exemplifies the different errors we can make by rejecting or failing to reject the null hypothesis.

                              Null hypothesis is true     Null hypothesis is false

Fail to reject the            Correct outcome             Type II error (beta)
null hypothesis               (true negative)             (false negative)

Reject the                    Type I error (alpha)        Correct outcome
null hypothesis               (false positive)            (true positive)

Tests with little statistical power are often behind inconsistent results in the literature. Suppose, for example, that the first student from classroom B had earned a 7.52 instead of a 7.50. Due to the low power of the test, this little change would already be sufficient to render the test nonsignificant, and we would no longer be able to reject the null hypothesis that the two population means are not different. Because of the low power, we can't distinguish between a correct true negative and a Type II error. This is why powerless tests can be misleading and should never be relied upon for decision making (Smith, 2011b).

The power of a test increases with the sample size. To obtain a power of at least 80%, let’s see how many samples should have been collected:
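A sketch of this computation, under the same assumptions about the power analysis classes (the Clone and ComputeSamples calls and the Samples1 property are assumptions and may be named differently):

```csharp
// Assumption: we can copy the analysis, fix the desired power and ask the
// framework to compute the number of samples required to achieve it.
var requirement = (TwoSampleTTestPowerAnalysis)analysis.Clone();
requirement.Power = 0.8;        // we would like at least 80% power
requirement.ComputeSamples();   // solves for the required sample sizes

double samplesPerGroup = requirement.Samples1; // about 57 students per classroom
```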

So we would actually need 57 students in each classroom to confidently affirm whether there was a difference or not. This is rather disappointing, since in the real world we wouldn't be able to enroll more students and wait years until we could perform another exam just to adjust the sample size. In those situations, the power of the test can be increased by raising the significance threshold (Thomas, Juanes, 1996), although this clearly comes at the price of a higher rate of false positives (Type I errors).


My test turned insignificant. Should I accept the null hypothesis?

The short answer is 'only if you have enough power'. Otherwise, definitely no.

If you have reason to believe the test you performed had enough power to detect a difference within the given Type II error rate, and it didn't, then accepting the null would most likely be acceptable. The acceptance should also be accompanied by an analysis of confidence intervals or effect sizes. Consider, for example, that some actual scientific discoveries were indeed made by accepting the null hypothesis rather than by contradicting it; one notable example being the discovery of the X-ray (Yu, 2012).


Further criticism

Much of the criticism associated with statistical hypothesis testing is often related not to the use of statistical hypothesis testing per se, but to how the significance outcomes of such tests are interpreted. Often it boils down to incorrectly believing a p-value is the probability of the null hypothesis being true, when, in fact, it is only the probability of obtaining a test statistic at least as extreme as the one calculated from the data, under the null hypothesis and the other assumptions of the test in question.

Moreover, another problem may arise when we choose a null hypothesis which is obviously false. There is no point in hypothesis testing when the null hypothesis couldn't possibly be true. For instance, it is very difficult to believe that a parameter of a continuous distribution is exactly equal to a hypothesized value, such as zero. Given enough samples, it will always be possible to find a difference, however small, and the test will invariably turn out significant. Under those circumstances, statistical testing can be useless, as it bears no relationship to practical significance. That is why analyzing the effect size is important in order to determine the practical significance of a hypothesis test. Useful hypotheses also need to be probable, plausible and falsifiable (Beaulieu-Prévost, 2005).

The following links also summarize much of the criticism of statistical hypothesis testing. The last one includes very interesting (if not enlightening) comments, in its comment section, on common criticisms of the hypothesis testing method.


The Accord.NET Framework

Now that we have presented the statistical hypothesis testing framework, and now that the reader is aware of its drawbacks, we can start talking about performing those tests with the aid of a computer. The Accord.NET Framework is a framework for scientific computing with wide support for statistical tests and power analysis, without entering into the merit of whether they are valid or not. In short, it provides the scissors; feel free to run with them.


Tests available in the framework (at the time of this writing)

As may already have been noticed, the sample code included in the previous section was C# code using the framework. In the aforementioned example, we created a t-test for comparing the population means of two samples drawn from normal distributions. The framework, nevertheless, includes many other tests, some with support for power analysis. Those include:

Parametric tests Nonparametric tests

Tests marked with a * are available in versions for one and two samples. Tests in the second row can be used to test hypotheses about contingency tables. Just a reminder: the framework is open source and all code is available on GitHub.

A class diagram for the hypothesis testing module is shown in the picture below. Click for a larger version.

Class diagram for the Accord.Statistics.Testing namespace.

Framework usage should be rather simple. In order to illustrate it, the next section presents some example problems and their solutions using the hypothesis testing module.

Example problems and solutions

Problem 1. Clairvoyant card game.

This is the second example from Wikipedia's page on hypothesis testing. In this example, a person is tested for clairvoyance (the ability to gain information about something through extrasensory perception; detecting something without using the known human senses).
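The full setup and numbers are in the original example; below is a hedged sketch of how such a test can be phrased with the framework's BinomialTest class. The counts are illustrative, and the constructor arguments and the OneSampleHypothesis enumeration are assumptions about the API.

```csharp
// Illustrative numbers, not the ones from the original example: 25 cards,
// a 1/4 chance of guessing a suit correctly, and 13 correct guesses.
// BinomialTest and OneSampleHypothesis are assumed from Accord.Statistics.Testing.
using Accord.Statistics.Testing;

var clairvoyance = new BinomialTest(13, 25, 0.25,
    OneSampleHypothesis.ValueIsGreaterThanHypothesis);

bool significant = clairvoyance.Significant; // is the subject better than chance?
double pValue = clairvoyance.PValue;
```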

Problem 2. Worried insurance company

This is a common example with variations given by many sources. Some of them can be found here and here.

Problem 3. Differences among multiple groups (ANOVA)

This example comes from Wikipedia's page on the F-test. Suppose we would like to study the effect of three different levels of a factor on a response (such as, for example, three levels of a fertilizer on plant growth). We have made 6 observations for each of the three levels a1, a2 and a3, and have written the results in the table below.
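The original data table is not reproduced here, so the sketch below uses illustrative observations. The OneWayAnova class and the DataGridBox.Show call are named as I believe they appear in the framework (Accord.Statistics.Analysis and Accord.Controls respectively); details may vary between versions.

```csharp
// Illustrative observations for the three levels a1, a2 and a3 (6 each);
// OneWayAnova and DataGridBox are assumptions about the framework API.
using Accord.Controls;
using Accord.Statistics.Analysis;

double[][] samples =
{
    new double[] {  6,  8,  4,  5,  3,  4 },   // level a1
    new double[] {  8, 12,  9, 11,  6,  8 },   // level a2
    new double[] { 13,  9, 11,  8,  7, 12 },   // level a3
};

// One-way analysis of variance over the three groups
var anova = new OneWayAnova(samples);

// Show the resulting ANOVA table in a grid, just as one would use a MessageBox
DataGridBox.Show(anova.Table);
```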

The last line in the example shows the ANOVA table using the framework's DataGridBox object. The DataGridBox is a convenience class for displaying DataGridViews just as one would display a message using a MessageBox. The resulting ANOVA table is shown below:

 Problem 4. Biased bees

This example comes from the statistics pages of the College of Saint Benedict and Saint John's University (Kirkman, 1996). It is a very interesting example, as it shows a case in which a t-test fails to see a difference between the samples because of the non-normality of the samples' distributions. The nonparametric Kolmogorov-Smirnov test, on the other hand, succeeds.

The example deals with the preference of bees between two nearby blooming trees in an empty field. The experimenter has collected data measuring how much time a bee spends near a particular tree. The time starts being measured when a bee first touches the tree, and stops when the bee moves more than 1 meter away from it. The samples below represent the measured times, in seconds, of the observed bees for each of the trees.
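The original samples are not reproduced here; the sketch below shows how both tests can be run on two samples, using illustrative values. The TwoSampleKolmogorovSmirnovTest class name is my best understanding of the framework's API and may differ in your version.

```csharp
// Illustrative (not the original) visit times, in seconds, for each tree;
// TwoSampleKolmogorovSmirnovTest is assumed from Accord.Statistics.Testing.
using Accord.Statistics.Testing;

double[] treeA = { 23.4, 30.9, 18.8, 23.0, 26.4, 5.1, 385.9, 19.8, 20.1 };
double[] treeB = { 16.5,  1.0, 22.6, 274.7, 4.8, 10.8, 2.3, 14.8, 58.3 };

// The parametric t-test, which assumes normality of the samples
var tTest = new TwoSampleTTest(treeA, treeB);
bool tSignificant = tTest.Significant;

// The nonparametric two-sample Kolmogorov-Smirnov test, which does not;
// with the original data, only this one detects the difference.
var ksTest = new TwoSampleKolmogorovSmirnovTest(treeA, treeB);
bool ksSignificant = ksTest.Significant;
```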

Problem 5. Comparing classifier performances

The last example comes from (E. Ientilucci, 2006) and deals with comparing the performance of two different raters (classifiers) to see whether their performances are significantly different.

Suppose an experimenter has two classification systems, both trained to classify observations into one of 4 mutually exclusive categories. In order to measure the performance of each classifier, the experimenter confronted their classification labels with the ground truth for a testing dataset, writing the respective results in the form of contingency tables.

The hypothesis to be tested is that the performance of the two classifiers is the same.
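A sketch of how this comparison can be expressed with the framework follows. The contingency tables below are illustrative rather than the ones from the referenced paper, and the GeneralConfusionMatrix and TwoMatrixKappaTest class names are my best understanding of the Accord.Statistics API.

```csharp
// Illustrative 4x4 contingency tables (not the figures from the paper);
// GeneralConfusionMatrix and TwoMatrixKappaTest are assumptions about the API.
using Accord.Statistics.Analysis;
using Accord.Statistics.Testing;

int[,] classifierA =
{
    { 30,  2,  1,  0 },
    {  3, 28,  2,  1 },
    {  0,  3, 25,  2 },
    {  1,  0,  2, 27 },
};

int[,] classifierB =
{
    { 29,  3,  1,  0 },
    {  2, 27,  3,  2 },
    {  1,  2, 26,  1 },
    {  0,  1,  3, 26 },
};

// Summarize each classifier's agreement with the ground truth
var matrixA = new GeneralConfusionMatrix(classifierA);
var matrixB = new GeneralConfusionMatrix(classifierB);

// Test the null hypothesis that the two Kappa coefficients are the same
var kappaTest = new TwoMatrixKappaTest(matrixA, matrixB);

bool significant = kappaTest.Significant; // not significant for the paper's data
double pValue = kappaTest.PValue;
```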

In this case, the test didn't show enough evidence to confidently reject the null hypothesis. Therefore, one should refrain from affirming anything about differences between the two systems, unless the power of the test is known.

Unfortunately, I could not find a clear indication in the literature about the power of a two-matrix Kappa test. However, since the test statistic is asymptotically normal, one could try checking the power of this test by analyzing the power of the underlying Z-test. If there is enough power, one could possibly accept the null hypothesis that there are no large differences between the two systems.


Suggestions

As always, I hope the above discussion and examples can be useful for interested readers and users. However, if you believe you have found a flaw or would like to discuss any portion of this post, please feel free to do so by posting in the comments section.

PS: The classroom example uses a t-test to test for differences in population means. The t-test assumes a normal distribution. The data, however, is not exactly normal, since it is bounded between 0 and 10. Suggestions for a better example would also be appreciated!


References

R. A. Fisher, 1925. Statistical Methods for Research Workers. Available online from:
http://psychclassics.yorku.ca/Fisher/Methods/

M. K. Smith, 2011. Common mistakes in using statistics: Spotting and Avoiding Them – Power of a
Statistical Procedure. Available online from:
http://www.ma.utexas.edu/users/mks/statmistakes/power.html

M. K. Smith, 2011b. Common mistakes in using statistics: Spotting and Avoiding Them – Detrimental
Effects of Underpowered or Overpowered Studies. Available online from:
http://www.ma.utexas.edu/users/mks/statmistakes/UnderOverPower.html

L. Thomas, F. Juanes, 1996. The importance of statistical power analysis: an example from animal behaviour,
Animal Behaviour, Volume 52, Issue 4, October., Pages 856-859. Available online from: http://otg.downstate.edu/downloads/2007/Spring07/thomas.pdf

C. H. (Alex) Yu, 2012. Don’t believe in the null hypothesis? Available online from:
http://www.creative-wisdom.com/computer/sas/hypothesis.html

D. Beaulieu-Prévost, 2005. Statistical decision and falsification in science: going beyond the null
hypothesis. Séminaires CIC. Université de Montréal.

T.W. Kirkman, 1996. Statistics to Use. Accessed July 2012. Available online from
http://www.physics.csbsju.edu/stats/

E. Ientilucci, 2006. “On Using and Computing the Kappa Statistic”. Available online from http://www.cis.rit.edu/~ejipci/Reports/On_Using_and_Computing_the_Kappa_Statistic.pdf