Discriminatory Power Analysis by Receiver-Operating Characteristic Curves (Part 2 of 2: C# Source Code)

Part 2 of 2: C# Source Code. Go to Part 1 of 2: Theory.

Happy new year everyone! As promised, here is the second part of the article Performing Discriminatory Power Analysis by Receiver-Operating Characteristic Curves. This part shows how to create, visualize and analyze ROC curves in C#. The sample application accompanying the source code demonstrates how to use the classes inside your own applications.

Source code

Download source code and sample applications.

Confusion matrix

The ConfusionMatrix class represents a 2×2 contingency table. All derived measures discussed in Part 1, such as sensitivity and specificity, are available through its properties and methods.
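
To make the idea concrete, here is a minimal sketch of a 2×2 table exposing two derived measures. It is not the ConfusionMatrix class from the download: the ConfusionTable name, its constructor and its property names are hypothetical and chosen only for illustration.

```csharp
using System;

// Hypothetical stand-in for illustration only; the real ConfusionMatrix
// class in the download may expose different members.
public sealed class ConfusionTable
{
    public int TruePositives { get; }
    public int FalseNegatives { get; }
    public int FalsePositives { get; }
    public int TrueNegatives { get; }

    public ConfusionTable(int truePositives, int falseNegatives,
                          int falsePositives, int trueNegatives)
    {
        TruePositives = truePositives;
        FalseNegatives = falseNegatives;
        FalsePositives = falsePositives;
        TrueNegatives = trueNegatives;
    }

    // Sensitivity (true positive rate): TP / (TP + FN).
    // Evaluates to NaN when the table holds no actual positives.
    public double Sensitivity =>
        (double)TruePositives / (TruePositives + FalseNegatives);

    // Specificity (true negative rate): TN / (TN + FP).
    // Evaluates to NaN when the table holds no actual negatives.
    public double Specificity =>
        (double)TrueNegatives / (TrueNegatives + FalsePositives);

    // False positive rate, the horizontal axis of a ROC plot.
    public double FalsePositiveRate => 1.0 - Specificity;
}
```

For instance, new ConfusionTable(40, 10, 5, 45) has a sensitivity of 40 / 50 = 0.8 and a specificity of 45 / 50 = 0.9.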

Receiver Operating Characteristic

The ReceiverOperatingCharacteristic class represents a ROC curve. The collection of points that defines the curve is available through the Points property. Because the points are objects derived from ConfusionMatrix, the collection can be bound directly to a DataGridView for quick visual inspection of the curve.

Additional information, such as the total Area Under the Curve and its associated error estimate, is available through the Area and Error properties, respectively.
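
As a rough illustration of where two such numbers can come from, the sketch below integrates the curve with the trapezoidal rule and estimates a standard error with the Hanley and McNeil (1982) approximation. Both are common choices, but it is an assumption that the Area and Error properties are computed exactly this way. RocAreaSketch and its method names are hypothetical.

```csharp
using System;

// Illustrative sketch only; not the code behind the Area and Error properties.
public static class RocAreaSketch
{
    // Trapezoidal area under a curve given as parallel arrays of
    // false positive rates (x) and sensitivities (y), assumed to be
    // sorted by increasing false positive rate.
    public static double TrapezoidalArea(double[] fpr, double[] tpr)
    {
        if (fpr.Length != tpr.Length)
            throw new ArgumentException("Arrays must have the same length.");

        double area = 0.0;
        for (int i = 1; i < fpr.Length; i++)
            area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0;
        return area;
    }

    // Hanley and McNeil approximation of the standard error of the area,
    // given the number of actual positive and actual negative samples.
    public static double StandardError(double area, int positives, int negatives)
    {
        double a2 = area * area;
        double q1 = area / (2.0 - area);
        double q2 = 2.0 * a2 / (1.0 + area);

        double variance = (area * (1.0 - area)
                           + (positives - 1) * (q1 - a2)
                           + (negatives - 1) * (q2 - a2))
                          / ((double)positives * negatives);
        return Math.Sqrt(variance);
    }
}
```

For a diagonal curve through (0, 0) and (1, 1), TrapezoidalArea returns 0.5, the value expected from a random classifier.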

Using the code

Using the classes is straightforward. To create a ROC curve, create a new ReceiverOperatingCharacteristic object, passing the actual data, as measured by the experiment, and the test data, as given by the prediction model.

The actual data must be a dichotomous variable, with only two valid values. The “false” value is assumed to be the lower value of the dichotomy, while the “true” value is assumed to be the higher. Using 0 and 1 is recommended for simplicity, although it isn’t mandatory.

The test data, as given by the prediction model, must take continuous values between the lowest and highest values of the actual data. For example, if the two valid values are 0 and 1, then the test values must lie inside the [0, 1] range.

To compute the curve using different cutoff values, call the Compute method, passing either the desired number of points or the desired increment in the cutoff value between points.
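
The cutoff sweep described above can be sketched in a self-contained way as follows. This is not the library's implementation: RocPoint and RocSweepSketch are hypothetical names, the number of steps stands in for the "number of points" argument, and treating a test value greater than or equal to the cutoff as a positive prediction is an assumption.

```csharp
using System;
using System.Collections.Generic;

// One point of the curve: the cutoff used and the rates it produced.
public sealed class RocPoint
{
    public double Cutoff { get; }
    public double Sensitivity { get; }
    public double FalsePositiveRate { get; }

    public RocPoint(double cutoff, double sensitivity, double falsePositiveRate)
    {
        Cutoff = cutoff;
        Sensitivity = sensitivity;
        FalsePositiveRate = falsePositiveRate;
    }
}

// Illustrative sketch only; not the ReceiverOperatingCharacteristic class.
public static class RocSweepSketch
{
    // Evaluates steps + 1 evenly spaced cutoffs between the lowest ("false")
    // and highest ("true") values found in the actual data.
    public static List<RocPoint> Compute(double[] actual, double[] test, int steps)
    {
        if (actual.Length != test.Length)
            throw new ArgumentException("Arrays must have the same length.");
        if (steps < 1)
            throw new ArgumentOutOfRangeException(nameof(steps));

        double dfalse = double.MaxValue, dtrue = double.MinValue;
        foreach (double a in actual)
        {
            dfalse = Math.Min(dfalse, a);
            dtrue = Math.Max(dtrue, a);
        }

        var points = new List<RocPoint>();
        for (int i = 0; i <= steps; i++)
        {
            // Deriving each cutoff from the index avoids accumulating
            // floating-point error from repeated additions.
            double cutoff = dfalse + (dtrue - dfalse) * i / steps;
            points.Add(ComputePoint(actual, test, dtrue, cutoff));
        }
        return points;
    }

    private static RocPoint ComputePoint(double[] actual, double[] test,
                                         double dtrue, double cutoff)
    {
        int tp = 0, fn = 0, fp = 0, tn = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            bool isPositive = actual[i] == dtrue;
            bool predictedPositive = test[i] >= cutoff; // assumed convention

            if (isPositive) { if (predictedPositive) tp++; else fn++; }
            else { if (predictedPositive) fp++; else tn++; }
        }
        return new RocPoint(cutoff, (double)tp / (tp + fn), (double)fp / (fp + tn));
    }
}
```

With this convention the lowest cutoff always yields the point (1, 1), but the highest cutoff only yields (0, 0) if no test value equals the "true" value exactly.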

Sample applications

The source code comes with a sample application demonstrating the use of the ROC analysis. To open the example table, click File → Load, then select the Excel spreadsheet located in the same folder as the executable. To plot the curve, select the number of points or the threshold increment and click Plot.

[Figure: A “good” classifier.]

[Figure: An approximately “random” classifier.]

[Figure: A perfect, ideal classifier.]

[Figure: Curve points for the “good” classifier.]

Further Reading

  • Receiver Operating Curves: An Introduction
    Excellent page about ROC curves and their applications. It includes applets for experimenting with the curves, which help in understanding how they work and what they mean.

  • BBC News Magazine: A scanner to detect terrorists
    Very interesting article by Michael Blastland about how statistics are often misinterpreted when published by the media. “To find one terrorist in 3000 people, using a screen that works 90% of the time, you’ll end up detaining 300 people, one of whom might be your target”.

7 Comments

  1. Thanks for posting the grid. I have lots of data from imaging experiments but could see no way to get them into a ROC. Just seeing the left column as cutoff level made it so obvious. Cheers.

  2. Hi Cesar, thank you for publishing your code, it looks very well done. I use Matlab to compute ROC areas for clinical diagnostics, but could use something like this as well. The data is generally in the form “0” for non-disease and “1” for disease patient samples. We are looking at the concentration of biomarkers in blood to determine clinical utility. Since these are concentration values, the ranges are quite varied. I am curious as to why you limit the continuous values to be between the highest and lowest values of the actual data. Those values could just as easily have been designated as “Normal” and “Cancerous”, since they really are just nominal.

    Thanks,
    Brian –
    San Diego, California

  3. Hi Brian,

    I’ve used the lowest and highest values of the actual data because sometimes my data is organized as 1 for positive results and -1 for negative results, or even sometimes as 0.5 for positive results and -0.5 for negative results. This comes from a machine learning perspective where we commonly deal with dichotomies in the form [-1;1]. Surely the output could be normalized to 0 and 1 before creating the ROC, but at the time it made sense to proceed this way.

    The values are confined to the range of the dichotomy because the test is expected to indicate, after all, whether a patient is “Normal” or “Cancerous”. If 0 means “Normal” and 1 means “Cancerous”, the test could give any value between 0 and 1 as a likelihood or as a confidence measure. If you are dealing with different ranges, you could try scaling your results’ range to the actual data range before creating the ROC curve and then scaling your cutoff values back. It depends on how you are interpreting your test results.

    I’m not a clinician myself, so I’m not very sure how this is usually done in this field. This code was originally created to compare machine learning techniques which would try to produce results in the same range as the actual values.

    Perhaps I’ll update the code sometime later so it can be directly used with arbitrary ranges.

    Regards,
    César

  4. Hi

    I think the code in method Compute(double increment) should be like this:

    // Create the curve, computing a point for each cutoff value
    for (cutoff = dfalse; cutoff < dtrue; cutoff += increment)
    {
        points.Add(ComputePoint(cutoff));
    }

    // Add one more point with a cutoff above dtrue, so the curve includes (0, 0)
    if (cutoff >= dtrue) points.Add(ComputePoint(dtrue + 0.1));

    Otherwise it doesn’t have point (0, 0) in ROC curve.
