Part 2 of 2: C# Source Code. Go to Part 1 of 2: Theory.
Happy new year everyone! As promised, here is the second part of the article Performing Discriminatory Power Analysis by Receiver-Operating Characteristic Curves. This second part shows how to create ROC curves in C#. The following C# code implements the creation, visualization and analysis of ROC curves. The sample application accompanying the source code demonstrates how to use them inside your own applications.
Source code
Download source code and sample applications.
Confusion matrix
The ConfusionMatrix class represents a 2×2 contingency table. All derived measures discussed in Part 1, such as sensitivity and specificity, are available through its properties.
```csharp
/// <summary>
/// Confusion Matrix class
/// </summary>
public class ConfusionMatrix
{
    // 2x2 confusion matrix
    private int truePositives;
    private int trueNegatives;
    private int falsePositives;
    private int falseNegatives;

    /// <summary>
    /// Constructs a new Confusion Matrix.
    /// </summary>
    public ConfusionMatrix(int truePositives, int trueNegatives,
        int falsePositives, int falseNegatives)
    {
        this.truePositives = truePositives;
        this.trueNegatives = trueNegatives;
        this.falsePositives = falsePositives;
        this.falseNegatives = falseNegatives;
    }

    /// <summary>
    /// Gets the number of observations for this matrix
    /// </summary>
    public int Observations
    {
        get
        {
            return trueNegatives + truePositives +
                falseNegatives + falsePositives;
        }
    }

    /// <summary>
    /// Gets the number of actual positives
    /// </summary>
    public int ActualPositives
    {
        get { return truePositives + falseNegatives; }
    }

    /// <summary>
    /// Gets the number of actual negatives
    /// </summary>
    public int ActualNegatives
    {
        get { return trueNegatives + falsePositives; }
    }

    /// <summary>
    /// Gets the number of predicted positives
    /// </summary>
    public int PredictedPositives
    {
        get { return truePositives + falsePositives; }
    }

    /// <summary>
    /// Gets the number of predicted negatives
    /// </summary>
    public int PredictedNegatives
    {
        get { return trueNegatives + falseNegatives; }
    }

    /// <summary>
    /// Cases correctly identified by the system as positives.
    /// </summary>
    public int TruePositives
    {
        get { return truePositives; }
    }

    /// <summary>
    /// Cases correctly identified by the system as negatives.
    /// </summary>
    public int TrueNegatives
    {
        get { return trueNegatives; }
    }

    /// <summary>
    /// Cases incorrectly identified by the system as positives.
    /// </summary>
    public int FalsePositives
    {
        get { return falsePositives; }
    }

    /// <summary>
    /// Cases incorrectly identified by the system as negatives.
    /// </summary>
    public int FalseNegatives
    {
        get { return falseNegatives; }
    }

    /// <summary>
    /// Sensitivity, also known as True Positive Rate
    /// </summary>
    /// <remarks>
    /// Sensitivity = TPR = TP / (TP + FN)
    /// </remarks>
    public double Sensitivity
    {
        get { return (double)truePositives / (truePositives + falseNegatives); }
    }

    /// <summary>
    /// Specificity, also known as True Negative Rate
    /// </summary>
    /// <remarks>
    /// Specificity = TNR = TN / (FP + TN)
    /// or also as: TNR = (1 - False Positive Rate)
    /// </remarks>
    public double Specificity
    {
        get { return (double)trueNegatives / (trueNegatives + falsePositives); }
    }

    /// <summary>
    /// Efficiency, the arithmetic mean of sensitivity and specificity
    /// </summary>
    public double Efficiency
    {
        get { return (Sensitivity + Specificity) / 2.0; }
    }

    /// <summary>
    /// Accuracy, or raw performance of the system
    /// </summary>
    /// <remarks>
    /// ACC = (TP + TN) / (P + N)
    /// </remarks>
    public double Accuracy
    {
        get
        {
            return 1.0 * (truePositives + trueNegatives) / Observations;
        }
    }

    /// <summary>
    /// Positive Predictive Value, also known as Positive Precision
    /// </summary>
    /// <remarks>
    /// The Positive Predictive Value tells us how likely it is
    /// that a patient has a disease, given that the test for
    /// this disease is positive.
    ///
    /// It can be calculated as: PPV = TP / (TP + FP)
    /// </remarks>
    public double PositivePredictiveValue
    {
        get
        {
            double f = truePositives + falsePositives;
            if (f != 0) return truePositives / f;
            return 1.0;
        }
    }

    /// <summary>
    /// Negative Predictive Value, also known as Negative Precision
    /// </summary>
    /// <remarks>
    /// The Negative Predictive Value tells us how likely it is
    /// that the disease is NOT present for a patient, given that
    /// the patient's test for the disease is negative.
    ///
    /// It can be calculated as: NPV = TN / (TN + FN)
    /// </remarks>
    public double NegativePredictiveValue
    {
        get
        {
            double f = trueNegatives + falseNegatives;
            if (f != 0) return trueNegatives / f;
            else return 1.0;
        }
    }

    /// <summary>
    /// False Positive Rate, also known as false alarm rate.
    /// </summary>
    /// <remarks>
    /// It can be calculated as: FPR = FP / (FP + TN)
    /// or also as: FPR = (1 - specificity)
    /// </remarks>
    public double FalsePositiveRate
    {
        get
        {
            return (double)falsePositives / (falsePositives + trueNegatives);
        }
    }

    /// <summary>
    /// False Discovery Rate, or the expected false positive rate.
    /// </summary>
    /// <remarks>
    /// The False Discovery Rate is actually the expected false positive rate.
    ///
    /// For example, if 1000 observations were experimentally predicted to
    /// be different, and a maximum FDR for these observations was 0.10, then
    /// 100 of these observations would be expected to be false positives.
    ///
    /// It is calculated as: FDR = FP / (FP + TP)
    /// </remarks>
    public double FalseDiscoveryRate
    {
        get
        {
            double d = falsePositives + truePositives;
            if (d != 0.0) return falsePositives / d;
            else return 1.0;
        }
    }

    /// <summary>
    /// Matthews Correlation Coefficient, also known as Phi coefficient
    /// </summary>
    /// <remarks>
    /// A coefficient of +1 represents a perfect prediction, 0 an
    /// average random prediction and -1 an inverse prediction.
    ///
    /// MCC = (TP * TN - FP * FN) /
    ///       sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    /// </remarks>
    public double MatthewsCorrelationCoefficient
    {
        get
        {
            // Multiply as doubles so large counts cannot overflow an int
            double s = System.Math.Sqrt(
                (double)(truePositives + falsePositives) *
                (truePositives + falseNegatives) *
                (trueNegatives + falsePositives) *
                (trueNegatives + falseNegatives));

            if (s != 0.0)
                return ((double)truePositives * trueNegatives
                      - (double)falsePositives * falseNegatives) / s;
            else return 0.0;
        }
    }
}
```
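As a quick sanity check of the formulas behind these properties, the sketch below recomputes a few of them directly from four counts. It is a standalone illustration, not part of the article's download: the `ConfusionMeasures` class and its method names are hypothetical, and the Matthews coefficient is written with its standard numerator, TP·TN − FP·FN.

```csharp
using System;

// Hypothetical helper for illustration only; it does not depend on
// the ConfusionMatrix class from the article.
public static class ConfusionMeasures
{
    // Sensitivity = TP / (TP + FN)
    public static double Sensitivity(int tp, int fn)
    {
        return (double)tp / (tp + fn);
    }

    // Specificity = TN / (TN + FP)
    public static double Specificity(int tn, int fp)
    {
        return (double)tn / (tn + fp);
    }

    // Accuracy = (TP + TN) / (TP + TN + FP + FN)
    public static double Accuracy(int tp, int tn, int fp, int fn)
    {
        return (double)(tp + tn) / (tp + tn + fp + fn);
    }

    // MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    // returning 0 when the denominator vanishes.
    public static double Matthews(int tp, int tn, int fp, int fn)
    {
        double s = Math.Sqrt((double)(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        if (s == 0.0) return 0.0;
        return ((double)tp * tn - (double)fp * fn) / s;
    }
}
```

With made-up counts TP = 40, TN = 45, FP = 5 and FN = 10, `Sensitivity(40, 10)` gives 0.8, `Specificity(45, 5)` gives 0.9, `Accuracy(40, 45, 5, 10)` gives 0.85, and `Matthews(40, 45, 5, 10)` comes out at roughly 0.70.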
Receiver Operating Characteristic
The ReceiverOperatingCharacteristic class represents a ROC curve. The collection of points that defines the curve is available through the Points property. Because each point derives from ConfusionMatrix, the collection can be bound directly to a DataGridView for quick visual inspection of the curve.
Additional information, such as the total Area Under the Curve and its associated standard error, is available through the Area and Error properties, respectively.
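For reference, the two quantities can be written out as follows, matching what the calculateAreaUnderCurve and calculateStandardError methods in the listing that follows compute. The symbols are introduced here for illustration only: A is the area, TPR_i and FPR_i are the sensitivity and false positive rate of the i-th curve point (with points ordered by decreasing false positive rate), and N_a and N_n are the numbers of actual positive and actual negative cases. The standard error has the form of the Hanley and McNeil estimate.

```latex
A = \sum_{i=1}^{n-1}
    \frac{(\mathrm{TPR}_i + \mathrm{TPR}_{i+1})\,(\mathrm{FPR}_i - \mathrm{FPR}_{i+1})}{2}

Q_1 = \frac{A}{2 - A}, \qquad Q_2 = \frac{2A^2}{1 + A}

SE(A) = \sqrt{\frac{A(1 - A) + (N_a - 1)(Q_1 - A^2) + (N_n - 1)(Q_2 - A^2)}{N_a \, N_n}}
```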
```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

/// <summary>
/// Receiver Operating Characteristic (ROC) Curve
/// </summary>
/// <remarks>
/// In signal detection theory, a receiver operating characteristic (ROC), or simply
/// ROC curve, is a graphical plot of the sensitivity vs. (1 - specificity) for a
/// binary classifier system as its discrimination threshold is varied.
///
/// References:
///   http://en.wikipedia.org/wiki/Receiver_operating_characteristic
///   http://www.anaesthetist.com/mnm/stats/roc/Findex.htm
///   http://radiology.rsna.org/content/148/3/839.full.pdf
/// </remarks>
public class ReceiverOperatingCharacteristic
{
    private double area = 0.0;
    private double error = 0.0;

    // The actual, measured data
    private double[] measurement;

    // The data, as predicted by a test
    private double[] prediction;

    // The real number of positives and negatives in the measured (actual) data
    private int positiveCount;
    private int negativeCount;

    // The values which represent positive and negative values in our
    // measurement data (such as presence or absence of some disease)
    private double dtrue;
    private double dfalse;

    // The collection to hold our curve point information
    private PointCollection collection;

    /// <summary>
    /// Constructs a new Receiver Operating Characteristic model
    /// </summary>
    /// <param name="measurement">An array of binary values. Typically 0 and 1, or -1 and 1, indicating negative and positive cases, respectively.</param>
    /// <param name="prediction">An array of continuous values trying to approximate the measurement array.</param>
    public ReceiverOperatingCharacteristic(double[] measurement, double[] prediction)
    {
        this.measurement = measurement;
        this.prediction = prediction;

        // Determine which numbers correspond to each binary category
        dtrue = dfalse = measurement[0];
        for (int i = 1; i < measurement.Length; i++)
        {
            if (dtrue < measurement[i])
                dtrue = measurement[i];
            if (dfalse > measurement[i])
                dfalse = measurement[i];
        }

        // Count the real number of positive cases
        for (int i = 0; i < measurement.Length; i++)
        {
            if (measurement[i] == dtrue)
                this.positiveCount++;
        }

        // Negative cases is just the number of cases minus the number of positives
        this.negativeCount = this.measurement.Length - this.positiveCount;
    }

    #region Properties
    /// <summary>
    /// Gets the points of the curve.
    /// </summary>
    public PointCollection Points
    {
        get { return collection; }
    }

    /// <summary>
    /// Gets the number of actual positive cases.
    /// </summary>
    internal int Positives
    {
        get { return positiveCount; }
    }

    /// <summary>
    /// Gets the number of actual negative cases.
    /// </summary>
    internal int Negatives
    {
        get { return negativeCount; }
    }

    /// <summary>
    /// Gets the number of cases (observations) being analyzed.
    /// </summary>
    internal int Observations
    {
        get { return this.measurement.Length; }
    }

    /// <summary>
    /// The area under the ROC curve. Also known as AUC-ROC.
    /// </summary>
    public double Area
    {
        get { return area; }
    }

    /// <summary>
    /// Gets the Standard Error associated with this ROC curve.
    /// </summary>
    public double Error
    {
        get { return error; }
    }
    #endregion

    #region Public Methods
    /// <summary>
    /// Computes an n-point ROC curve.
    /// </summary>
    /// <remarks>
    /// Each point in the ROC curve will have a threshold increase of
    /// (dtrue - dfalse) / points over the previous point, starting at
    /// the value that represents the negative class.
    /// </remarks>
    /// <param name="points">The number of points for the curve.</param>
    public void Compute(int points)
    {
        Compute((dtrue - dfalse) / points);
    }

    /// <summary>
    /// Computes a ROC curve with approximately (dtrue - dfalse) / increment points
    /// </summary>
    /// <param name="increment">The increment over the previous point for each point in the curve.</param>
    public void Compute(double increment)
    {
        // A non-positive increment would never advance the cutoff
        if (increment <= 0)
            throw new ArgumentOutOfRangeException("increment", "The increment must be positive.");

        List<Point> points = new List<Point>();
        double cutoff;
        double last = dfalse;

        // Create the curve, computing a point for each cutoff value
        for (cutoff = dfalse; cutoff <= dtrue; cutoff += increment)
        {
            points.Add(ComputePoint(cutoff));
            last = cutoff;
        }

        // Accumulated rounding can make the loop stop just short of dtrue,
        // so make sure the final threshold is always present
        if (last < dtrue) points.Add(ComputePoint(dtrue));

        // Sort the curve by ascending specificity
        // (that is, by descending false positive rate)
        points.Sort(new Comparison<Point>(delegate(Point a, Point b)
        {
            return a.Specificity.CompareTo(b.Specificity);
        }));

        // Create the point collection
        this.collection = new PointCollection(points.ToArray());

        // Calculate area and error associated with this curve
        this.area = calculateAreaUnderCurve();
        this.error = calculateStandardError();
    }

    /// <summary>
    /// Computes the curve point (confusion matrix) for a single threshold.
    /// </summary>
    /// <param name="threshold">The cutoff value; predictions at or above it count as positive.</param>
    public Point ComputePoint(double threshold)
    {
        int truePositives = 0;
        int trueNegatives = 0;

        for (int i = 0; i < this.measurement.Length; i++)
        {
            bool measured = (this.measurement[i] == dtrue);
            bool predicted = (this.prediction[i] >= threshold);

            // If the prediction equals the true measured value
            if (predicted == measured)
            {
                // We have a hit. Now we have to see
                // if it was a positive or negative hit
                if (predicted)
                    truePositives++; // Positive hit
                else trueNegatives++; // Negative hit
            }
        }

        // The other values can be computed from available variables
        int falsePositives = negativeCount - trueNegatives;
        int falseNegatives = positiveCount - truePositives;

        return new Point(this, threshold,
            truePositives, trueNegatives,
            falsePositives, falseNegatives);
    }

    /// <summary>
    /// Compares two ROC curves.
    /// </summary>
    /// <param name="curve">The curve to compare against.</param>
    /// <param name="r">The amount of correlation between the two curves</param>
    /// <returns>The z statistic for the difference between the two areas.</returns>
    public double Compare(ReceiverOperatingCharacteristic curve, double r)
    {
        // Areas
        double AUC1 = this.Area;
        double AUC2 = curve.Area;

        // Errors
        double se1 = this.Error;
        double se2 = curve.Error;

        // Difference in areas divided by the standard error of the difference
        return (AUC1 - AUC2) / System.Math.Sqrt(se1 * se1 + se2 * se2 - 2 * r * se1 * se2);
    }
    #endregion

    #region Private Methods
    /// <summary>
    /// Calculates the area under the ROC curve using the trapezium method
    /// </summary>
    private double calculateAreaUnderCurve()
    {
        double sum = 0.0;
        double tpz = 0.0;

        for (int i = 0; i < collection.Count - 1; i++)
        {
            // Obs: False Positive Rate = (1 - specificity)
            tpz = collection[i].Sensitivity + collection[i + 1].Sensitivity;
            tpz = tpz * (collection[i].FalsePositiveRate - collection[i + 1].FalsePositiveRate) / 2.0;
            sum += tpz;
        }
        return sum;
    }

    /// <summary>
    /// Calculates the standard error associated with this curve
    /// </summary>
    private double calculateStandardError()
    {
        double A = area;

        // real positive cases
        int Na = positiveCount;

        // real negative cases
        int Nn = negativeCount;

        double Q1 = A / (2.0 - A);
        double Q2 = 2 * A * A / (1.0 + A);

        return System.Math.Sqrt((A * (1.0 - A) +
            (Na - 1.0) * (Q1 - A * A) +
            (Nn - 1.0) * (Q2 - A * A)) / (Na * Nn));
    }
    #endregion

    #region Nested Classes
    /// <summary>
    /// Object to hold information about a Receiver Operating Characteristic Curve Point
    /// </summary>
    public class Point : ConfusionMatrix
    {
        // Discrimination threshold (cutoff value)
        private double cutoff;

        // Parent curve
        private ReceiverOperatingCharacteristic curve;

        /// <summary>
        /// Constructs a new Receiver Operating Characteristic point.
        /// </summary>
        internal Point(ReceiverOperatingCharacteristic curve, double cutoff,
            int truePositives, int trueNegatives, int falsePositives, int falseNegatives)
            : base(truePositives, trueNegatives, falsePositives, falseNegatives)
        {
            this.curve = curve;
            this.cutoff = cutoff;
        }

        /// <summary>
        /// Gets the cutoff value (discrimination threshold) for this point.
        /// </summary>
        public double Cutoff
        {
            get { return cutoff; }
        }
    }

    /// <summary>
    /// Represents a Collection of Receiver Operating Characteristic (ROC) Curve points.
    /// This class cannot be instantiated.
    /// </summary>
    public class PointCollection : ReadOnlyCollection<Point>
    {
        internal PointCollection(Point[] points)
            : base(points)
        {
        }
    }
    #endregion
}
```
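The value returned by the Compare method in the listing above is a z statistic for the difference between two areas. Written out, with A_1, A_2 the two areas, SE_1, SE_2 their standard errors and r the correlation between the areas supplied by the caller (symbols chosen here for illustration):

```latex
z = \frac{A_1 - A_2}{\sqrt{SE_1^2 + SE_2^2 - 2\,r\,SE_1\,SE_2}}
```

Under the usual large-sample normal approximation, |z| above about 1.96 suggests the two areas differ at the 5% level (two-sided). When the two curves come from independent samples, r = 0 and the cross term vanishes.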
Using the code
Code usage is very simple. To use the aforementioned classes and create a ROC curve, simply create a new ReceiverOperatingCharacteristic object passing the actual data, as measured by the experiment, and the test data, as given by the prediction model.
The actual data must be a dichotomous variable, with only two valid values. The “false” value is assumed to be the lowest value of the dichotomy, while the “true” value is assumed to be the highest. It is recommended to use 0 and 1 values for simplicity, although it isn’t mandatory.
The test data, as given by the prediction model, must have continuous values between the lowest and highest values for the actual data. For example, if the two valid values are 0 and 1, then its values must be inside the [0,1] range.
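If a test reports values on some other scale (a concentration, for instance), one option is to rescale them linearly onto the range of the actual data before building the curve, then map any cutoff of interest back to the original units. The sketch below is one way to do that under stated assumptions: the `ScoreScaling` class and its methods are hypothetical helpers, not part of the article's code, and a plain min-max mapping is assumed to be appropriate for the data.

```csharp
using System;

// Hypothetical helper for illustration; not part of the article's download.
public static class ScoreScaling
{
    // Linearly maps scores from [min(scores), max(scores)] onto [low, high].
    // If all scores are equal, every value maps to low.
    public static double[] Rescale(double[] scores, double low, double high)
    {
        double min = scores[0], max = scores[0];
        foreach (double s in scores)
        {
            if (s < min) min = s;
            if (s > max) max = s;
        }

        double range = max - min;
        double[] result = new double[scores.Length];
        for (int i = 0; i < scores.Length; i++)
        {
            result[i] = (range == 0.0)
                ? low
                : low + (scores[i] - min) * (high - low) / range;
        }
        return result;
    }

    // Maps a cutoff on the rescaled axis [low, high] back to the
    // original units [min, max] of the scores.
    public static double Unscale(double cutoff, double min, double max,
        double low, double high)
    {
        return min + (cutoff - low) * (max - min) / (high - low);
    }
}
```

For example, `Rescale(new double[] { 2, 4, 6 }, 0, 1)` yields 0, 0.5 and 1, and `Unscale(0.5, 2, 6, 0, 1)` maps the cutoff 0.5 back to 4 in the original units.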
To compute the curve using different cutoff values, call the Compute method, passing either the desired number of points or the desired increment in the cutoff value between points.
```csharp
// Creates the Receiver Operating Curve of the given source
var rocCurve = new ReceiverOperatingCharacteristic(realData, testData);

// Compute the ROC curve with 20 points
rocCurve.Compute(20);
```
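To make the threshold sweep concrete, here is a compact standalone sketch of the same idea: count hits at each cutoff, turn them into a (false positive rate, sensitivity) pair, and integrate with trapezoids. It is not the article's class. `RocSketch`, `PointAt` and `Area` are hypothetical names, labels are assumed to be exactly 0 and 1 with scores in [0, 1], and one extra threshold just above 1 is added so that the curve reaches the point (0, 0).

```csharp
using System;

// Hypothetical standalone sketch; assumes labels are 0/1 and scores lie in [0, 1].
public static class RocSketch
{
    // Returns { FPR, TPR } for one threshold; scores at or above
    // the threshold are predicted positive.
    public static double[] PointAt(double[] actual, double[] scores, double threshold)
    {
        int tp = 0, fp = 0, positives = 0, negatives = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            bool isPositive = (actual[i] == 1.0);
            bool predicted = (scores[i] >= threshold);

            if (isPositive) { positives++; if (predicted) tp++; }
            else { negatives++; if (predicted) fp++; }
        }
        return new double[] { (double)fp / negatives, (double)tp / positives };
    }

    // Trapezoidal area over thresholds 0, 1/steps, ..., 1, plus one
    // threshold above 1 so that the last point is (0, 0).
    public static double Area(double[] actual, double[] scores, int steps)
    {
        double area = 0.0;
        double[] previous = PointAt(actual, scores, 0.0); // everything positive: (1, 1)

        for (int k = 1; k <= steps + 1; k++)
        {
            // Deriving each threshold from the index avoids the rounding
            // drift of repeatedly adding an increment.
            double threshold = (double)k / steps;
            double[] current = PointAt(actual, scores, threshold);

            area += (previous[1] + current[1]) * (previous[0] - current[0]) / 2.0;
            previous = current;
        }
        return area;
    }
}
```

With the made-up data `actual = { 0, 0, 1, 1 }` and `scores = { 0.1, 0.4, 0.35, 0.8 }`, `Area(actual, scores, 20)` evaluates to 0.75: the curve passes through (1, 1), (0.5, 1), (0.5, 0.5), (0, 0.5) and (0, 0).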
Sample applications
The source code comes with a sample application demonstrating the ROC analysis. To open the example table, click File → Load, then select the Excel spreadsheet located inside the executable folder. To plot the curve, select the number of points or the threshold increment and click Plot.
An approximately “random” classifier.
Curve points for the “good” classifier.
Further Reading
- Receiver Operating Curves: An Introduction
  Excellent page about ROC curves and their applications. Includes excellent applets for experimenting with the curves, allowing for a better understanding of their workings and meaning.
- BBC NEWS MAGAZINE, A scanner to detect terrorists. Very interesting article about how statistics are usually wrongly interpreted when published by the media. “To find one terrorist in 3000 people, using a screen that works 90% of the time, you’ll end up detaining 300 people, one of whom might be your target”. Written by Michael Blastland.
References
- WIKIPEDIA, The Free Encyclopedia, “Receiver Operating Characteristic”.
  Available at: <http://en.wikipedia.org/wiki/Receiver_operating_characteristic>
  Accessed: 7 Jul. 2009.
- SABATTINI, R. M. E.; “Um Programa para o Cálculo da Acurácia, Especificidade e Sensibilidade de Testes Médicos”; Revista Informédica, 2 (12): 19-21, 1995.
  Available at: <http://www.informaticamedica.org.br/informed/sensib.htm>
  Accessed: 7 Jul. 2009.
- ANAESTHETIST.COM, “Receiver Operating Curves: An Introduction”.
  Available at: <http://www.anaesthetist.com/mnm/stats/roc/Findex.htm>
  Accessed: 13 Jul. 2009.
Thanks for posting the grid. I have lots of data from imaging experiments but could find no way to get them into a ROC. Just seeing the left column as cutoff level made it so obvious. Cheers.
Hi Cesar, thank you for publishing your code, it looks very well done. I use Matlab to compute ROC areas for clinical diagnostics, but could use something like this as well. The data is generally in the form “0” for non-disease and “1” for disease patient samples. We are looking at the concentration of biomarkers in blood to determine clinical utility. Since these are concentration values, the ranges are quite varied. I am curious as to why you limit the continuous values to be between the highest and lowest values of the actual data. Those values could have just as easily been designated as “Normal” and “Cancerous” since they really are just nominal.
Thanks,
Brian –
San Diego, California
Hi Brian,
I’ve used the lowest and highest values of the actual data because sometimes my data is organized as 1 for positive results and -1 for negative results, or even sometimes as 0.5 for positive results and -0.5 for negative results. This comes from a machine learning perspective where we commonly deal with dichotomies in the form [-1;1]. Surely the output could be normalized to 0 and 1 before creating the ROC, but at the time it made sense to proceed this way.
The values are confined to the range of the dichotomy because the test is expected, after all, to indicate whether a patient is “Normal” or “Cancerous”. If 0 means “Normal” and 1 means “Cancerous”, the test could give any value between 0 and 1 as a likelihood or as a confidence measure. If you are dealing with different ranges, you could try scaling your results’ range to the actual data range before creating the ROC curve and then scaling back your cutoff values. It depends on how you are interpreting your test results.
I’m not a clinician myself, so I’m not very sure how this is usually done in this field. This code was originally created to compare machine learning techniques which would try to produce results in the same range as the actual values.
Perhaps I’ll update the code sometime later so it can be directly used with arbitrary ranges.
Regards,
César
Hi
I think the code in method Compute(double increment) should be like this:
// Create the curve, computing a point for each cutoff value
for (cutoff = dfalse; cutoff < dtrue; cutoff += increment)
{
points.Add(ComputePoint(cutoff));
}
if (cutoff >= dtrue) points.Add(ComputePoint(dtrue + 0.1));
Otherwise the ROC curve doesn’t include the point (0, 0).
Hi,
Thanks for the suggestion. I have added an optional method parameter to include the point (0,0) if it has not been included already. I’ll update the article soon.
The code will also be available in the next version of the Accord.NET Framework.
Thanks,
César