The Receiver Operating Characteristic (ROC) curve is a measure of the performance of a binary classifier.
In the following, we consider a dataset whose elements are split into two sets, ‘0’ and ‘1’: an element belonging to set x is called an “x-element”. A classifier C assigns elements to the two classes ‘0’ and ‘1’: an element classified into set x is called an “x-classified element”.
Graphically, the ROC function is represented as a curve giving the true-positive rate (the fraction of 1-elements that are correctly 1-classified) as a function of the false-positive rate (the fraction of 0-elements that are incorrectly 1-classified).
ROC curves were invented during the Second World War to show the separation between radar signals and background noise. The value of the ROC curve as a statistical tool was emphasized in 1960. Since then, it has been used in particular in the pharmaceutical field, radiology, biology, epidemiology and, more recently, machine learning.
ROC curves show how a binary classifier behaves as its discrimination threshold varies. The sensitivity is the fraction of 1-elements that are correctly 1-classified (true-positive rate), and the antispecificity (1 minus the specificity) is the fraction of 0-elements that are incorrectly 1-classified (false-positive rate). Antispecificity is plotted on the x-axis and sensitivity on the y-axis to form the ROC diagram. Each threshold value yields a point on the ROC curve, which runs from (0, 0) to (1, 1):
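As a quick numeric illustration, the two rates can be computed directly from the four confusion-matrix counts. The following sketch uses made-up counts, not values from the article's dataset:

```python
# Sensitivity (true-positive rate) and antispecificity (false-positive
# rate) computed from the four confusion-matrix counts.
def roc_point(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)      # y-axis of the ROC diagram
    antispecificity = fp / (fp + tn)  # x-axis of the ROC diagram
    return antispecificity, sensitivity

# Illustrative counts: 100 1-elements and 100 0-elements.
print(roc_point(tp=80, fn=20, fp=10, tn=90))  # → (0.1, 0.8)
```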
At (0, 0) the classifier always 0-classifies (always returns ‘negative’): there are no incorrectly 1-classified 0-elements (no false positives), but also no correctly 1-classified 1-elements (no true positives). This point corresponds to a threshold for which the confusion matrix gives FP = 0 (no false positives) and TP = 0 (no true positives), as shown in the figure below:
Recalling that sensitivity = TP/(TP + FN) and antispecificity = 1 − specificity = 1 − TN/(TN + FP) = FP/(TN + FP), in this case we get sensitivity = 0/(0 + FN) = 0 and antispecificity = 0/(TN + 0) = 0, which gives the point (0, 0) on the ROC curve.
At (1, 1) the classifier always 1-classifies (always returns ‘positive’): there are no correctly 0-classified 0-elements (no true negatives), but also no incorrectly 0-classified 1-elements (no false negatives). This point corresponds to a threshold for which the confusion matrix gives TN = 0 (no true negatives) and FN = 0 (no false negatives), as shown in the figure below:
In this case we get sensitivity = TP/(TP + FN) = TP/(TP + 0) = 1 and antispecificity = FP/(TN + FP) = FP/(0 + FP) = 1, which gives the point (1, 1) on the ROC curve.
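Both endpoints can be reproduced with a small threshold sweep. The scores and labels below are made-up illustrative data; the sketch simply counts the confusion-matrix cells at a given threshold:

```python
# Illustrative data: true classes and classifier scores for six elements.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

def roc_point_at(threshold):
    # 1-classify every element whose score reaches the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    return fp / (fp + tn), tp / (tp + fn)

print(roc_point_at(1.1))  # threshold above every score → (0.0, 0.0)
print(roc_point_at(0.0))  # threshold below every score → (1.0, 1.0)
```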
A random classifier will draw a line from (0, 0) to (1, 1), as shown below :
As shown in the figure below, a perfect or ideal classifier draws a rectilinear curve going vertically from (0, 0) to (0, 1), then horizontally from (0, 1) to (1, 1), with a right angle at (0, 1): the classifier gives no false positives and no false negatives, and is therefore perfectly accurate, never wrong.
As shown in the figure below, a perfectly wrong classifier draws a straight line from (0, 0) to (1, 0), then vertically from (1, 0) to (1, 1), with a right angle at (1, 0): the classifier gives no true negatives and no true positives, and is therefore always wrong. Simply inverting its predictions turns it into a perfectly accurate classifier.
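This inversion is easy to check numerically; the label list below is illustrative:

```python
# An always-wrong binary classifier predicts the opposite of every label;
# flipping its output recovers the labels exactly.
labels = [0, 1, 1, 0, 1]                  # illustrative true classes
always_wrong = [1 - y for y in labels]    # every prediction is incorrect
inverted = [1 - p for p in always_wrong]  # invert the predictions

print(inverted == labels)  # → True
```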
As shown in the figure below, the AUR (area under the ROC curve) measures the area under the ROC curve. The larger the area, the further the curve moves away from the random-classifier diagonal and the closer it gets to the right angle of the perfect classifier.
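Since an empirical ROC curve is piecewise linear, its area can be computed with the trapezoidal rule. The (false-positive rate, true-positive rate) points below are made up for illustration:

```python
# Illustrative ROC points, ordered by increasing false-positive rate.
fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]

# Trapezoidal rule: sum the area of each trapezoid between points.
aur = sum((fpr[i + 1] - fpr[i]) * (tpr[i] + tpr[i + 1]) / 2
          for i in range(len(fpr) - 1))
print(round(aur, 3))  # → 0.845
```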
USE CASE : EVALUATING A CLASSIFIER IN PYTHON WITH THE ROC CURVE
In the following, we use the ROC curve to evaluate the Random Forest classifier created here, trained on a dataset about the distribution of big salaries.
First we create the classifier with the following code :
import numpy as np
import pandas as pd
#loading the dataset
dataset = pd.read_csv('dataset-4.csv')
X = dataset.iloc[:, 0:3].values
y = dataset.iloc[:, -1].values  # the class labels are in the last column
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
#fitting the classifier to the training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)
Then we create the ROC Curve with the following code :
y_pred_proba = classifier.predict_proba(X_test)[:, 1]  # probability of the ‘1’ class
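Note that predict_proba returns one probability column per class, while metrics.roc_curve expects a one-dimensional array of positive-class scores, so only the second column (the probability of class ‘1’) should be kept. A small sketch with made-up probabilities:

```python
# Illustrative predict_proba-style output: one row per sample,
# columns are [P(class 0), P(class 1)].
proba = [[0.9, 0.1],
         [0.3, 0.7],
         [0.6, 0.4]]

positive_scores = [row[1] for row in proba]  # keep P(class 1) only
print(positive_scores)  # → [0.1, 0.7, 0.4]
```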
The ‘roccurve’ function that builds and displays the ROC curve of the classifier is defined as follows :
import matplotlib.pyplot as plt
from sklearn import metrics
def roccurve(y_values, y_preds_proba):
    fpr, tpr, _ = metrics.roc_curve(y_values, y_preds_proba)
    xx = np.arange(101) / float(100)
    aur = metrics.auc(fpr, tpr)
    plt.plot([0.0, 0.0], [0.0, 1.0], color='green', linewidth=8)
    plt.plot([0.0, 1.0], [1.0, 1.0], color='green', label='Perfect Model', linewidth=4)
    plt.plot(xx, xx, color='blue', label='Random Model')
    plt.plot(fpr, tpr, color='red', label='User Model')
    plt.title("ROC Curve - AUR value = " + str(aur))
    plt.xlabel('% false positives')
    plt.ylabel('% true positives')
    plt.legend()
    plt.show()
Executing this code (calling roccurve with the test labels and the predicted probabilities of class ‘1’) leads to the following graph:
The shape of the ROC curve and the AUR value close to 90% show that the performance of the model is pretty good.