Simple guide on how to generate ROC plot for Keras classifier

(Comments)

categories

After reading the guide, you will know how to evaluate a Keras classifier by ROC and AUC:

  • Produce ROC plots for binary classification classifiers; apply cross-validation in doing so.
  • Calculate AUC and use that to compare classifiers performance.
  • Apply ROC analysis to multi-class classification. Create ROC for evaluating individual class and the overall classification performance.

What are ROC and AUC and what can they do?

What are they?

From Wikipedia: Receiver operating characteristic curve a.k.a ROC is a graphic plot illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The critical point here is "binary classifier" and "varying threshold". I will show you how to plot ROC for multi-label classifier by the one-vs-all approach as well.

Area Under the Curve, a.k.a. AUC is the percentage of this area that is under this ROC curve, ranging between 0~1.  It 

What can they do?

ROC is a great way to visualize the performance of a binary classifier, and AUC is one single number to summarize a classifier's performance by assessing the ranking regarding separation of the two classes. The higher, the better.

In the following two sections, I will show you how to plot the ROC and calculate the AUC for Keras classifiers, both binary and multi-label ones.

ROC, AUC for binary classifiers

First, let's use Sklearn's make_classification() function to generate some train/test data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=80000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

X_train, X_train_lr, y_train, y_train_lr = train_test_split(X_train,
                                                            y_train,
                                                            test_size=0.5)

Next, let's build and train a Keras classifier model as usual.

from keras.models import Sequential
from keras.layers import Dense

def build_model():
    model = Sequential()
    model.add(Dense(20, input_dim=20, activation='relu'))
    model.add(Dense(40, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

from keras.wrappers.scikit_learn import KerasClassifier
keras_model = build_model()
keras_model.fit(X_train, y_train, epochs=5, batch_size=100, verbose=1)

We then call model.predict on the reserved test data to generate the probability values. After that, use the probabilities and ground true labels to generate two data array pairs necessary to plot ROC curve:

  • fpr: False positive rates for each possible threshold
  • tpr: True positive ratefor each possible threshold

We can call sklearn's roc_curve() function to generate the two. Here is the code to make them happen.

from sklearn.metrics import roc_curve
y_pred_keras = keras_model.predict(X_test).ravel()
fpr_keras, tpr_keras, thresholds_keras = roc_curve(y_test, y_pred_keras)

AUC value can also be calculated like this.

from sklearn.metrics import auc
auc_keras = auc(fpr_keras, tpr_keras)

To make the plot looks more meaningful, let's train another binary classifier and compare it with our Keras classifier later in the same plot.

from sklearn.ensemble import RandomForestClassifier
# Supervised transformation based on random forests
rf = RandomForestClassifier(max_depth=3, n_estimators=10)
rf.fit(X_train, y_train)

y_pred_rf = rf.predict_proba(X_test)[:, 1]
fpr_rf, tpr_rf, thresholds_rf = roc_curve(y_test, y_pred_rf)
auc_rf = auc(fpr_rf, tpr_rf)

Now, let's plot the ROC for the two classifiers.

plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_keras, tpr_keras, label='Keras (area = {:.3f})'.format(auc_keras))
plt.plot(fpr_rf, tpr_rf, label='RF (area = {:.3f})'.format(auc_rf))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
# Zoom in view of the upper left corner.
plt.figure(2)
plt.xlim(0, 0.2)
plt.ylim(0.8, 1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_keras, tpr_keras, label='Keras (area = {:.3f})'.format(auc_keras))
plt.plot(fpr_rf, tpr_rf, label='RF (area = {:.3f})'.format(auc_rf))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve (zoomed in at top left)')
plt.legend(loc='best')
plt.show()

Here is the result:

roc-binary

As you can see, given the AUC metric, Keras classifier outperforms the other classifier.

ROC, AUC for a categorical classifier

ROC curve extends to problems with three or more classes with what is known as the one-vs-all approach.

For instance, if we have three classes, we will create three ROC curves,

For each class, we take it as the positive class and group the rest classes jointly as the negative class.

  • Class 1 vs classes 2&3
  • Class 2 vs classes 1&3
  • Class 3 vs classes 1&2

Let's started by creating some train/test data with 3 class outputs.

from sklearn.datasets import make_classification
from sklearn.preprocessing import label_binarize
# 3 classes to classify
n_classes = 3

X, y = make_classification(n_samples=80000, n_features=20, n_informative=3, n_redundant=0, n_classes=n_classes,
    n_clusters_per_class=2)
# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

Then we build and train a categorical Keras classifier like before.

from keras.models import Sequential
from keras.layers import Dense

def build_model():
    model = Sequential()
    model.add(Dense(20, input_dim=20, activation='relu'))
    model.add(Dense(40, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

keras_model2 = build_model()
keras_model2.fit(X_train, y_train, epochs=10, batch_size=100, verbose=1)

After training the model we can use it to make predictions for test inputs and plot ROC for each of the 3 classes.

Before doing that, let's define the metric to evaluate the overall performance across all classes. There are two slightly different metrics, micro and macro averaging.

In “micro averaging”, we’d calculate the performance, e.g., precision, from the individual true positives, true negatives, false positives, and false negatives of the k-class model:

micro-averaging

And in macro-averaging, we average the performances of each individual class:

marco-averaging

Here is the code to plot those ROC curves along with AUC values.

import numpy as np
from scipy import interp
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn.metrics import roc_curve, auc

# Plot linewidth.
lw = 2

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

# Compute macro-average ROC curve and ROC area

# First aggregate all false positive rates
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

# Then interpolate all ROC curves at this points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += interp(all_fpr, fpr[i], tpr[i])

# Finally average it and compute AUC
mean_tpr /= n_classes

fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

# Plot all ROC curves
plt.figure(1)
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["micro"]),
         color='deeppink', linestyle=':', linewidth=4)

plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["macro"]),
         color='navy', linestyle=':', linewidth=4)

colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'
             ''.format(i, roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()


# Zoom in view of the upper left corner.
plt.figure(2)
plt.xlim(0, 0.2)
plt.ylim(0.8, 1)
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["micro"]),
         color='deeppink', linestyle=':', linewidth=4)

plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["macro"]),
         color='navy', linestyle=':', linewidth=4)

colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'
             ''.format(i, roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()

Here is the result, the second plot is a zoom-in view of the upper left corner of the graph.

roc-categorical

You can see for each class, their ROC and AUC values are slightly different, that gives us a good indication of how good our model is at classifying individual class.

Summary and Further reading

In this tutorial, we walked through how to evaluate binary and categorical Keras classifiers with ROC curve and AUC value.

The ROC curve visualizes the quality of the ranker or probabilistic model on a test set, without committing to a classification threshold. We also learned how to compute the AUC value to help us access the performance of a classifier.

If you want to know more about ROC, you can read its Wikipedia page, Receiver operating characteristic, it shows you how the curve is plotted by iterating different thresholds.

Also, it is helpful to check out Sklearn's API document on computing ROC to further understand how to use that function.

You can find the source code for this tutorial in my GitHub repo.

Currently unrated

Comments