
Machine Learning | Confusion Matrix

When building classification models for predictive analysis, an important part of evaluating how well a model performs is creating a confusion matrix [1]. This n-by-n matrix (where n is the number of classification labels) cross-tabulates the observations in a test data set by both the predicted label and the actual label of each observation. From the information in this table, several measures of the model's predictive ability can be derived.

For example, for a spam filter that labels each of the 500 test emails it is given as either spam or not spam, the breakdown might be as follows:

                     Predicted Not Spam    Predicted Is Spam
Actually Not Spam    150                   75
Actually Is Spam     5                     270

One thing to notice in the example confusion matrix is that, although the spam filter correctly detects about 98% of the spam (270/275), it also mislabels about 33% (75/225) of the non-spam emails as spam. This probably means that its filtering is too aggressive: most users would likely want far fewer than 1/3 of their non-spam emails labelled as spam, even at the cost of letting some additional spam through.
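The percentages above can be reproduced directly from the four counts. The following is a minimal sketch in R, assuming the matrix is entered by hand; the object name `cm` and the labels `not_spam` and `spam` are illustrative choices, not part of the original example.

```r
# Counts copied from the example table.
# Rows are actual labels, columns are predicted labels.
cm <- matrix(
  c(150,  75,
      5, 270),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual    = c("not_spam", "spam"),
                  predicted = c("not_spam", "spam"))
)

# Share of actual spam that was caught (recall, or true positive rate).
spam_recall <- cm["spam", "spam"] / sum(cm["spam", ])

# Share of actual non-spam wrongly flagged as spam (false positive rate).
false_positive_rate <- cm["not_spam", "spam"] / sum(cm["not_spam", ])

# Share of all test emails labelled correctly (accuracy).
accuracy <- sum(diag(cm)) / sum(cm)

round(c(spam_recall = spam_recall,
        false_positive_rate = false_positive_rate,
        accuracy = accuracy), 3)
```

Dividing by the row totals (275 actual spam, 225 actual non-spam) rather than by a single cell is what gives the 98% and 33% figures; overall accuracy works out to 420/500, or 84%.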

Creating a Confusion Matrix:

Technically speaking, a confusion matrix is simply a cross-tabulation of the actual vs. predicted categorical values.

table(actual, predicted)
#      0   1
#  0 150  75
#  1   5 270
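To make the call above runnable end to end, here is a small sketch in R. The `actual` and `predicted` vectors are hypothetical: they are rebuilt from the example counts with `rep()`, coding not spam as 0 and spam as 1. In real use, `predicted` would come from the model's output on the test set.

```r
# Hypothetical label vectors rebuilt from the example counts
# (0 = not spam, 1 = spam); each block of rep() calls is one cell of the table.
actual    <- c(rep(0, 150), rep(0, 75), rep(1, 5), rep(1, 270))
predicted <- c(rep(0, 150), rep(1, 75), rep(0, 5), rep(1, 270))

# Cross-tabulate: rows are actual labels, columns are predicted labels.
cm <- table(actual, predicted)
print(cm)

# Sanity check: every test email is counted exactly once.
stopifnot(sum(cm) == length(actual))
```

The argument order matters: the first vector passed to `table()` becomes the rows, so swapping the arguments would transpose the matrix and exchange the two off-diagonal error counts.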

Resources:
