How to Create and Calculate a Confusion Matrix in R [With Examples]

shashidhar123



Today, R is among the most used tools for statistical and data analysis. Because of its superior computational, visual, and graphical capabilities, this open-source environment is widely adopted. Working knowledge of the R programming language can be very useful for engineering students, corporate analytics professionals, and anybody else interested in data science.

This post explains how to create and calculate a confusion matrix in R. We'll also work through some examples to show how to read one and which metrics you can derive from it. First, let's kick off with an introduction to the confusion matrix.

What is a Confusion Matrix?

A confusion matrix is a table that is used to evaluate the performance of a classification model. For a binary classifier (one with a positive and a negative class), the table is made up of four types of outcomes:

True Positives (TP): These are the cases where the model predicted the positive class and the actual class was also positive.

True Negatives (TN): These are the cases where the model predicted the negative class and the actual class was also negative.

False Positives (FP): These are the cases where the model predicted the positive class but the actual class was negative.

False Negatives (FN): These are the cases where the model predicted the negative class but the actual class was positive.

The confusion matrix can be used to calculate several different metrics, such as accuracy, precision, recall, and specificity.
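As a minimal sketch, each of these metrics can be computed directly from the four counts. The helper name `confusion_metrics` and its arguments are hypothetical, chosen only for illustration. The formulas are the standard definitions, and the example counts are made up.

```r
# Hypothetical helper: derive common metrics from the four confusion-matrix counts
confusion_metrics <- function(tp, tn, fp, fn) {
  c(
    accuracy    = (tp + tn) / (tp + tn + fp + fn),  # share of all predictions that are correct
    precision   = tp / (tp + fp),                   # share of predicted positives that are truly positive
    recall      = tp / (tp + fn),                   # share of actual positives found (sensitivity)
    specificity = tn / (tn + fp)                    # share of actual negatives found
  )
}

# Illustrative counts only
round(confusion_metrics(tp = 2, tn = 3, fp = 1, fn = 1), 3)
```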

How to Make a Confusion Matrix in R

For a binary classifier, the matrix is composed of four cells: true positives, false positives, false negatives, and true negatives. In the layout used in this post, each row represents the actual class and each column represents the predicted class. Conventions differ between tools; caret's confusionMatrix(), for example, puts the predicted class in the rows.

The main diagonal of the matrix contains the correct predictions (true positives and true negatives), while the off-diagonal elements contain the incorrect predictions (false positives and false negatives). To calculate the accuracy of the model, we can simply take the ratio of correct predictions to total predictions.
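To make the diagonal explicit: R's diag() extracts the main diagonal of a matrix, so accuracy is the sum of the diagonal divided by the sum of all cells. The counts below are illustrative, with rows for the actual class and columns for the predicted class.

```r
# Illustrative 2x2 confusion matrix: rows = actual class, columns = predicted class
cm <- matrix(c(3, 1,    # actual 0: TN, FP
               1, 2),   # actual 1: FN, TP
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Correct predictions (TN and TP) sit on the main diagonal
accuracy <- sum(diag(cm)) / sum(cm)
accuracy  # 5 / 7, about 0.714
```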

In R, we can create a confusion matrix using the table() function. Its first argument becomes the rows and its second argument becomes the columns, so passing the actual values first and the predicted values second gives a matrix with actual classes in the rows and predicted classes in the columns. For example, if we have a vector of actual values y and a vector of predicted values p, we can create a confusion matrix as follows:

table(y, p)

       p
y      0    1
  0    TN   FP
  1    FN   TP
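A runnable version of the table() call, using made-up vectors y and p. Converting both vectors to factors with the same levels is an addition here, not part of the original example; it guarantees a two-by-two table even if one class never appears among the predictions.

```r
# Made-up actual (y) and predicted (p) labels for a binary classifier
y <- c(1, 0, 1, 1, 0, 0, 1, 0)
p <- c(1, 0, 0, 1, 0, 1, 1, 0)

# Same factor levels for both, so both classes always appear as rows and columns
y <- factor(y, levels = c(0, 1))
p <- factor(p, levels = c(0, 1))

cm <- table(y, p)  # rows = actual (y), columns = predicted (p)
print(cm)

# Pull out the four cells by their level names
TN <- cm["0", "0"]; FP <- cm["0", "1"]
FN <- cm["1", "0"]; TP <- cm["1", "1"]
```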

How to Calculate a Confusion Matrix in R

Building the table with table() gives you the raw counts, but in practice you usually also want the statistics derived from them.

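To get the confusion matrix together with these statistics, you can use the caret package. Its confusionMatrix() function takes the predicted class labels as its first argument (data) and the actual class labels as its second argument (reference). Both should be factors with the same levels. With two classes, the first factor level is treated as the positive class unless you set the positive argument.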

For example, if our test set has 10 data points and the model predicts 8 of them correctly, the accuracy is 8/10 = 80% and the error (misclassification) rate is 2/10 = 20%. These are not the true positive and false positive rates: the true positive rate (recall) is TP / (TP + FN) and the false positive rate is FP / (FP + TN), so both depend on how the errors are split between the classes.
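The difference is easy to see with a hypothetical 10-point test set. The vectors below are made up so that 8 of the 10 predictions are correct; accuracy is 80%, yet the true positive and false positive rates come out quite differently.

```r
# Hypothetical test set of 10 points (1 = positive class, 0 = negative class)
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 1)  # 8 of 10 correct

tp <- sum(predicted == 1 & actual == 1)  # 3
tn <- sum(predicted == 0 & actual == 0)  # 5
fp <- sum(predicted == 1 & actual == 0)  # 1
fn <- sum(predicted == 0 & actual == 1)  # 1

accuracy   <- (tp + tn) / length(actual)  # 0.8
error_rate <- 1 - accuracy                # 0.2
tpr        <- tp / (tp + fn)              # true positive rate: 0.75
fpr        <- fp / (fp + tn)              # false positive rate: 1/6, about 0.167
```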

Here’s how you would calculate these values using the caret package:

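library(caret)

# data = predicted classes, reference = actual classes (both factors with the same levels)
confusionMatrix(data = predicted_labels, reference = test_labels)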

Examples of Confusion Matrices

There are many ways to create a confusion matrix, but in general, the goal is to show how well your classification model is performing. This can be done by showing the number of correct and incorrect predictions for each class.

A confusion matrix can also be used to calculate a variety of statistics, such as accuracy, precision, recall, and specificity. In this blog post, we'll focus on how to make a confusion matrix in R and how to calculate some of these statistics.

To start, let's take a look at a simple example. Say we have a classification model that predicts whether or not someone will default on their loan. Our model has two classes: 0 for non-default and 1 for default. After running our model on some data, we get the following results:

True class   Predicted class
0            0     --> true negative
1            1     --> true positive
0            0     --> true negative
1            0     --> false negative
1            1     --> true positive
0            0     --> true negative
0            1     --> false positive
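The row-by-row labeling above can be automated. This sketch encodes the loan example as two vectors (treating 1, default, as the positive class) and tags each prediction with its confusion-matrix cell using nested ifelse() calls; the variable names are illustrative.

```r
# The loan example: 0 = non-default, 1 = default (1 is the positive class)
true_class      <- c(0, 1, 0, 1, 1, 0, 0)
predicted_class <- c(0, 1, 0, 0, 1, 0, 1)

# Label every prediction with the confusion-matrix cell it falls into
outcome <- ifelse(true_class == 1 & predicted_class == 1, "TP",
           ifelse(true_class == 0 & predicted_class == 0, "TN",
           ifelse(true_class == 0 & predicted_class == 1, "FP", "FN")))

print(outcome)
table(outcome)  # counts per cell
```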

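With rows for the true class and columns for the predicted class, the four cells are arranged as:

TN | FP
FN | TP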

Our confusion matrix would then look like this:

                  Predicted class
True class        0     1     TOTAL
0 (non-default)   3     1     4
1 (default)       1     2     3
TOTAL             4     3     7

Recall = TP / (TP + FN) = 2 / (2 + 1) = 66.7%
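The same matrix can be produced in R from the loan example. This sketch builds it with table(), adds the row and column totals with addmargins() from the stats package, and computes recall with 1 (default) as the positive class.

```r
# The loan example: 0 = non-default, 1 = default
true_class      <- c(0, 1, 0, 1, 1, 0, 0)
predicted_class <- c(0, 1, 0, 0, 1, 0, 1)

# Rows = true class, columns = predicted class
cm <- table(true = true_class, predicted = predicted_class)
print(addmargins(cm))  # appends row and column totals ("Sum")

tp <- cm["1", "1"]
fn <- cm["1", "0"]
recall <- tp / (tp + fn)
round(recall, 3)  # 0.667
```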

Conclusion

A confusion matrix is a powerful tool for understanding the performance of your machine-learning models. In this article, we've walked through how to create and interpret a confusion matrix using R. We hope that this has been helpful in better understanding how your models are performing and where you might need to make improvements.
