
This article was automatically translated from the original Turkish version.


Confusion Matrix


Machine learning models are increasingly used in various applications to classify data into different categories. However, evaluating their performance is essential to ensure their accuracy and reliability. One of the key tools in this evaluation process is the confusion matrix.

The confusion matrix is a simple table that shows how well a classification model performs by comparing its predictions with the actual results. It categorizes predictions into four groups: correct predictions for both classes (true positives and true negatives) and wrong predictions (false positives and false negatives).


Each cell of the matrix indicates the number of test samples that fall into the corresponding combination of predicted and actual class.


  • True Positive (TP): The model correctly predicts a positive outcome. (Actual outcome is positive).
  • True Negative (TN): The model correctly predicts a negative outcome. (Actual outcome is negative).
  • False Positive (FP): The model incorrectly predicts a positive outcome (Actual outcome is negative). Also known as Type I error.
  • False Negative (FN): The model incorrectly predicts a negative outcome (Actual outcome is positive). Also known as Type II error.


A confusion matrix helps you visualize how well a model performs by showing correct and incorrect predictions. It also enables the calculation of fundamental metrics such as precision and recall that give a clearer picture of performance, especially when the data is imbalanced.


Metrics Based on Confusion Matrix Data

1- Accuracy

Accuracy measures how often the model's predictions are correct overall. It provides a general sense of model performance. However, accuracy can be misleading, especially on imbalanced datasets where one class dominates. For example, a model that correctly predicts most instances of the majority class may achieve high accuracy while still performing poorly on the minority class.

Accuracy = (TP + TN) / (TP + TN + FP + FN)


2- Precision

Precision focuses on the quality of the model's positive predictions. It indicates what proportion of the examples predicted as positive are actually positive. Precision is important in scenarios where false positives must be minimized, such as detecting spam emails or fraud.

Precision = TP / (TP + FP)


3- Recall

Recall measures how well the model identifies all actual positive cases. It shows the ratio of true positives detected among all actual positive examples. High recall is critical when missing positive cases has serious consequences, such as in medical diagnosis.

Recall = TP / (TP + FN)


4- F1 Score

F1 score combines precision and recall into a single metric to balance them. It provides a better understanding of a model's overall performance, especially for imbalanced datasets. The F1 score is useful when both false positives and false negatives matter, but it assumes that precision and recall are equally important, which may not always align with the use case.

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)


5- Specificity

Specificity is another important metric for evaluating classification models, especially in binary classification. It measures a model's ability to correctly identify negative examples and is also known as the True Negative Rate. The formula is:

Specificity = TN / (TN + FP)


6- Type I and Type II Errors

  • Type I Error: A Type I Error occurs when the model incorrectly predicts a positive example, while the actual example is negative. This is also known as a false positive. Type I Errors affect the model’s precision, which measures the accuracy of positive predictions.
  • Type II Error: A Type II Error occurs when the model fails to predict a positive example, even though it is actually positive. This is also known as a false negative. Type II Errors affect the model’s recall, which measures how well it identifies all actual positive cases.


Confusion Matrix for Binary Classification

Below is the layout of a 2x2 confusion matrix for image recognition, classifying images as either Dog or Not Dog:

                     Predicted: Dog      Predicted: Not Dog
  Actual: Dog        True Positive       False Negative
  Actual: Not Dog    False Positive      True Negative

  • True Positive (TP): Total count of images correctly predicted and actually labeled as Dog.
  • True Negative (TN): Total count of images correctly predicted and actually labeled as Not Dog.
  • False Positive (FP): Total count of images predicted as Dog but actually Not Dog.
  • False Negative (FN): Total count of images predicted as Not Dog but actually Dog.


Example: Confusion Matrix for Dog Image Recognition with Numbers

                     Predicted: Dog      Predicted: Not Dog
  Actual: Dog        TP = 5              FN = 1
  Actual: Not Dog    FP = 1              TN = 3

  • Actual Dog Count = 6
  • Actual Not Dog Count = 4
  • True Positive Count = 5
  • False Positive Count = 1
  • True Negative Count = 3
  • False Negative Count = 1
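These counts plug directly into the formulas defined earlier; a quick check in Python (variable names are illustrative):

```python
# Counts from the dog-recognition example
TP, FP, TN, FN = 5, 1, 3, 1

accuracy    = (TP + TN) / (TP + TN + FP + FN)    # 8 / 10 = 0.80
precision   = TP / (TP + FP)                     # 5 / 6 ≈ 0.833
recall      = TP / (TP + FN)                     # 5 / 6 ≈ 0.833
f1          = 2 * precision * recall / (precision + recall)
specificity = TN / (TN + FP)                     # 3 / 4 = 0.75

print(round(accuracy, 3), round(precision, 3),
      round(recall, 3), round(f1, 3), round(specificity, 3))
```

Note that precision and recall happen to be equal here (both 5/6), so the F1 score is also 5/6.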



Implementation of Confusion Matrix for Binary Classification Using Python

Step 1: Import required libraries.
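The original code block was lost in the page export; a minimal sketch of the imports the later steps rely on, assuming the NumPy, scikit-learn, Seaborn, and Matplotlib stack that the steps name:

```python
# NumPy for the label arrays, scikit-learn for the metrics,
# Seaborn and Matplotlib for the heatmap plot
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
```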


Step 2: Create NumPy arrays for actual and predicted labels.
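A sketch of this step, using arrays that reproduce the counts from the dog example above (the names actual and predicted are illustrative):

```python
import numpy as np

# Actual labels: 6 "Dog" and 4 "Not Dog" images
actual = np.array(["Dog"] * 6 + ["Not Dog"] * 4)

# Predictions: 5 dogs found, 1 dog missed (FN),
# and 1 "Not Dog" image wrongly flagged as Dog (FP)
predicted = np.array(["Dog"] * 5 + ["Not Dog"] +
                     ["Dog"] + ["Not Dog"] * 3)
```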


Step 3: Calculate the confusion matrix.
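Assuming the arrays from the previous step (repeated here so the snippet runs on its own), scikit-learn's confusion_matrix computes the table:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

actual = np.array(["Dog"] * 6 + ["Not Dog"] * 4)
predicted = np.array(["Dog"] * 5 + ["Not Dog"] + ["Dog"] + ["Not Dog"] * 3)

# Rows are actual classes, columns are predicted classes;
# the labels argument fixes the order to ["Dog", "Not Dog"]
cm = confusion_matrix(actual, predicted, labels=["Dog", "Not Dog"])
print(cm)
# [[5 1]
#  [1 3]]
```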


Step 4: Plot the confusion matrix using a Seaborn heatmap.
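A sketch of the plotting step; the axis labels, tick labels, and color map are illustrative choices:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

cm = np.array([[5, 1], [1, 3]])  # matrix from the previous step

# annot=True writes each count into its cell; fmt="d" formats as integers
ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                 xticklabels=["Dog", "Not Dog"],
                 yticklabels=["Dog", "Not Dog"])
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
ax.set_title("Confusion Matrix")
plt.show()
```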


Output: a heatmap of the 2x2 confusion matrix with the count annotated in each cell.



Step 5: Generate a classification report based on the confusion matrix metrics.
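A sketch of this step, reusing the same arrays; classification_report prints per-class precision, recall, F1 score, and support, plus overall accuracy (8/10 = 0.80 for this example):

```python
import numpy as np
from sklearn.metrics import classification_report

actual = np.array(["Dog"] * 6 + ["Not Dog"] * 4)
predicted = np.array(["Dog"] * 5 + ["Not Dog"] + ["Dog"] + ["Not Dog"] * 3)

# Precision, recall, and F1 for each class, formatted as a text table
report = classification_report(actual, predicted, labels=["Dog", "Not Dog"])
print(report)
```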


Output: a per-class table of precision, recall, F1 score, and support, plus overall accuracy.



Author Information

Author: Beyza Nur Türkü, December 25, 2025, 8:22 AM
