This article was automatically translated from the original Turkish version.
Machine learning models are increasingly used in various applications to classify data into different categories. However, evaluating their performance is essential to ensure their accuracy and reliability. One of the key tools in this evaluation process is the confusion matrix.
The confusion matrix is a simple table that shows how well a classification model performs by comparing its predictions with the real results. It categorizes predictions into four groups: correct predictions for both classes (true positives and true negatives) and wrong predictions (false positives and false negatives).
The entries of the matrix are counts, and their sum equals the total number of samples the model was evaluated on in the test data.
A confusion matrix helps you visualize how well a model performs by showing correct and incorrect predictions. It also enables the calculation of fundamental metrics such as precision and recall, which give a better picture of performance, especially when the data is imbalanced.
Accuracy measures how often the model's predictions are correct overall. It provides a general sense of model performance. However, accuracy can be misleading, especially in imbalanced datasets where one class dominates. For example, a model that correctly predicts most instances of the majority class may achieve high accuracy but still fail to capture details about other classes.
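As a minimal sketch, accuracy can be computed with scikit-learn's `accuracy_score`; the Dog / Not Dog labels below are assumed for illustration, not taken from the original article:

```python
from sklearn.metrics import accuracy_score

# Hypothetical ground-truth and predicted labels for a Dog / Not Dog classifier.
actual    = ["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"]
predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"]

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
acc = accuracy_score(actual, predicted)
print(acc)  # 8 of the 10 predictions match -> 0.8
```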
Precision focuses on the quality of the model’s positive predictions. It indicates what proportion of examples predicted as positive are actually positive. Precision is important in scenarios where false positives must be minimized, such as detecting spam emails or fraud.
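A sketch of the same idea with scikit-learn's `precision_score` (the labels are assumed, as in the accuracy example; `pos_label` marks "Dog" as the positive class):

```python
from sklearn.metrics import precision_score

# Hypothetical labels for the Dog / Not Dog example.
actual    = ["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"]
predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"]

# Precision = TP / (TP + FP), treating "Dog" as the positive class.
prec = precision_score(actual, predicted, pos_label="Dog")
print(prec)  # 5 true positives out of 6 "Dog" predictions -> 5/6 ≈ 0.83
```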
Recall measures how well the model identifies all actual positive cases. It shows the ratio of true positives detected among all actual positive examples. High recall is critical when missing positive cases has serious consequences, such as in medical diagnosis.
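Recall can be sketched the same way with `recall_score` (assumed labels as before):

```python
from sklearn.metrics import recall_score

# Hypothetical labels for the Dog / Not Dog example.
actual    = ["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"]
predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"]

# Recall = TP / (TP + FN), treating "Dog" as the positive class.
rec = recall_score(actual, predicted, pos_label="Dog")
print(rec)  # 5 of the 6 actual Dog images were detected -> 5/6 ≈ 0.83
```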
The F1 score combines precision and recall into a single metric to balance them. It provides a better understanding of a model’s overall performance, especially for imbalanced datasets. The F1 score is useful when both false positives and false negatives are important, but it assumes that precision and recall are equally important, which may not always align with the use case.
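A sketch with `f1_score` on the same assumed labels; since precision and recall both come out to 5/6 here, their harmonic mean is also 5/6:

```python
from sklearn.metrics import f1_score

# Hypothetical labels for the Dog / Not Dog example.
actual    = ["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"]
predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"]

# F1 = 2 * Precision * Recall / (Precision + Recall)
f1 = f1_score(actual, predicted, pos_label="Dog")
print(f1)  # Precision = Recall = 5/6 here, so F1 = 5/6 ≈ 0.83
```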
Specificity is another important metric for evaluating classification models, especially in binary classification. It measures a model’s ability to correctly identify negative examples. Specificity is also known as the True Negative Rate. The formula is given as:

Specificity = TN / (TN + FP)
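scikit-learn has no dedicated specificity function, but it can be computed directly from the confusion matrix; a sketch using the same assumed labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for the Dog / Not Dog example.
actual    = ["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"]
predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"]

# Rows are actual classes, columns are predicted classes, ordered by `labels`,
# so flattening gives TP, FN, FP, TN with "Dog" as the positive class.
tp, fn, fp, tn = confusion_matrix(actual, predicted, labels=["Dog", "Not Dog"]).ravel()

# Specificity = TN / (TN + FP)
spec = tn / (tn + fp)
print(spec)  # 3 / (3 + 1) = 0.75
```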
Below is a 2x2 confusion matrix for image recognition, classifying images as either Dog or Not Dog:
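A sketch of the layout as a NumPy array, with illustrative counts (the actual numbers from the original table are not reproduced here):

```python
import numpy as np

# Rows are actual classes, columns are predicted classes; counts are illustrative.
#                 Pred: Dog  Pred: Not Dog
cm = np.array([[5, 1],   # Actual: Dog      -> TP, FN
               [1, 3]])  # Actual: Not Dog  -> FP, TN
print(cm)
```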
Metrics Based on Confusion Matrix Data
1- Accuracy
2- Precision
3- Recall
4- F1 Score
5- Specificity
6- Type I and Type II Errors
Confusion Matrix for Binary Classification
Example: Confusion Matrix for Dog Image Recognition with Numbers
Implementation of Confusion Matrix for Binary Classification Using Python
Step 1: Import the required libraries.
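A typical set of imports for this tutorial (assuming NumPy, scikit-learn, Seaborn, and Matplotlib are installed):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
```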
Step 2: Create NumPy arrays for the actual and predicted labels.
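A sketch with hypothetical labels for the Dog / Not Dog example (the original article's values are not reproduced here):

```python
import numpy as np

# Hypothetical ground-truth and predicted labels.
actual    = np.array(["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"])
predicted = np.array(["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"])
```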
Step 3: Calculate the confusion matrix.
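With the assumed labels from the previous step, the matrix can be computed with scikit-learn's `confusion_matrix`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

actual    = np.array(["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"])
predicted = np.array(["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"])

# `labels` fixes the row/column order: rows are actual, columns are predicted.
cm = confusion_matrix(actual, predicted, labels=["Dog", "Not Dog"])
print(cm)  # [[5 1]
           #  [1 3]]
```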
Step 4: Plot the confusion matrix using a Seaborn heatmap.
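A sketch of the plotting step with `seaborn.heatmap`, using the same assumed labels:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

actual    = np.array(["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"])
predicted = np.array(["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"])
cm = confusion_matrix(actual, predicted, labels=["Dog", "Not Dog"])

# annot=True writes the count inside each cell; fmt="d" renders them as integers.
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=["Dog", "Not Dog"],
            yticklabels=["Dog", "Not Dog"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
```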
Output: a Seaborn heatmap of the 2x2 confusion matrix.
Step 5: Generate a classification report based on the confusion matrix metrics.
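A sketch using scikit-learn's `classification_report`, which summarizes precision, recall, F1-score, and support per class from the same assumed labels:

```python
import numpy as np
from sklearn.metrics import classification_report

actual    = np.array(["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"])
predicted = np.array(["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"])

# Per-class precision, recall, F1-score, and support, derived from the same counts.
report = classification_report(actual, predicted, labels=["Dog", "Not Dog"])
print(report)
```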
Output: a table of precision, recall, F1-score, and support for each class.