This article was automatically translated from the original Turkish version.
Definition and Importance

The loss function is one of the fundamental tools used in machine learning and statistical modeling to measure a model’s predictive performance. It converts the difference between a predicted value and the true value into a single number that indicates how accurate or inaccurate the model’s predictions are. Loss functions not only quantify the error but also guide how the model should be optimized. They play a critical role in training models in fields such as deep learning, supervised learning, and reinforcement learning.
A loss function is typically represented as:
L(y, ŷ)
Where y is the true value and ŷ is the model’s predicted value.

The output of the function is usually a non-negative real number. The smaller this number, the closer the model’s predictions are to the true values. During training, the value of the loss function is minimized by continuously updating the model’s parameters.
Relationship with Optimization

The loss function lies at the center of optimization problems in machine learning. During training, the model parameters are adjusted to minimize the loss function. The most common methods used in this process are gradient descent and its variants, such as SGD, Adam, and RMSprop. For gradient-based optimization it is important that the loss function be differentiable; therefore, in complex models, specially designed differentiable loss functions are preferred.
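As a concrete illustration, here is a minimal sketch of gradient descent minimizing a mean-squared-error loss for a one-parameter linear model. The function names, learning rate, and data are illustrative, not from the original article.

```python
# Minimal sketch: gradient descent minimizing MSE for the model y_hat = w * x.

def mse(w, xs, ys):
    # Mean squared error of the model y_hat = w * x over the dataset.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    # Analytic derivative of mse with respect to w: (2/n) * sum((w*x - y) * x).
    return 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]        # generated by the "true" parameter w = 2

w, lr = 0.0, 0.05           # start far from the optimum
for _ in range(200):
    w -= lr * mse_grad(w, xs, ys)   # step against the gradient

print(round(w, 3))          # converges toward the true slope 2.0
```

Each update moves w in the direction that decreases the loss; because MSE is differentiable everywhere, the gradient is well defined at every step.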
Loss Functions for Regression Problems

Mean Squared Error (MSE)

The mean squared error is computed by squaring the differences between the true and predicted values and averaging them. Larger errors are penalized more heavily, which makes MSE sensitive to outliers.
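A direct translation of that definition into code (the function name and sample values are illustrative):

```python
def mean_squared_error(y_true, y_pred):
    # Average of the squared residuals; squaring penalizes large errors more.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # (1 + 4) / 2 = 2.5
```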
Mean Absolute Error (MAE)

This is the average of the absolute values of the prediction errors. It is more tolerant of outliers than MSE, but its non-differentiability at zero can pose challenges for some optimization algorithms.
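A sketch of the same idea for MAE, using the same illustrative sample values:

```python
def mean_absolute_error(y_true, y_pred):
    # Average magnitude of the errors; each error contributes linearly,
    # so a single outlier pulls the result less than under MSE.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_absolute_error([3.0, 5.0], [2.0, 7.0]))  # (1 + 2) / 2 = 1.5
```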
Root Mean Squared Error (RMSE)

RMSE is obtained by taking the square root of MSE. Since its unit matches that of the target variable, it is easier to interpret.
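The relationship to MSE is a single extra step (illustrative names again):

```python
import math

def root_mean_squared_error(y_true, y_pred):
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    # The square root brings the error back into the target variable's units.
    return math.sqrt(mse)

print(root_mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # sqrt(2.5) ≈ 1.581
```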
Huber Loss

It strikes a balance between MSE and MAE: for small errors it uses the squared difference, and for large errors it switches to an absolute (linear) penalty. It is therefore both robust to outliers and differentiable.
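A minimal sketch of the two-regime definition; the threshold parameter is conventionally called delta, and the sample values are illustrative:

```python
def huber_loss(y_true, y_pred, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it; the two pieces
    # are chosen so the function stays differentiable at the threshold.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(t - p)
        if e <= delta:
            total += 0.5 * e ** 2
        else:
            total += delta * (e - 0.5 * delta)
    return total / len(y_true)

print(huber_loss([0.0], [0.5]))  # small error: 0.5 * 0.25 = 0.125
print(huber_loss([0.0], [3.0]))  # large error: 1.0 * (3 - 0.5) = 2.5
```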
Loss Functions for Classification Problems

Binary Cross-Entropy Loss

Used in binary classification. The loss is computed from the model’s probability estimate for the correct class, with outputs mapped into the range (0, 1) by the sigmoid function.
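A sketch of the computation on sigmoid outputs; the eps clipping is a common practical safeguard, and all names and values here are illustrative:

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # y is 0 or 1; p is the sigmoid output, a probability for class 1.
    # eps keeps probabilities away from 0 and 1 so log() stays finite.
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # low loss: confident and correct
```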
Categorical Cross-Entropy Loss

Used in multiclass classification problems. The logarithmic loss for the correct class is computed from the outputs normalized by the softmax function.
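The softmax-then-log pipeline can be sketched as follows (illustrative names; the max-subtraction is a standard numerical-stability trick):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(logits, true_class):
    # Negative log of the probability the model assigns to the correct class.
    return -math.log(softmax(logits)[true_class])

print(categorical_cross_entropy([2.0, 0.5, 0.1], 0))  # low loss: class 0 is favored
```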
Hinge Loss

Used in support vector machines. It requires the model’s score for the correct class to exceed a certain margin; predictions falling short of this margin are penalized.
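For the binary case with labels in {-1, +1}, the margin condition can be sketched as (names and scores illustrative):

```python
def hinge_loss(y_true, score, margin=1.0):
    # y_true is -1 or +1; score is the raw (unsquashed) model output.
    # The loss is zero once y_true * score clears the margin.
    return max(0.0, margin - y_true * score)

print(hinge_loss(+1, 2.0))   # 0.0: correct side, outside the margin
print(hinge_loss(+1, 0.4))   # 0.6: correct side but inside the margin
print(hinge_loss(-1, 0.4))   # 1.4: wrong side of the decision boundary
```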
Difference Between Loss and Cost Function

Although these two terms are often used interchangeably, they are technically distinct: the loss function measures the error of a single training example, whereas the cost function aggregates (typically averages) the losses over the entire training set, sometimes with regularization terms added.
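The distinction in miniature, using squared error for the per-example loss (function names and data are illustrative):

```python
def squared_loss(y, y_hat):
    # "Loss": the error of one individual prediction.
    return (y - y_hat) ** 2

def cost(y_true, y_pred):
    # "Cost": the losses aggregated (here, averaged) over the whole dataset.
    losses = [squared_loss(y, p) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

print(squared_loss(3.0, 2.0))          # 1.0 for a single example
print(cost([3.0, 5.0], [2.0, 7.0]))    # 2.5 averaged over the dataset
```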
Applications