This article was automatically translated from the original Turkish version.
Outlier detection is the process of identifying observations that differ significantly from the other observations in a dataset. These values often represent rare events, measurement errors, fraudulent activities, or systematic issues arising during data collection. Accurate detection of outliers is critical for the accuracy and generalizability of machine learning models.
Machine learning algorithms, particularly statistical methods and regression analyses, can be heavily influenced by outliers. These values can mislead the model's learning process, introduce bias, and ultimately degrade model performance. In small datasets in particular, outliers can substantially alter the trend the model learns.
Statistical methods identify outliers based on the distribution of the data and typically operate under the assumption of normality.
Z-score: Measures how far a data point is from the mean in terms of standard deviations.
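
The Z-score rule can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name zscore_outliers and the default threshold of 3 standard deviations are assumptions chosen for the example (3 is a common convention, not a value given in this article).

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    `threshold` is an illustrative default, not a value from the article.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be flagged
    return [x for x in values if abs(x - mean) / std > threshold]
```

With the population standard deviation, the largest possible z-score in a sample of n points is the square root of n - 1, so in very small samples a threshold of 3 may never be reached and a lower threshold may be needed.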

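IQR (Interquartile Range): Relies on the distance between the first and third quartiles (Q1 and Q3). Points that fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are commonly flagged as outliers.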

Figure: IQR Method
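
As a rough sketch of the IQR method, the following Python function flags values outside the range Q1 - k × IQR to Q3 + k × IQR. The function name iqr_outliers, the multiplier k = 1.5, and the choice of the "inclusive" quantile method are assumptions made for this example; other quantile definitions give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]
```

Because the fences are built from quartiles rather than the mean and standard deviation, a single extreme value has little effect on where they fall.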

Isolation Forest: A tree-based algorithm that identifies outliers by isolating them. Outliers can be isolated with fewer splitting operations than normal points.
One-Class SVM (Support Vector Machine): Used especially for outlier detection in high-dimensional data. It constructs a model that captures the majority of data points and treats points outside this region as outliers.
Autoencoders: These deep learning based architectures attempt to reconstruct the input data. Data points that cannot be reconstructed well (that is, with high reconstruction error) are classified as outliers.
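
The isolation idea behind Isolation Forest can be illustrated with a deliberately simplified sketch for one-dimensional data. It is not the full algorithm: instead of building trees on subsamples and normalizing path lengths, it simply repeats random splits for each point and averages how many splits were needed to isolate it. The names path_length and average_path_length, and the parameters n_trees, max_depth, and seed, are hypothetical and exist only in this example.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=12):
    """Count the random splits needed to isolate `point` within 1-D `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth  # only duplicates remain, cannot split further
    split = rng.uniform(lo, hi)
    # Keep only the values on the same side of the split as the point
    side = [x for x in data if (x < split) == (point < split)]
    return path_length(point, side, rng, depth + 1, max_depth)

def average_path_length(point, data, n_trees=100, seed=0):
    """Average isolation depth over `n_trees` random trials; lower means more anomalous."""
    rng = random.Random(seed)
    total = sum(path_length(point, data, rng) for _ in range(n_trees))
    return total / n_trees
```

A point far from the rest of the data is usually cut off by the first or second split, so its average path length is noticeably shorter than that of points inside the dense region.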
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Marks data points in low-density regions as outliers.
K-means: Points that are far from the cluster centroids can be considered outliers.
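
To make the density-based idea concrete, the sketch below reproduces only the part of DBSCAN that decides which points are noise; it does not assign cluster labels. A point counts as a core point if at least min_samples points (including itself) lie within distance eps, and a point is noise if it is neither a core point nor within eps of one. The function name dbscan_noise is hypothetical, and eps and min_samples must be chosen for the data at hand.

```python
import math

def dbscan_noise(points, eps, min_samples):
    """Return the indices of points that DBSCAN would label as noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    neighborhoods = [neighbors(i) for i in range(len(points))]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_samples}
    # Noise: not a core point and no core point inside its eps-neighborhood
    return [i for i, nb in enumerate(neighborhoods)
            if i not in core and not core.intersection(nb)]
```

This brute-force version compares every pair of points, which is fine for illustration but scales quadratically with the number of points.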
Removal: Outliers can be removed from the dataset. However, this method is risky if a meaningful portion of the data consists of outliers.
Transformation: The impact of extreme values can be reduced using a logarithmic transformation or similar techniques.
Correction: Outliers can be investigated to determine their causes, and errors in the data source can be corrected.
Separate Modeling: A separate model can be built for outliers, for example in fraud detection systems.
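
The transformation strategy listed above can be illustrated with a small sketch. It applies log(1 + x) to each value, which compresses large magnitudes while leaving small values nearly unchanged. The function name log_transform is hypothetical, and the restriction to non-negative inputs is an assumption of this example rather than a requirement stated in the article.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to non-negative values to dampen extreme magnitudes."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch assumes non-negative values")
    return [math.log1p(v) for v in values]
```

For instance, the values 9 and 9999 differ by a factor of more than a thousand, but after the transformation they become roughly 2.3 and 9.2, so the extreme value pulls far less on a model fitted to the data.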
Outlier detection is one of the fundamental building blocks for the reliability of machine learning applications. These methods must be used carefully to enhance model performance and to identify real anomalies or rare events. Both classical statistical methods and modern machine learning approaches offer broad application possibilities in this context.
Outlier Detection in Machine Learning
Impact of Outliers
Outlier Detection Methods
Statistical Methods
Machine Learning Based Methods
Density and Clustering Methods
Strategies for Handling Outliers
Application Areas