This article was automatically translated from the original Turkish version.
Decision trees are a machine learning method used to solve classification and regression problems by splitting data into branches. They form the foundation of tree-based learning models, and advanced models such as Random Forest and XGBoost are built on the same principles.
Decision trees classify data or make predictions by partitioning the data according to specific rules arranged in a hierarchical structure. The tree structure consists of four fundamental components: the root node, internal nodes, leaf nodes, and branches.
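To make the hierarchy concrete, the following is a minimal sketch of how such a tree could be represented and traversed in Python. The `Node` layout, the field names, and the example features (age at index 0, income at index 1) are illustrative assumptions, not something the article prescribes.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical layout)."""
    feature: Optional[int] = None      # index of the feature tested here
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None      # branch taken when the test passes
    right: Optional["Node"] = None     # branch taken when the test fails
    value: Union[str, float, None] = None  # prediction, set only on leaves

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: list) -> Union[str, float, None]:
    """Follow branches from the root until a leaf node is reached."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# The root tests feature 0 (assumed: age); its right child is an internal
# node testing feature 1 (assumed: income); the remaining nodes are leaves.
root = Node(
    feature=0, threshold=30.0,
    left=Node(value="no"),
    right=Node(feature=1, threshold=50000.0,
               left=Node(value="no"), right=Node(value="yes")),
)
```

Calling `predict(root, [40, 60000])` walks root, then the internal node, then the right-hand leaf, and returns `"yes"`.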
Decision trees operate on the divide-and-conquer principle. The tree construction process consists of the following steps:
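One way to picture the divide-and-conquer procedure is the recursive sketch below. It assumes numeric features, binary threshold splits, Gini impurity as the criterion, and a hypothetical `max_depth` stopping rule; real implementations differ in many details.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send data to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build(rows, labels, depth=0, max_depth=3):
    """Divide: pick the best split. Conquer: recurse on each subset."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:  # pure node, no improving split, or depth limit reached
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


rows = [[1.0], [2.0], [3.0], [4.0]]
labels = ["a", "a", "b", "b"]
tree = build(rows, labels)
# tree == {"feature": 0, "threshold": 2.0, "left": "a", "right": "b"}
```

Each recursive call works on a strictly smaller subset, so the recursion ends when a node is pure, no split reduces impurity, or the assumed depth limit is hit.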
Gini Impurity:
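Gini impurity is commonly defined as one minus the sum of the squared class proportions in a node, so a pure node scores 0 and an evenly mixed two-class node scores 0.5. A minimal Python sketch under that standard definition follows; the function name `gini_impurity` is illustrative and not taken from the source.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2), where p_i is the share of class i in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here: an empty node counts as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


pure = gini_impurity(["a", "a", "a", "a"])   # 0.0
mixed = gini_impurity(["a", "a", "b", "b"])  # 0.5
```

When comparing candidate splits, the impurity of each child node is usually weighted by the fraction of samples it receives, and the split with the lowest weighted impurity is chosen.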


• Information gain is the entropy of the parent node minus the size-weighted average entropy of the child nodes produced by the split.

• Splits that leave lower entropy in the child nodes, and therefore yield higher information gain, are preferred.
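The two points above can be illustrated with a short Python sketch. It assumes Shannon entropy with base-2 logarithms and weights each child node by its share of the samples; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(parent)
    after = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - after


parent = ["a", "a", "b", "b"]
perfect = information_gain(parent, [["a", "a"], ["b", "b"]])  # 1.0
useless = information_gain(parent, [["a", "b"], ["a", "b"]])  # 0.0
```

The first split separates the classes completely and recovers the full 1 bit of parent entropy, while the second leaves both children as mixed as the parent and gains nothing.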
Variance Reduction:
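For regression trees, variance reduction is usually computed as the variance of the target values in the parent node minus the size-weighted variance of the child nodes, and the split with the largest reduction is preferred. The sketch below assumes population variance and a binary split; the function names are illustrative.

```python
def variance(values):
    """Population variance of a list of numeric target values."""
    n = len(values)
    if n == 0:
        return 0.0  # convention chosen here for an empty node
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    after = (len(left) * variance(left) + len(right) * variance(right)) / n
    return variance(parent) - after


targets = [1.0, 1.0, 5.0, 5.0]
good = variance_reduction(targets, [1.0, 1.0], [5.0, 5.0])  # 4.0
poor = variance_reduction(targets, [1.0, 5.0], [1.0, 5.0])  # 0.0
```

A leaf in a regression tree then typically predicts the mean of the target values that reach it.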
Overfitting is a common problem for decision tree models and many other predictive models. It occurs when the learning algorithm continues to reduce errors on the training set at the expense of generalization. To avoid overfitting during decision tree construction, two general approaches are used:
In practice, the first approach is rarely used because it is difficult to determine when to stop growing the tree. The second approach is significantly more successful. The following considerations must be observed in this approach:
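Assuming the second approach refers to post-pruning (growing the full tree and trimming it afterwards), one simple variant is reduced-error pruning on a held-out validation set. The sketch below is illustrative only: the nested-dict tree layout is hypothetical, and the replacement leaf uses the majority class of the validation samples reaching the node, which is a simplification of the usual training-set majority.

```python
from collections import Counter


def predict(tree, x):
    """Walk a nested-dict tree; any non-dict value is a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


def errors(tree, rows, labels):
    """Number of misclassified rows."""
    return sum(predict(tree, r) != y for r, y in zip(rows, labels))


def prune(tree, rows, labels):
    """Bottom-up: replace a subtree with a leaf when doing so does not
    increase the error on the validation rows that reach it."""
    if not isinstance(tree, dict) or not rows:
        return tree  # already a leaf, or no validation evidence here
    f, t = tree["feature"], tree["threshold"]
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    tree = dict(
        tree,
        left=prune(tree["left"], [r for r, _ in left], [y for _, y in left]),
        right=prune(tree["right"], [r for r, _ in right], [y for _, y in right]),
    )
    leaf = Counter(labels).most_common(1)[0][0]
    if errors(leaf, rows, labels) <= errors(tree, rows, labels):
        return leaf
    return tree


# Hypothetical overfit tree: the inner split at 2.0 fits noise in training data.
overfit = {"feature": 0, "threshold": 5.0,
           "left": {"feature": 0, "threshold": 2.0, "left": "a", "right": "b"},
           "right": "b"}
val_rows = [[1.0], [3.0], [4.0], [7.0], [8.0]]
val_labels = ["a", "a", "a", "b", "b"]
pruned = prune(overfit, val_rows, val_labels)
# pruned == {"feature": 0, "threshold": 5.0, "left": "a", "right": "b"}
```

The inner split is removed because it misclassifies two validation samples, while the root split is kept because collapsing it would raise the validation error.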
Easy to Understand and Interpret
Low Data Preprocessing Requirements
Feature Selection and Importance Ranking
Fast and Low Computational Cost
Generate Rule-Based Decisions
Risk of Overfitting
Sensitivity to Noise and Small Data Variations
Poor Performance on Imbalanced Datasets
Inefficient for Large Datasets
Instability with Continuous and Discrete Variables
Basic Structure of Decision Trees
Root Node
Internal Nodes
Leaf Nodes
Branching
Working Principle of Decision Trees
Determining the Best Splitting Criterion
Splitting Criteria for Classification Problems
Information Gain – Entropy:
Splitting Criterion for Regression Problems
Overfitting in Decision Trees and Solutions
Advantages and Disadvantages
Advantages
Disadvantages