
This article was automatically translated from the original Turkish version.

Decision Trees

Decision trees are a machine learning method used to solve classification and regression problems by splitting data into branches. They form the foundation of tree-based learning models; advanced models such as Random Forest and XGBoost are built on these principles.

Basic Structure of Decision Trees

Decision trees classify data or make predictions by dividing data according to specific rules using a hierarchical structure. The tree structure consists of the following fundamental components:

Root Node

  • It is the topmost node of the tree.
  • The variable that best splits the data is selected here.

Internal Nodes

  • These are nodes that divide the data into two or more subgroups based on specific criteria.
  • Each internal node splits the data using a specific splitting criterion.

Leaf Nodes

  • These are terminal nodes that cannot be split further and contain the final prediction or classification.
  • In regression problems they contain a predicted value; in classification problems they contain a category.

Branching

  • It refers to the connections from the root node to the leaf nodes.

Working Principle of Decision Trees

Decision trees operate using the divide and conquer principle. The tree construction process consists of the following steps:

Determining the Best Splitting Criterion

  • Decision trees attempt to identify the best feature and the optimal threshold value for splitting the data.
  • Metrics such as information gain, Gini impurity, or variance reduction are used in this stage.
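The search for the best split described above can be sketched as a brute-force scan over features and candidate thresholds, scoring each candidate by the weighted Gini impurity of the resulting subgroups. This is an illustrative minimal implementation, not the article's own code; the function names are my own.

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) of the best split.

    X is a list of feature rows, y the list of class labels.
    """
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i in range(len(X)) if X[i][f] <= t]
            right = [y[i] for i in range(len(X)) if X[i][f] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best
```

For the toy dataset `X = [[1], [2], [3], [4]]`, `y = [0, 0, 1, 1]`, the scan selects the threshold 2 on feature 0, which separates the classes perfectly (weighted Gini 0). A real implementation would recurse on the two subgroups, which is the divide-and-conquer step.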

Splitting Criteria for Classification Problems

Gini Impurity:

  • It measures how pure or homogeneous the data at a node is.
  • A low Gini value indicates that most samples at the node belong to the same class.
  • Gini calculation formula: Gini = 1 − Σᵢ pᵢ²
  • Here, pᵢ is the proportion of samples at the node belonging to class i.
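The Gini formula translates directly into code. A minimal sketch (the function name is my own):

```python
def gini(labels):
    # Gini = 1 - sum_i(p_i^2), where p_i is the proportion of class i.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))
```

A pure node such as `[0, 0, 0, 0]` gives 0.0, while a perfectly mixed two-class node such as `[0, 0, 1, 1]` gives 0.5, the maximum for two classes.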

Information Gain – Entropy:

  • The quality of a split is evaluated by measuring the information uncertainty (entropy) at the node.
  • Entropy calculation formula: Entropy = − Σᵢ pᵢ log₂(pᵢ)
  • Information gain is calculated as the difference between the entropy before the split and the weighted average entropy after the split: Gain = Entropy(parent) − Σⱼ (nⱼ / n) · Entropy(childⱼ)
  • Splits that leave lower entropy after splitting, and therefore yield higher information gain, are preferred.
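Entropy and information gain can be computed in a few lines; this is an illustrative sketch, with function names of my own choosing.

```python
import math

def entropy(labels):
    # Entropy = -sum_i(p_i * log2(p_i)) over the class proportions p_i.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent, children):
    # Parent entropy minus the size-weighted average entropy of the children.
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted
```

For a parent node `[0, 0, 1, 1]` split into the pure children `[0, 0]` and `[1, 1]`, the entropy drops from 1 bit to 0, so the information gain is 1.0, the best possible split for this node.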

Splitting Criterion for Regression Problems

Variance Reduction:

  • In regression trees, the split that most reduces the variance of the target values is chosen.
  • Variance reduction is the difference between the variance at the parent node and the size-weighted average variance of the child nodes.
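Variance reduction compares the variance of the target values at the parent node with the size-weighted variance of the children; a minimal sketch (function names are my own):

```python
def variance(values):
    # Population variance: mean squared deviation from the mean.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_reduction(parent, children):
    # Parent variance minus the size-weighted average variance of the children.
    n = len(parent)
    return variance(parent) - sum(len(c) / n * variance(c) for c in children)
```

Splitting the targets `[1, 2, 9, 10]` into `[1, 2]` and `[9, 10]` separates the low and high values, so almost all of the parent variance is removed and the reduction is large; the split with the largest reduction is chosen.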

Overfitting in Decision Trees and Solutions

Overfitting is a common problem for decision tree models and many other predictive models. It occurs when the learning algorithm continues to reduce errors on the training set at the expense of generalization. To avoid overfitting during decision tree construction, two general approaches are used:

  • Pre-pruning: Stopping the growth of the tree before it becomes too complex.
  • Post-pruning: First building the complete tree and then removing unnecessary parts.

In practice, the first approach is rarely used because it is difficult to determine when to stop growing the tree. The second approach is significantly more successful. The following considerations apply to post-pruning:

  • Use a dataset separate from the training data to decide on pruning. This dataset is called the validation dataset and is used to determine which nodes are unnecessary.
  • After constructing the decision tree, statistical methods such as error estimation and significance testing (for example, the chi-square test) are used to decide whether to prune a subtree or to expand the tree by adding new nodes.
  • Minimum Description Length principle: the combined encoding cost of the tree and of the training examples it misclassifies is measured, and tree growth stops when this sum is minimized.
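Validation-based post-pruning can be sketched as a recursive pass that replaces a subtree with its majority-class leaf whenever doing so does not increase the error on the validation samples reaching that subtree. The dict-based node representation and all names below are illustrative, not from the article.

```python
# A node is either a class label (leaf) or a dict describing a split:
# {"feature": index, "threshold": t, "left": node, "right": node, "majority": label}

def predict(node, x):
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

def error(node, data):
    # Number of misclassified (x, y) pairs.
    return sum(1 for x, y in data if predict(node, x) != y)

def prune(node, val_data):
    """Reduced-error post-pruning against a held-out validation set."""
    if not isinstance(node, dict):
        return node
    # Route validation samples to the side of the split they fall on.
    left = [(x, y) for x, y in val_data if x[node["feature"]] <= node["threshold"]]
    right = [(x, y) for x, y in val_data if x[node["feature"]] > node["threshold"]]
    node["left"] = prune(node["left"], left)
    node["right"] = prune(node["right"], right)
    # Collapse to the majority-class leaf if that does not hurt validation error.
    leaf = node["majority"]
    return leaf if error(leaf, val_data) <= error(node, val_data) else node
```

The recursion prunes bottom-up: children are simplified first, so a whole chain of unnecessary splits can collapse into a single leaf in one pass.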

Advantages and Disadvantages

Advantages

  • Easy to Understand and Interpret
  • Low Data Preprocessing Requirements
  • Feature Selection and Importance Ranking
  • Fast and Low Computational Cost
  • Generate Rule-Based Decisions

Disadvantages

  • Risk of Overfitting
  • Sensitivity to Noise and Small Data Variations
  • Poor Performance on Imbalanced Datasets
  • Inefficient for Large Datasets
  • Instability with Continuous and Discrete Variables

Author Information

Kübra Merk, December 19, 2025 at 6:53 AM
