
This article was automatically translated from the original Turkish version.

Decision Trees

Decision trees are a machine learning method used to solve classification and regression problems by splitting data into branches. They form the foundation of tree-based learning models; advanced models such as Random Forest and XGBoost are built on these principles.

Basic Structure of Decision Trees

Decision trees classify data or make predictions by dividing data according to specific rules using a hierarchical structure. The tree structure consists of the following fundamental components:

Root Node

  • It is the topmost node of the tree.
  • The variable that best splits the data is selected here.

Internal Nodes

  • These are nodes that divide the data into two or more subgroups based on specific criteria.
  • Each internal node splits the data using a specific splitting criterion.

Leaf Nodes

  • These are terminal nodes that cannot be split further and contain the final prediction or classification.
  • In regression problems they contain a predicted value; in classification problems they contain a category.

Branching

  • It refers to the connections from the root node to the leaf nodes.

Working Principle of Decision Trees

Decision trees operate using the divide and conquer principle. The tree construction process consists of the following steps:

Determining the Best Splitting Criterion

  • Decision trees attempt to identify the best feature and the optimal threshold value for splitting the data.
  • Metrics such as information gain, Gini impurity, or variance reduction are used in this stage.
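The search for the best split described above can be sketched as a brute-force scan over features and candidate thresholds, scoring each candidate by the weighted Gini impurity of the resulting subgroups. This is an illustrative minimal implementation, not the article's own code; the function names are my own.

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) of the best split.

    X is a list of feature rows, y the list of class labels.
    """
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i in range(len(X)) if X[i][f] <= t]
            right = [y[i] for i in range(len(X)) if X[i][f] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best
```

For the toy dataset `X = [[1], [2], [3], [4]]`, `y = [0, 0, 1, 1]`, the scan selects the threshold 2 on feature 0, which separates the classes perfectly (weighted Gini 0). A real implementation would recurse on the two subgroups, which is the divide-and-conquer step.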

Splitting Criteria for Classification Problems

Gini Impurity:

  • It measures how pure or homogeneous the data at a node is.
  • A low Gini value indicates that most samples at the node belong to the same class.
  • Gini calculation formula: Gini = 1 − Σᵢ pᵢ²
  • Here, pᵢ is the proportion of samples at the node belonging to class i.
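The Gini formula translates directly into code. A minimal sketch (the function name is my own):

```python
def gini(labels):
    # Gini = 1 - sum_i(p_i^2), where p_i is the proportion of class i.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))
```

A pure node such as `[0, 0, 0, 0]` gives 0.0, while a perfectly mixed two-class node such as `[0, 0, 1, 1]` gives 0.5, the maximum for two classes.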

Information Gain – Entropy:

  • The quality of a split is evaluated by measuring the information uncertainty (entropy) at the node.
  • Entropy calculation formula: Entropy = − Σᵢ pᵢ log₂(pᵢ)
  • Information gain is calculated as the difference between the entropy before the split and the weighted average entropy after the split: Gain = Entropy(parent) − Σⱼ (nⱼ / n) · Entropy(childⱼ)
  • Splits that leave lower entropy after splitting, and therefore yield higher information gain, are preferred.
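Entropy and information gain can be computed in a few lines; this is an illustrative sketch, with function names of my own choosing.

```python
import math

def entropy(labels):
    # Entropy = -sum_i(p_i * log2(p_i)) over the class proportions p_i.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent, children):
    # Parent entropy minus the size-weighted average entropy of the children.
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted
```

For a parent node `[0, 0, 1, 1]` split into the pure children `[0, 0]` and `[1, 1]`, the entropy drops from 1 bit to 0, so the information gain is 1.0, the best possible split for this node.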

Splitting Criterion for Regression Problems

Variance Reduction:

  • In regression trees, the split that most reduces the variance of the target values is chosen.
  • Variance reduction is the difference between the variance at the parent node and the size-weighted average variance of the child nodes.
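Variance reduction compares the variance of the target values at the parent node with the size-weighted variance of the children; a minimal sketch (function names are my own):

```python
def variance(values):
    # Population variance: mean squared deviation from the mean.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_reduction(parent, children):
    # Parent variance minus the size-weighted average variance of the children.
    n = len(parent)
    return variance(parent) - sum(len(c) / n * variance(c) for c in children)
```

Splitting the targets `[1, 2, 9, 10]` into `[1, 2]` and `[9, 10]` separates the low and high values, so almost all of the parent variance is removed and the reduction is large; the split with the largest reduction is chosen.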

Overfitting in Decision Trees and Solutions

Overfitting is a common problem for decision tree models and many other predictive models. It occurs when the learning algorithm continues to reduce errors on the training set at the expense of generalization. To avoid overfitting during decision tree construction, two general approaches are used:

  • Pre-pruning: Stopping the growth of the tree before it becomes too complex.
  • Post-pruning: First building the complete tree and then removing unnecessary parts.

In practice, the first approach is rarely used because it is difficult to determine when to stop growing the tree. The second approach is significantly more successful. The following considerations apply to post-pruning:

  • Use a dataset separate from the training data to decide on pruning. This dataset is called the validation dataset and is used to determine which nodes are unnecessary.
  • After constructing the decision tree, statistical methods such as error estimation and significance testing (for example, the chi-square test) are used to decide whether to prune a subtree or to expand the tree by adding new nodes.
  • Minimum Description Length principle: the combined encoding cost of the tree and of the training examples it misclassifies is measured, and tree growth stops when this sum is minimized.
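Validation-based post-pruning can be sketched as a recursive pass that replaces a subtree with its majority-class leaf whenever doing so does not increase the error on the validation samples reaching that subtree. The dict-based node representation and all names below are illustrative, not from the article.

```python
# A node is either a class label (leaf) or a dict describing a split:
# {"feature": index, "threshold": t, "left": node, "right": node, "majority": label}

def predict(node, x):
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

def error(node, data):
    # Number of misclassified (x, y) pairs.
    return sum(1 for x, y in data if predict(node, x) != y)

def prune(node, val_data):
    """Reduced-error post-pruning against a held-out validation set."""
    if not isinstance(node, dict):
        return node
    # Route validation samples to the side of the split they fall on.
    left = [(x, y) for x, y in val_data if x[node["feature"]] <= node["threshold"]]
    right = [(x, y) for x, y in val_data if x[node["feature"]] > node["threshold"]]
    node["left"] = prune(node["left"], left)
    node["right"] = prune(node["right"], right)
    # Collapse to the majority-class leaf if that does not hurt validation error.
    leaf = node["majority"]
    return leaf if error(leaf, val_data) <= error(node, val_data) else node
```

The recursion prunes bottom-up: children are simplified first, so a whole chain of unnecessary splits can collapse into a single leaf in one pass.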

Advantages and Disadvantages

Advantages

  • Easy to Understand and Interpret
  • Low Data Preprocessing Requirements
  • Feature Selection and Importance Ranking
  • Fast and Low Computational Cost
  • Generate Rule-Based Decisions

Disadvantages

  • Risk of Overfitting
  • Sensitivity to Noise and Small Data Variations
  • Poor Performance on Imbalanced Datasets
  • Inefficient for Large Datasets
  • Instability with Continuous and Discrete Variables

Author Information

Kübra Merk, December 19, 2025 at 6:53 AM
