This article was automatically translated from the original Turkish version.

Model Training and Testing

In machine learning and artificial intelligence systems, model training and testing is the process by which a data-based system acquires learning capabilities and the accuracy of this learning is evaluated. Model training aims for an algorithm to learn patterns from given labeled data, while model testing seeks to measure the applicability of this learning to new, real-world data. These processes may vary across supervised, unsupervised, and reinforcement learning methods, but they share a similar fundamental structure.

Model Training Process

Model training typically consists of the following stages:

  1. Data Collection: Effective model training depends on a high-quality and representative dataset. Data is usually gathered from sensors, user inputs, databases, or open-source data repositories.
  2. Data Preprocessing: Collected data undergoes operations such as handling missing values, encoding categorical variables, normalization, and dimensionality reduction. This stage directly affects the model’s learning capacity.
  3. Training and Test Data Split: Data is typically divided into 70–80% for training and the remainder for testing (or validation). The purpose of this split is to evaluate the model’s performance on data it has not seen before.
  4. Model Selection and Configuration: An appropriate algorithm is selected based on the problem type—for example, logistic regression, decision trees, artificial neural networks, or support vector machines. The model’s hyperparameters are then configured.
  5. Training: The training data is used to enable the model to learn. During this process, the model updates its weights by learning the relationships between inputs and target outputs. This is generally accomplished using an optimization algorithm such as stochastic gradient descent.
  6. Validation: The model’s tendency toward overfitting is assessed using validation data. Techniques such as early stopping and k-fold cross-validation are commonly applied in this step.
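The weight-update step in stage 5 can be sketched with plain stochastic gradient descent on a toy one-weight linear model. This is a minimal illustration with made-up data, not tied to any particular library:

```python
# Minimal SGD sketch: fit y = w * x on toy data whose true weight is 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs

w = 0.0    # initial weight
lr = 0.05  # learning rate

for epoch in range(200):
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x  # derivative of squared error w.r.t. w
        w -= lr * grad             # the weight update of stage 5

print(round(w, 3))  # converges to 2.0
```

Each pass nudges the weight in the direction that reduces squared error; after a few hundred updates the learned weight matches the true value.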

Model Testing and Evaluation

The model testing phase aims to measure the performance of a trained algorithm on a dataset it has never encountered before. The metrics used in this process evaluate how well the model generalizes the knowledge it has learned—that is, its generalization capability. Model performance is assessed through both quantitative and qualitative analysis.

Key Performance Metrics

Accuracy: The ratio of total correct predictions to the total number of samples. It is a meaningful metric for datasets with balanced class distributions. However, it can be misleading in imbalanced datasets. For example, in a dataset where 95% of samples belong to the negative class, if all predictions are negative, accuracy will be 95%, even though the model has not learned anything meaningful.
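The imbalanced-class pitfall described above can be reproduced in a few lines of pure Python with invented labels:

```python
# 100 samples: 95 negative (0), 5 positive (1)
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a "model" that always predicts negative

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95, despite the model never finding a single positive
```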

Precision: Indicates what proportion of samples predicted as positive are truly positive. It is especially important in scenarios where false positives are costly—for example, in spam filters.

Precision = TP / (TP + FP)

Recall (Sensitivity): Indicates what proportion of truly positive samples were correctly predicted by the model. It is critical in scenarios where missing a positive case is costly—for example, in medical diagnosis.

Recall = TP / (TP + FN)

F1 Score: The harmonic mean of Precision and Recall. It is used to assess balanced model performance. The F1 score ranges from 0 to 1. Scores of 0.8 and above typically represent successful models. Scores between 0.6 and 0.8 are considered acceptable, while scores below 0.6 indicate models that generally require improvement.

F1 = 2 × (Precision × Recall) / (Precision + Recall)

ROC-AUC Curve (Receiver Operating Characteristic – Area Under Curve): Measures the model’s ability to distinguish between classes. The ROC curve plots the true positive rate (Recall) against the false positive rate. An AUC score of 0.5 indicates random guessing. Models with an AUC greater than 0.8 are generally considered strong.
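AUC also has an equivalent rank-based interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A small sketch with invented model scores:

```python
# AUC as the probability that a random positive outscores a random negative
# (ties count as half).
def auc(scores_pos, scores_neg):
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

pos = [0.9, 0.8, 0.6]  # scores for truly positive samples (invented)
neg = [0.7, 0.3, 0.2]  # scores for truly negative samples (invented)
print(round(auc(pos, neg), 3))  # 0.889, well above the 0.5 of random guessing
```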

Loss Function: A function that measures the difference between the model’s predicted values and the true values. It quantifies how “wrong” the model is during training and testing. For example, Mean Squared Error (MSE) is commonly used in regression models, while Binary Cross Entropy is frequently used in binary classification problems.
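Both losses mentioned above can be computed directly; the predictions below are toy values for illustration:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_prob):
    """Average negative log-likelihood for binary labels."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, y_prob)
    ) / len(y_true)

print(mse([1.0, 2.0], [1.5, 1.5]))                         # 0.25
print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))  # 0.1054
```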

Overfitting and Underfitting

Overfitting: This occurs when a model learns the training data too well and fails to generalize to test data. In this case, the model exhibits very low error on training data but high error on new data. A typical sign of overfitting is low training loss but high validation or test loss.

Solutions: Collecting more data, applying regularization, using dropout in neural networks, stopping training early, or reducing model complexity.
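Early stopping, already mentioned in the validation step, can be sketched as a simple patience rule over per-epoch validation losses (the loss values below are invented, standing in for real measurements):

```python
# Early-stopping sketch: stop when validation loss stops improving.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]

patience = 2  # epochs to wait after the last improvement
best = float("inf")
best_epoch = 0
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, best_epoch = loss, epoch
    elif epoch - best_epoch >= patience:
        break  # no improvement for `patience` epochs: stop training

print(best_epoch, best)  # 3 0.5
```

Training halts shortly after the validation loss bottoms out, before the rising tail that signals overfitting.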

Underfitting: This occurs when the model fails to capture the underlying pattern in both training and test data. It is usually due to insufficient model complexity or inadequate training time.

Solutions: Increasing model complexity, adding more informative features, training for longer, or reducing regularization.

Model training and testing are fundamental processes that determine the accuracy and reliability of artificial intelligence projects. The quality of training, data integrity, and the correctness of testing protocols directly affect the success of the application. Therefore, the model development process must proceed iteratively and be continuously evaluated from both technical and ethical perspectives.

Author Information

Hüsnü Umut Okur, December 3, 2025, 9:33 AM

