This article was automatically translated from the original Turkish version.
Fine-tuning in artificial intelligence is the process of adapting a pre-trained model to a specific task or dataset. This approach enables the model to respond more effectively to new data while preserving its existing knowledge. It is especially widely used in large language models (LLMs) and deep learning systems.
Fine-Tuning Technique
Fine-tuning is a powerful technique that lets a model adapt what it has already learned to new data. To understand fine-tuning, one must first understand transfer learning: a machine learning technique in which a model trained on one task is repurposed for a second, related task. During this process, some layers of the trained model are frozen and kept fixed while the remaining layers are retrained on new data. This lets the model retain its general knowledge while developing task-specific capabilities. Fine-tuning is typically applied when the available data is too small to train a model from scratch.
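The freeze-and-retrain idea can be sketched without any ML framework. In this toy NumPy example (all shapes, data, and the learning rate are illustrative assumptions, not from the article), a two-layer linear model keeps its "pre-trained" feature matrix fixed and applies gradient descent only to the task head:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" two-layer linear model:
# W1 is the frozen feature layer, W2 is the trainable task head.
W1 = rng.normal(size=(8, 4))            # frozen: carries transferred knowledge
W2 = rng.normal(size=(1, 8)) * 0.1      # retrained on the new task

X = rng.normal(size=(32, 4))            # small task-specific dataset
y = X @ rng.normal(size=(4, 1))         # toy regression target

def mse():
    return float(np.mean((X @ W1.T @ W2.T - y) ** 2))

lr = 0.01
W1_before = W1.copy()
loss_before = mse()
for _ in range(200):
    h = X @ W1.T                        # features from the frozen layer
    err = h @ W2.T - y
    grad_W2 = (err.T @ h) / len(X)      # gradient computed for the head only
    W2 -= lr * grad_W2                  # W1 is never updated: it stays "frozen"
loss_after = mse()
```

After training, `W1` is bit-for-bit unchanged while the task loss has dropped, which is exactly the division of labor transfer learning relies on.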

Depiction of using artificial intelligence for different tasks (Freepik)
Fine-Tuning in Large Language Models
In large language models, fine-tuning is the process of retraining a model that was pre-trained on extensive data for a particular task or dataset. This approach preserves the model’s general linguistic capabilities while achieving higher accuracy and contextual relevance in the target domain. For example, a language model fine-tuned on customer support texts can give more appropriate and accurate answers to technical support queries. Typically, only a portion of the model’s parameters is updated during this process, which conserves computational resources and reduces the risk of the model “forgetting” its prior knowledge.
Types of Fine-Tuning for Large Language Models
Fine-tuning allows you to specialize a pre-trained large language model for a specific task without updating all of its parameters. Compared to prompt engineering, fine-tuning offers several additional advantages.
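One widely used way to fine-tune without touching all parameters is low-rank adaptation (LoRA): the pre-trained weight matrix stays frozen and only two small low-rank factors are trained. Below is a minimal NumPy sketch of the idea; the dimensions, rank, and initialization are illustrative assumptions, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 64, 64, 4

# Frozen pre-trained weight matrix W (never updated during fine-tuning).
W = rng.normal(size=(d_out, d_in))

# Small trainable low-rank factors; only these receive gradient updates.
A = np.zeros((d_out, rank))             # zero init: adapted model starts identical to base
B = rng.normal(size=(rank, d_in)) * 0.01

def forward(x):
    # Effective weight is W + A @ B; training touches far fewer parameters.
    return (W + A @ B) @ x

x = rng.normal(size=d_in)
full_params = W.size                    # parameters a full fine-tune would update
lora_params = A.size + B.size           # parameters LoRA actually trains
```

With rank 4, the trainable parameter count (`lora_params`) is a small fraction of the full matrix, and because `A` starts at zero the model initially behaves exactly like the frozen base model.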

Fine-tuning schematic (Prottasha et al., 2024)
Data Set Preparation
One of the most critical steps in fine-tuning large language models is dataset preparation. This stage directly affects how well the model performs on the target task. The dataset should be high quality, aligned with the target task, and as balanced as possible.
Task Definition and Data Selection
Before constructing the dataset, the task for which the model will be trained must be clearly defined. For example, if the goal is to develop a chat assistant, dialogue data is required; for sentiment analysis, labeled text data is needed. Data is then collected from sources appropriate to that task definition, such as publicly available datasets, internal company documents, user interactions, or specialized human-written texts.
Cleaning and Preprocessing
The collected data undergoes a standard natural language processing preprocessing pipeline. During this step, spelling errors are corrected, unnecessary symbols are removed, and inconsistencies are resolved. If the model is to be trained on structured data such as question-answer pairs, the text is converted into the appropriate format.
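A cleaning step like the one described can be sketched with the standard library alone. The specific rules below (tag stripping, symbol filtering, whitespace collapsing) are illustrative assumptions; real pipelines tailor them to the data source:

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    """Illustrative cleanup: normalize unicode, strip markup, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)     # unify unicode variants
    text = re.sub(r"<[^>]+>", " ", text)           # drop leftover HTML tags
    text = re.sub(r"[^\w\s.,!?'-]", " ", text)     # remove stray symbols
    text = re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace
    return text

print(clean_text("Hello <b>world</b>!!   ###"))    # → Hello world !!
```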
Labeling and Task Formatting
For supervised fine-tuning, the data must be correctly labeled. Labeling can be performed manually or, in some cases, with automated rules. For example, in a classification task, each text must be assigned the correct class label. The data must also be formatted to match the model’s input structure, such as a “prompt-response” format.
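Converting labeled examples into a prompt-response format is often a small transformation step. The template and field names below are hypothetical; in practice they must match the chat or instruction format the target model expects:

```python
import json

# Labeled sentiment examples (the label is the supervision signal).
examples = [
    {"text": "The battery lasts all day.", "label": "positive"},
    {"text": "The screen cracked in a week.", "label": "negative"},
]

def to_prompt_response(example):
    # Hypothetical template; adapt to the target model's expected format.
    return {
        "prompt": f"Classify the sentiment of this review:\n{example['text']}\nSentiment:",
        "response": f" {example['label']}",
    }

records = [to_prompt_response(e) for e in examples]

# One JSON object per line (JSONL), a common fine-tuning file layout.
jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
print(jsonl)
```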
Data Quality and Balance
To help the model learn evenly, attention must be paid to class balance within the dataset; otherwise, the model may become biased toward classes with more examples. Low-quality, contradictory, or semantically incorrect data must be removed. In some cases, a small but high-quality dataset is more effective than a large but disorganized one.
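One simple way to address class imbalance is to randomly downsample every class to the size of the smallest one. The toy dataset and 90/10 split below are illustrative assumptions:

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Imbalanced toy dataset: far more "positive" than "negative" examples.
data = [("good", "positive")] * 90 + [("bad", "negative")] * 10

def downsample_to_minority(rows):
    """Randomly downsample every class to the size of the smallest class."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    n = min(len(v) for v in by_label.values())      # minority class size
    balanced = []
    for rows_for_label in by_label.values():
        balanced.extend(random.sample(rows_for_label, n))
    return balanced

balanced = downsample_to_minority(data)
print(Counter(label for _, label in balanced))      # each class now has 10 examples
```

Downsampling discards data, so when the minority class is very small, alternatives such as collecting more minority examples are usually preferable.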
Anonymization and Privacy
When preparing the dataset, any data containing personal or sensitive information must be anonymized. This step is essential both ethically and legally; data protection regulations such as the GDPR and KVKK (Turkey’s Personal Data Protection Law) must be taken into account.
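A basic form of anonymization replaces recognizable PII patterns with placeholder tokens. The regular expressions below are deliberately simple illustrations; production systems rely on vetted PII-detection tooling rather than two hand-written patterns:

```python
import re

# Illustrative patterns only; real pipelines need much broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def anonymize(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(anonymize("Contact ayse@example.com or +90 555 123 4567."))
# → Contact [EMAIL] or [PHONE].
```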