
This article was automatically translated from the original Turkish version.


Detection Transformer (DETR) is an innovative artificial intelligence model developed by Facebook AI in 2020 that adopts an end-to-end learning approach for object detection tasks. Unlike traditional object detection methods, DETR is the first model to center its architecture around the Transformer framework to predict both the locations and classes of objects in an image.


[Figure: DETR architecture schematic (source: Medium)]

Background

Traditional object detection systems typically involve a multi-stage processing pipeline. These stages include feature extraction, region proposal, classification, and bounding box refinement. Such systems commonly rely on CNN (Convolutional Neural Network) architectures and require specialized processing steps like Non-Maximum Suppression (NMS).


DETR simplifies this classical workflow by providing an end-to-end solution using only a CNN and a Transformer architecture. This eliminates the need for independent stages and handcrafted rules.

Architectural Components

The DETR architecture consists of three key components:


CNN-based feature extraction: Typically a convolutional neural network such as ResNet produces a lower-resolution but semantically rich feature map from the input image.

Transformer encoder-decoder structure: The Transformer module processes these feature maps. The encoder transforms the input into meaningful vector representations, while the decoder uses these representations through “object queries” to perform object detection.

FFN (Feed-Forward Network): The decoder’s outputs are converted into class labels and bounding box predictions for each detected object.


[Figure: DETR component schematic (source: Medium)]

Transformer Architecture

The Transformer architecture at the core of DETR is inspired by the 2017 paper “Attention is All You Need” by Vaswani et al. Transformers operate using self-attention, multi-head attention, and feed-forward network layers. DETR is among the first successful applications of this structure to image-based tasks.
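The core operation of those layers, scaled dot-product self-attention, can be shown in a few lines. This is a single-head sketch of the softmax(QKᵀ/√d)·V formula from "Attention is All You Need" with random weights; real Transformer blocks add multiple heads, residual connections, and normalization.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model) sequence of token/feature embeddings."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted mix of values

rng = np.random.default_rng(1)
d_model = 8
x = rng.standard_normal((5, d_model))   # e.g. 5 positions of a flattened feature map
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)                        # (5, 8): same length, but each position
                                        # now aggregates information from all others
```

Because every position attends to every other position, the receptive field is global from the first layer, which is what lets DETR reason about long-range relationships in the image.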

Working Principle

DETR generates a fixed number of learned “object queries,” each representing a potential object in the image. The model produces all object predictions in parallel from these queries. During training, predictions are matched one-to-one with ground-truth objects using the Hungarian algorithm, a strategy that eliminates redundant or overlapping predictions.

[Figure: Hungarian algorithm matching illustration (source: Medium)]

As a result, classical filtering operations such as Non-Maximum Suppression (NMS) are no longer required. Each prediction is directly assigned to an object or to the “no object” class.
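The matching step can be illustrated with a toy example. For clarity this brute-forces all assignments (fine for tiny inputs); DETR itself uses an efficient Hungarian implementation (scipy's `linear_sum_assignment`). The cost values below are hypothetical.

```python
from itertools import permutations

def match(cost):
    """cost[i][j]: cost of assigning prediction i to ground-truth object j.
    Returns (best total cost, prediction index chosen for each ground truth)."""
    n_pred, n_gt = len(cost), len(cost[0])
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n_pred), n_gt):   # brute force, small N only
        total = sum(cost[perm[j]][j] for j in range(n_gt))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# 3 predictions, 2 ground-truth objects (illustrative costs)
cost = [
    [0.2, 0.9],   # prediction 0: cheap match for ground truth 0
    [0.8, 0.1],   # prediction 1: cheap match for ground truth 1
    [0.7, 0.6],   # prediction 2: left unmatched -> supervised as "no object"
]
total, assignment = match(cost)
print(assignment)   # (0, 1): each object gets exactly one prediction
```

Because each object is claimed by exactly one prediction, two queries cannot both be rewarded for the same object, which is why duplicate boxes (and hence NMS) disappear.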

Loss Function

DETR employs a loss function composed of two main components: class prediction (a cross-entropy term) and bounding box localization (an L1 term combined with a generalized IoU term). The total loss is minimized by matching the model’s predictions with ground-truth objects. The “no object” class is down-weighted during loss computation to mitigate class imbalance.
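A stripped-down version of this loss for one image, computed after the matching step, might look as follows. All numbers are illustrative; the 0.1 down-weighting of the no-object term mirrors the factor used in the paper, and the generalized IoU box term is omitted here for brevity.

```python
import math

NO_OBJECT = 0      # index of the "no object" class in this toy example
EOS_WEIGHT = 0.1   # down-weight for no-object terms (class imbalance)

def detr_loss(pred_probs, pred_boxes, matches, gt_classes, gt_boxes):
    """matches: dict {prediction index -> matched ground-truth index}."""
    loss = 0.0
    for i, probs in enumerate(pred_probs):
        if i in matches:
            j = matches[i]
            loss += -math.log(probs[gt_classes[j]])                 # class NLL
            loss += sum(abs(p - g)                                  # L1 box term
                        for p, g in zip(pred_boxes[i], gt_boxes[j]))
        else:
            loss += -EOS_WEIGHT * math.log(probs[NO_OBJECT])        # "no object"
    return loss

# 3 predictions over 3 classes (no-object, cat, dog); boxes are (cx, cy, w, h)
pred_probs = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.9, 0.05, 0.05]]
pred_boxes = [[0.5, 0.5, 0.2, 0.2], [0.3, 0.3, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0]]
gt_classes = [1, 2]                      # a cat and a dog
gt_boxes = [[0.5, 0.5, 0.2, 0.2], [0.3, 0.4, 0.1, 0.1]]
matches = {0: 0, 1: 1}                   # e.g. produced by the Hungarian step
print(round(detr_loss(pred_probs, pred_boxes, matches, gt_classes, gt_boxes), 3))
```

Note how the unmatched prediction is still supervised (toward "no object"), but its contribution is scaled down so that the many background queries do not dominate training.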

Advantages and Innovations

End-to-end learning: DETR simplifies the entire object detection pipeline by performing all steps within a single model.

Generalization capability: It can be easily adapted to different datasets without requiring complex hand-tuned operations.

Transformer advantages: It enables learning long-range dependencies and supports parallel processing.

Eliminates NMS requirement: Direct matching of predictions to objects removes the need for post-processing filtering.

Limitations

DETR converges more slowly than classical approaches and performs comparatively poorly on small objects. It requires long training times and large datasets. Additionally, inference latency may limit its use in real-time applications.


DETR is a pioneering work that transformed the paradigm of object detection by successfully applying the Transformer architecture. By overcoming the limitations of traditional methods, it offers a simpler and more general solution. It is widely regarded as a foundational milestone that paved the way for the next generation of detectors in computer vision and artificial intelligence.

Author Information

Author: Beyza Nur Aciyan, December 9, 2025 at 8:20 AM


