This article was automatically translated from the original Turkish version.
The Detection Transformer (DETR) is an object detection model introduced by Facebook AI in 2020 that adopts an end-to-end learning approach. Unlike traditional object detection methods, DETR was the first model to build its architecture around the Transformer, predicting the locations and classes of all objects in an image directly, as a set.

DETR sample schematic (Medium)
Traditional object detection systems typically involve a multi-stage processing pipeline: feature extraction, region proposal, classification, and bounding box refinement. Such systems commonly rely on CNN (Convolutional Neural Network) architectures and require hand-designed post-processing steps such as Non-Maximum Suppression (NMS).
DETR simplifies this classical workflow by providing an end-to-end solution using only a CNN and a Transformer architecture. This eliminates the need for independent stages and handcrafted rules.
The DETR architecture consists of three key components:
CNN-based feature extraction: A convolutional backbone such as ResNet turns the input image into a low-resolution but semantically rich feature map.
Transformer encoder-decoder structure: The feature map is flattened into a sequence and processed by the Transformer. The encoder enriches this sequence with global context via self-attention, while the decoder consumes it through learned "object queries" to produce one output vector per potential object.
FFN (Feed-Forward Network): Small feed-forward heads convert each decoder output into a class label and a bounding box prediction.
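The three components above can be sketched as a shape-level walkthrough. This is not DETR's actual implementation: the backbone, projection, and decoder are replaced by random matrices and a stub, and the dimensions are illustrative (only `num_queries=100` and `d_model=256` match the paper's defaults).

```python
import numpy as np

# Illustrative dimensions; num_queries=100 and d_model=256 match the paper.
d_model, num_queries, num_classes = 256, 100, 91

# 1) CNN backbone output: a low-resolution feature map (C, H, W).
feat = np.random.rand(2048, 25, 34)

# Project channels down to d_model (in DETR, a 1x1 conv; here a matmul),
# then flatten the spatial grid into a sequence of H*W tokens.
proj = np.random.rand(d_model, 2048)
tokens = (proj @ feat.reshape(2048, -1)).T         # (H*W, d_model)

# 2) Transformer encoder-decoder (stubbed): the decoder turns num_queries
# learned embeddings into num_queries output vectors of size d_model.
queries = np.random.rand(num_queries, d_model)
decoder_out = queries                              # stand-in for attention layers

# 3) FFN heads: a linear map for class logits (+1 for "no object") and a
# head for normalized box coordinates (cx, cy, w, h), squashed by sigmoid.
W_cls = np.random.rand(d_model, num_classes + 1)
W_box = np.random.rand(d_model, 4)
logits = decoder_out @ W_cls                       # (100, 92)
boxes = 1 / (1 + np.exp(-(decoder_out @ W_box)))   # values in (0, 1)

print(logits.shape, boxes.shape)  # (100, 92) (100, 4)
```

The key structural point is that the output is always a fixed-size set of 100 (class, box) pairs, regardless of how many objects the image contains.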

Sample schematic (Medium)
The Transformer architecture at the core of DETR is inspired by the 2017 paper “Attention is All You Need” by Vaswani et al. Transformers operate using self-attention, multi-head attention, and feed-forward network layers. DETR is among the first successful applications of this structure to image-based tasks.
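The self-attention operation at the heart of the Transformer can be shown in a few lines of numpy. This is a minimal single-head sketch of scaled dot-product attention as defined by Vaswani et al.; the token count and dimensions are arbitrary.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # context-mixed tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))          # 6 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 8)
```

Because every token attends to every other token, each output row mixes information from the whole sequence; in DETR this is what lets each position in the flattened feature map see the entire image.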
DETR generates a fixed number of learned "object queries" (100 in the original paper), each representing a potential object in the image. The model performs all object predictions in parallel over these queries. During training, the Hungarian algorithm computes a one-to-one bipartite matching between predictions and ground-truth objects, a strategy that discourages redundant or overlapping predictions.

Hungarian algorithm (Medium)
As a result, classical filtering operations such as Non-Maximum Suppression (NMS) are no longer required. Each prediction is directly assigned to an object or to the “no object” class.
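The bipartite matching can be demonstrated with `scipy.optimize.linear_sum_assignment`, which implements Hungarian-style assignment (scipy is an assumption here; DETR's own code builds a cost matrix from class probabilities and box distances, whereas the numbers below are made up for illustration).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy cost matrix: rows = 4 predictions, columns = 2 ground-truth objects.
cost = np.array([
    [0.9, 0.2],   # prediction 0: cheap match for object 1
    [0.1, 0.8],   # prediction 1: cheap match for object 0
    [0.7, 0.6],
    [0.5, 0.9],
])

pred_idx, gt_idx = linear_sum_assignment(cost)     # minimizes total cost
pairs = [(int(p), int(g)) for p, g in zip(pred_idx, gt_idx)]
print(pairs)  # [(0, 1), (1, 0)]
```

Each ground-truth object is matched to exactly one prediction; the remaining predictions (here, 2 and 3) are assigned the "no object" class, which is why no NMS-style deduplication is needed afterwards.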
DETR employs a loss function composed of two main components: a classification term and a bounding box localization term (in the original paper, an L1 distance combined with a generalized IoU loss). The total loss is computed over the Hungarian matching between the model's predictions and the ground truth objects. The "no object" class is down-weighted during loss computation to mitigate class imbalance, since most of the queries match nothing.
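The two loss components can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `matched_loss`, the toy weights, and the inputs are all hypothetical, and the generalized IoU term from the paper is omitted for brevity.

```python
import numpy as np

def matched_loss(logits, boxes, gt_labels, gt_boxes, match, no_obj_weight=0.1):
    """Sketch of DETR's loss: cross-entropy on classes plus L1 on boxes.
    `match` is a list of (prediction_index, ground_truth_index) pairs,
    e.g. from a Hungarian matching. The GIoU box term is omitted."""
    num_queries, num_cls = logits.shape
    no_obj = num_cls - 1                       # last index = "no object"
    targets = np.full(num_queries, no_obj)     # default every query to "no object"
    for pred_i, gt_i in match:
        targets[pred_i] = gt_labels[gt_i]

    # Softmax cross-entropy, with the "no object" class down-weighted
    # to counter class imbalance (most queries match nothing).
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    weights = np.where(targets == no_obj, no_obj_weight, 1.0)
    cls_loss = -(weights * np.log(probs[np.arange(num_queries), targets])).mean()

    # L1 box loss, computed only on matched queries.
    box_loss = sum(np.abs(boxes[p] - gt_boxes[g]).sum() for p, g in match)
    return cls_loss + box_loss

rng = np.random.default_rng(1)
logits = rng.normal(size=(5, 4))      # 5 queries, 3 classes + "no object"
boxes = rng.random((5, 4))
gt_labels = np.array([0, 2])
gt_boxes = rng.random((2, 4))
loss = matched_loss(logits, boxes, gt_labels, gt_boxes, match=[(1, 0), (3, 1)])
print(float(loss))
```

Note how the matching fully determines the classification targets: matched queries inherit a ground-truth label, and everything else is trained toward "no object".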
End-to-end learning: DETR simplifies the entire object detection pipeline by performing all steps within a single model.
Generalization capability: It can be easily adapted to different datasets without requiring complex hand-tuned operations.
Transformer advantages: It enables learning long-range dependencies and supports parallel processing.
Eliminates NMS requirement: Direct matching of predictions to objects removes the need for post-processing filtering.
DETR converges more slowly than classical approaches and performs comparatively poorly on small objects. It requires long training schedules and large datasets. Additionally, its inference latency may limit its use in real-time applications.
DETR is a pioneering work that transformed the paradigm of object detection by successfully applying the Transformer architecture. By overcoming the limitations of traditional methods, it offers a simpler and more general solution. It is widely regarded as a foundational milestone that paved the way for the next generation of detectors in computer vision and artificial intelligence.