This article was automatically translated from the original Turkish version.
Autoencoders are artificial neural networks that learn to compress input data into a lower-dimensional, meaningful representation and to reconstruct the input from that representation. First introduced in the 1980s, autoencoders operate within the framework of unsupervised learning; their primary goal is to learn the important features of the data and build a low-dimensional representation from them. Because the target output is the input itself, training aims to minimize the reconstruction error. Autoencoders are used in numerous applications, including data compression, feature extraction, dimensionality reduction, denoising, and generative modeling.
Basic Structure of Autoencoders

Autoencoders consist fundamentally of two main components: an encoder and a decoder. The encoder transforms the input into a low-dimensional code, while the decoder attempts to reconstruct the original data from this code.
Encoder and Decoder Structure

The encoder transforms high-dimensional input data into a lower-dimensional hidden-layer representation. The objective of this step is to represent the data without losing its most important features. The low-dimensional representation produced by the encoder is called the "code" or "latent representation."
The decoder takes this code and generates an output that approximates the original input data. Autoencoders are optimized so that the output closely matches the input.
Mathematical Model

Autoencoders are defined by the learning of two functions: an encoder A : R^n → R^p that maps an input to its latent code, and a decoder B : R^p → R^n that maps the code back to the input space. Training solves the following optimization problem:

argmin_{A,B} E[Δ(x, B(A(x)))]

Here the function Δ (Delta) measures the discrepancy between the reconstruction B(A(x)) and the original input x; it is typically the squared ℓ2 distance, i.e., the mean squared error.
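The optimization above can be sketched with a minimal linear autoencoder trained by full-batch gradient descent on the mean squared error. The toy data, layer sizes, learning rate, and iteration count below are illustrative choices, not part of any standard:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 5 dimensions that actually lie on a 2-D subspace.
latent_true = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent_true @ mixing

n, p = 5, 2                                  # input and code dimensions (p < n)
W_enc = rng.normal(scale=0.1, size=(n, p))   # encoder A(x) = x @ W_enc
W_dec = rng.normal(scale=0.1, size=(p, n))   # decoder B(z) = z @ W_dec

lr = 0.02
for _ in range(5000):
    Z = X @ W_enc          # latent codes A(x)
    X_hat = Z @ W_dec      # reconstructions B(A(x))
    err = X_hat - X
    # Gradients of the mean squared reconstruction error w.r.t. both weight matrices.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = np.mean((X - (X @ W_enc) @ W_dec) ** 2)
```

Because the data here is exactly two-dimensional, an undercomplete code of size 2 suffices and the reconstruction error falls well below the variance of the raw data.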
Types of Autoencoders

Different variations of autoencoders have been developed to serve specific purposes. These variations maintain the basic architecture but employ different regularization strategies and loss functions.
Undercomplete and Overcomplete Autoencoders

In an undercomplete autoencoder, the latent representation has fewer dimensions than the input, so the network is forced to compress the data and keep only its most informative structure. In an overcomplete autoencoder, the latent representation is at least as large as the input; without additional constraints such a network could simply copy its input, so regularization (for example, sparsity) is used to make the learned code useful.
Sparse Autoencoders

In sparse autoencoders, a large portion of the hidden-layer activations are encouraged to be close to zero. This encourages the network to extract only the most salient features of the data. Sparsity is typically achieved with methods such as L1 regularization or a Kullback-Leibler divergence penalty.
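A minimal sketch of how an L1 penalty can be added to the reconstruction loss; the function name and the `l1_weight` hyperparameter are illustrative, and the KL-divergence variant would instead compare each code unit's mean activation to a small target sparsity level:

```python
import numpy as np

def sparse_autoencoder_loss(x, x_hat, code, l1_weight=1e-3):
    # Reconstruction term (mean squared error) plus an L1 penalty that
    # pushes the latent activations toward zero; l1_weight trades the
    # two terms off against each other.
    reconstruction = np.mean((x - x_hat) ** 2)
    sparsity = l1_weight * np.mean(np.abs(code))
    return reconstruction + sparsity
```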
Denoising Autoencoders

Denoising autoencoders introduce deliberate noise into the input data and train the network to reconstruct the original clean data. This approach helps the model learn more robust and generalizable features.
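The corruption step can be sketched as follows, assuming additive Gaussian noise (masking noise, which randomly zeroes entries, is another common choice); the function name and default noise level are illustrative:

```python
import numpy as np

def corrupt(x, noise_std=0.3, rng=None):
    # Additive Gaussian corruption of the input; the autoencoder is then
    # trained to map corrupt(x) back to the clean x, so the training pair
    # is (corrupt(x), x) rather than (x, x).
    if rng is None:
        rng = np.random.default_rng()
    return x + rng.normal(scale=noise_std, size=x.shape)
```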
Variational Autoencoders (VAE)

Variational autoencoders differ from classical autoencoders in that they are generative models. They generate data by sampling from a probability distribution over the latent representation. In this architecture, the encoder learns the parameters of a probability distribution (mean and variance), while the decoder generates data from samples drawn from that distribution.
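A common way to implement the sampling step is the reparameterization trick, sketched below together with the closed-form KL divergence between the learned Gaussian and a standard normal prior; the function names are illustrative:

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    # Sample z = mu + sigma * eps with eps drawn from a standard normal,
    # which keeps the sampling step differentiable in mu and log_var.
    if rng is None:
        rng = np.random.default_rng()
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(mu.shape)

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, 1),
    # summed over the code dimensions; in VAE training it is added to
    # the reconstruction loss as a regularizer.
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
```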
Applications of Autoencoders

Autoencoders have a wide range of applications. Different configurations can be used to address various types of problems.
Dimensionality Reduction

Autoencoders can be used to reduce data to a lower-dimensional representation. Although similar in purpose to traditional dimensionality reduction methods such as Principal Component Analysis (PCA), autoencoders are capable of learning nonlinear relationships.
Denoising

Denoising autoencoders are particularly used in image processing to remove noise from data. The network learns to predict the original clean data from corrupted input.
Anomaly Detection

Once trained on normal data samples, autoencoders can be used to detect anomalies. Anomalous samples will exhibit high reconstruction error because the network cannot accurately reconstruct them.
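A reconstruction-error-based detector can be sketched as follows; the percentile-based threshold is one simple illustrative choice, not a standard rule:

```python
import numpy as np

def reconstruction_error(x, x_hat):
    # Per-sample mean squared error between input and reconstruction.
    return np.mean((x - x_hat) ** 2, axis=-1)

def anomaly_threshold(normal_errors, percentile=99.0):
    # Flag anything reconstructed worse than nearly all of the normal
    # training data, using a high percentile of the normal errors.
    return np.percentile(normal_errors, percentile)
```

A sample is then flagged as anomalous when its reconstruction error exceeds the threshold computed on normal data.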
Generative Models

In particular, variational autoencoders (VAEs) can be used to generate new data samples. New data can be created by randomly sampling from the distribution learned during training and passing the samples through the decoder.
Feature Extraction

Autoencoders are also used for feature extraction from data. In particular, for other machine learning tasks such as classification or clustering, the encoder component of a trained autoencoder can serve as a feature extractor.
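Reusing the encoder as a feature extractor can be sketched as follows; the random weights here are a stand-in for a trained encoder, and the tanh nonlinearity is an assumption made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the weights of an already-trained encoder; in practice these
# come from training a full autoencoder and then discarding the decoder.
W_enc = rng.normal(size=(10, 3))

def extract_features(x):
    # Encoder forward pass reused as a feature extractor; the resulting
    # low-dimensional features can feed a classifier or clustering method.
    return np.tanh(x @ W_enc)

X = rng.normal(size=(5, 10))
features = extract_features(X)   # 5 samples, each mapped to 3 learned features
```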