This article was automatically translated from the original Turkish version.
| Year | 2015 |
|---|---|
| Advantage(s) | Reduced need for learning-rate tuning; good performance with large gradients |
Adamax is a variant of the Adam algorithm based on the infinity norm (∞-norm). It was introduced by Kingma and Ba in the same paper as Adam (arXiv preprint 2014, published at ICLR 2015) and is obtained by generalizing Adam's L2-norm-based gradient scaling to the Lp norm and letting p → ∞. Because each parameter's step is then scaled by a running maximum of gradient magnitudes and not by an average of squared gradients, Adamax limits the influence of large gradients and can give more stable updates, particularly in high-dimensional parameter spaces.
The Adam algorithm improves on plain gradient descent by combining a momentum (first-moment) estimate with per-parameter adaptive learning rates. However, its performance can degrade when the second-moment estimate, which is based on the L2 norm of past gradients, becomes unstable. Adamax addresses this issue by replacing the second-moment estimate with an exponentially weighted infinity norm (∞-norm) of past gradients.
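As a point of reference, a single Adam step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not a reference implementation: the function name `adam_step`, its argument names, and the default hyperparameters (`lr`, `beta1`, `beta2`, `eps`) are chosen here for illustration, with the defaults following commonly quoted values.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (illustrative sketch). t is the 1-based step count."""
    # Momentum: exponential moving average of the gradients (first moment).
    m = beta1 * m + (1 - beta1) * grad
    # Adaptive scaling: exponential moving average of squared gradients (second moment).
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, because m and v start at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: large recent gradients shrink the effective learning rate.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage on f(theta) = sum(theta**2), whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
theta, m, v = adam_step(theta, 2 * theta, m, v, t=1, lr=0.01)
```

On the very first step the bias-corrected ratio reduces to roughly the sign of the gradient, so each parameter moves by about `lr` regardless of the gradient's scale.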
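In the Adam algorithm, the second-moment estimate is computed as an exponential moving average of squared gradients:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

where $g_t$ is the gradient at step $t$ and $\beta_2$ is the decay rate. Adamax replaces this quantity with an exponentially weighted infinity norm:

$$u_t = \max(\beta_2 u_{t-1}, |g_t|)$$

and updates the parameters as

$$\theta_t = \theta_{t-1} - \frac{\alpha}{1 - \beta_1^t} \cdot \frac{m_t}{u_t}$$

where $m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$ is the same first-moment (momentum) estimate used in Adam and $\alpha$ is the learning rate. Since $u_t$ is a running maximum and not an average initialized at zero, it needs no bias correction.
By employing the infinity norm in this way, the Adamax algorithm enhances the stability of parameter updates: the magnitude of each update is bounded by the learning rate $\alpha$.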
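The Adamax update can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: the function name `adamax_step` and its argument names are hypothetical, the defaults `lr=0.002`, `beta1=0.9`, `beta2=0.999` follow the values suggested by Kingma and Ba, and the small `eps` term is an addition made here to avoid division by zero when a gradient has been exactly zero from the start. The running quantity `u` is the exponentially weighted infinity norm, `u = max(beta2 * u, |grad|)`.

```python
import numpy as np

def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update (illustrative sketch). t is the 1-based step count."""
    # First moment: same momentum estimate as in Adam.
    m = beta1 * m + (1 - beta1) * grad
    # Infinity-norm estimate: decayed running maximum of gradient magnitudes.
    u = np.maximum(beta2 * u, np.abs(grad))
    # Only the first moment needs bias correction; eps is an assumption (see lead-in).
    theta = theta - (lr / (1 - beta1 ** t)) * m / (u + eps)
    return theta, m, u

# Usage: feed a gradient spike and watch how u reacts.
theta = np.array([1.0])
m, u = np.zeros_like(theta), np.zeros_like(theta)
for t, g in enumerate([1.0, 100.0, 1.0], start=1):
    theta, m, u = adamax_step(theta, np.array([g]), m, u, t)
    print(t, u, theta)
```

After the spike, `u` stays near the spike's magnitude and decays only by a factor of `beta2` per step, so subsequent steps are scaled down and no single update can exceed roughly `lr` in magnitude.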
Kingma, Diederik P., and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv. https://doi.org/10.48550/arXiv.1412.6980.
Ruder, Sebastian. 2017. “An Overview of Gradient Descent Optimization Algorithms.” arXiv. June 15, 2017. https://doi.org/10.48550/arXiv.1609.04747.