
This article was automatically translated from the original Turkish version.


Year: 2017

Advantage(s): Reduced need for learning-rate tuning; good performance with large gradients

Adamax is a variant of the Adam algorithm that scales its updates using the infinity norm (∞-norm) of past gradients instead of the L2 norm. Kingma and Ba introduced it in the same paper as Adam, which was posted to arXiv in 2014 and presented at ICLR 2015. The algorithm aims to provide more stable and effective updates, particularly in high-dimensional parameter spaces. By replacing the L2 norm in Adam with the infinity norm, Adamax controls the influence of large gradients and can deliver a more stable learning process.

Adamax Optimization Algorithm

Key Difference Between Adam and Adamax

The Adam algorithm improves on plain gradient descent by combining a momentum (first-moment) estimate with per-parameter adaptive learning rates derived from a second-moment estimate. However, its performance can degrade when the second-moment (L2 norm) estimate becomes unstable. Adamax addresses this issue by replacing the second-moment estimate with an infinity-norm (∞-norm) estimate.
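To make the contrast concrete, the following is a minimal sketch of how the two scaling terms react to a single large gradient. It assumes the standard recursions from Kingma and Ba: Adam tracks v = β2·v + (1 − β2)·g² with bias correction, while Adamax tracks u = max(β2·u, |g|). The gradient sequence and the variable names are illustrative assumptions, not values from the article.

```python
import numpy as np

beta2 = 0.999                                   # decay rate (common default)
grads = np.array([0.1, 0.1, 10.0, 0.1, 0.1])    # hypothetical sequence with one spike

v = 0.0  # Adam: exponential moving average of squared gradients
u = 0.0  # Adamax: exponentially weighted infinity norm
adam_denoms, adamax_denoms = [], []

for t, g in enumerate(grads, start=1):
    # Adam second-moment estimate with bias correction
    v = beta2 * v + (1.0 - beta2) * g**2
    v_hat = v / (1.0 - beta2**t)
    # Adamax infinity-norm estimate (no bias correction needed)
    u = max(beta2 * u, abs(g))
    adam_denoms.append(float(np.sqrt(v_hat)))
    adamax_denoms.append(float(u))

for t, (a, b) in enumerate(zip(adam_denoms, adamax_denoms), start=1):
    print(f"step {t}: Adam sqrt(v_hat) = {a:.4f}, Adamax u = {b:.4f}")
```

In this sketch the Adamax term never exceeds the largest gradient magnitude seen so far, and after the spike it shrinks by a factor of β2 per step.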


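In the Adam algorithm, the second-moment estimate is computed as an exponential moving average of the squared gradients:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$

Here $g_t$ is the gradient at step $t$ and $\beta_2$ is the decay rate. Adamax replaces this quantity with an exponentially weighted infinity norm, which is the limit of the $L^p$-norm generalization of Adam's update as $p \to \infty$:

$$u_t = \max(\beta_2 u_{t-1}, |g_t|)$$

The parameters are then updated using the first-moment estimate $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$ and the learning rate $\eta$:

$$\theta_{t+1} = \theta_t - \frac{\eta}{1 - \beta_1^t} \cdot \frac{m_t}{u_t}$$

The Adamax algorithm enhances the stability of parameter updates by employing the infinity norm: because $u_t$ tracks the largest recent gradient magnitude, a single large gradient cannot produce an outsized step.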
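The Adamax update can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes the update rule from Kingma and Ba (first moment m = β1·m + (1 − β1)·g, infinity-norm term u = max(β2·u, |g|), step size η / (1 − β1^t)) with their suggested defaults η = 0.002, β1 = 0.9, β2 = 0.999. The function name `adamax_step`, the small `eps` added to the denominator to avoid division by zero, and the quadratic test objective are assumptions made for this example.

```python
import numpy as np

def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update (hypothetical helper). t is the 1-based step count."""
    # First-moment (momentum) estimate, same as in Adam
    m = beta1 * m + (1.0 - beta1) * grad
    # Exponentially weighted infinity norm replaces Adam's second moment
    u = np.maximum(beta2 * u, np.abs(grad))
    # Only the first moment needs bias correction; eps is an added safeguard
    theta = theta - (lr / (1.0 - beta1**t)) * m / (u + eps)
    return theta, m, u

# Usage: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta
theta = np.array([5.0, -3.0])
m = np.zeros_like(theta)
u = np.zeros_like(theta)
for t in range(1, 5001):
    grad = 2.0 * theta
    theta, m, u = adamax_step(theta, grad, m, u, t)

print(theta)  # should end up close to the minimizer at the origin
```

Because the ratio m / u stays close to 1 in magnitude while the gradient keeps a consistent sign, each early step moves a parameter by roughly the learning rate, regardless of the gradient's scale.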

Bibliography

Kingma, Diederik P., and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv. https://doi.org/10.48550/arXiv.1412.6980.

Ruder, Sebastian. 2017. “An Overview of Gradient Descent Optimization Algorithms.” arXiv. June 15, 2017. https://doi.org/10.48550/arXiv.1609.04747.

Author Information

Author: Kaan Gümele
December 9, 2025 at 6:23 AM
