
This article was automatically translated from the original Turkish version.

Article: AdamW
Year: 2017
Advantages: Weight penalty independence; better overfitting control

AdamW (Adam with Weight Decay) is a variant of the Adam optimization algorithm that delivers a significant improvement in model regularization. It aims to enhance Adam's performance and generalization capability by changing how the L2 penalty term (weight decay) is applied. In the traditional Adam algorithm, weight decay is folded into the gradient updates; AdamW instead applies this penalty term independently of the gradient-based update step, enabling more effective regularization.

Key Concepts

AdamW retains the core structure of the Adam algorithm but changes how weight regularization is applied. The L2 penalty term helps prevent overfitting by constraining the magnitude of the model's weights. With plain SGD, adding an L2 penalty to the loss and decaying the weights directly are equivalent; with Adam's adaptive per-parameter learning rates they are not. The Adam algorithm incorrectly folds this regularization into the gradient updates, where it is rescaled by the adaptive moment estimates and thereby weakened, whereas AdamW applies it as a separate step.
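The difference can be sketched with a minimal single-parameter update in Python. This is an illustrative sketch, not a library API: the function names `adam_l2_step` and `adamw_step`, the scalar weights, and the default hyperparameters are all assumptions chosen for clarity.

```python
import math

def adam_l2_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, wd=1e-2):
    """Classic Adam with an L2 penalty: the decay term wd * w is added
    to the gradient, so it also passes through the adaptive moments and
    is rescaled by sqrt(v_hat) + eps."""
    g = grad + wd * w                       # penalty folded into the gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)            # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=1e-2):
    """AdamW: the moments see only the raw gradient; weight decay is
    applied directly to the weights as a separate, decoupled term."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```

Starting from the same state, the two rules already produce slightly different weights after one step, and the gap grows over the course of training.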

Mathematical Formulation of AdamW

AdamW has a structure similar to the Adam algorithm but separates the weight decay term during the update process. The update steps of the AdamW algorithm are defined as follows:

Calculation of Moments:
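The first and second moment estimates, their bias corrections, and the decoupled parameter update can be written as follows, in the standard notation: $g_t$ is the gradient at step $t$, $\eta$ the learning rate, $\lambda$ the weight-decay coefficient, and $\beta_1, \beta_2, \epsilon$ the usual Adam hyperparameters.

```latex
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_{t-1} \right)
\end{align*}
```

The final line shows the decoupling: the $\lambda\,\theta_{t-1}$ term is added outside the adaptive ratio, so the decay is not rescaled by $\sqrt{\hat{v}_t}$.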

Author: Kaan Gümele (December 9, 2025)
