Optimizer
Auto Optimizer
For standard PyTorch usage, d9d includes the d9d.loop.auto package, whose providers ingest a Pydantic configuration object and manage the creation of standard optimizers.
It supports AdamW, Adam, SGD, and StochasticAdamW.
d9d.loop.auto.auto_optimizer
AdamOptimizerConfig
Bases: BaseAutoOptimizerConfig
Configuration for the PyTorch Adam optimizer.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `Literal['adam']` | Discriminator tag. |
| `lr` | `float` | The learning rate. |
| `betas` | `tuple[float, float]` | Coefficients for computing running averages of the gradient and its square. |
| `eps` | `float` | Term added to the denominator to improve numerical stability. |
| `weight_decay` | `float` | Weight decay coefficient. |
| `decoupled_weight_decay` | `bool` | Whether to apply decoupled weight decay. |
| `amsgrad` | `bool` | Whether to use the AMSGrad variant. |
| `maximize` | `bool` | Whether to maximize the params based on the objective. |
build(params)
Builds fused Adam with the configured parameters.
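As a hedged sketch, the documented fields map one-to-one onto the constructor arguments of `torch.optim.Adam`, so `build(params)` is expected to produce roughly the following (the exact d9d call is not shown in these docs; `fused=True` is omitted here because the fused implementation needs a supported accelerator, and the field-to-argument mapping is the point):

```python
import torch
from torch import nn

model = nn.Linear(16, 4)

# Plain torch.optim.Adam configured from the documented config fields.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,                 # lr
    betas=(0.9, 0.999),      # betas
    eps=1e-8,                # eps
    weight_decay=0.0,        # weight_decay
    amsgrad=False,           # amsgrad
    maximize=False,          # maximize
)
```

Recent PyTorch versions also accept `decoupled_weight_decay` directly on `torch.optim.Adam`, matching the remaining config field.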
AdamWOptimizerConfig
Bases: BaseAutoOptimizerConfig
Configuration for the PyTorch AdamW optimizer.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `Literal['adamw']` | Discriminator tag. |
| `lr` | `float` | The learning rate. |
| `betas` | `tuple[float, float]` | Coefficients for computing running averages of the gradient and its square. |
| `eps` | `float` | Term added to the denominator to improve numerical stability. |
| `weight_decay` | `float` | Weight decay coefficient. |
| `amsgrad` | `bool` | Whether to use the AMSGrad variant. |
| `maximize` | `bool` | Whether to maximize the params based on the objective (as opposed to minimizing). |
build(params)
Builds fused AdamW with the configured parameters.
AutoOptimizerProvider
Bases: OptimizerProvider
OptimizerProvider that builds a PyTorch optimizer based on a configuration object.
__init__(config)
Constructs the provider with the given configuration.
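Conceptually, the provider holds a validated config and dispatches on its `name` discriminator to build the matching torch optimizer. The miniature stand-in below illustrates that pattern only; the class, its constructor, and the dict-based config are assumptions for illustration, not the d9d implementation:

```python
import torch
from torch import nn

# Stand-in dispatch table keyed by the discriminator tag, mirroring the
# documented config names.
BUILDERS = {
    "adam": torch.optim.Adam,
    "adamw": torch.optim.AdamW,
    "sgd": torch.optim.SGD,
}


class MiniAutoProvider:
    """Toy analogue of AutoOptimizerProvider: config in, optimizer out."""

    def __init__(self, config: dict) -> None:
        self.config = config

    def __call__(self, model: nn.Module) -> torch.optim.Optimizer:
        cfg = dict(self.config)
        builder = BUILDERS[cfg.pop("name")]  # dispatch on the discriminator
        return builder(model.parameters(), **cfg)


provider = MiniAutoProvider({"name": "sgd", "lr": 0.1, "momentum": 0.9})
opt = provider(nn.Linear(3, 3))
```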
BaseAutoOptimizerConfig
Abstract base class for optimizer configurations.
SGDOptimizerConfig
Bases: BaseAutoOptimizerConfig
Configuration for the PyTorch SGD optimizer.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `Literal['sgd']` | Discriminator tag. |
| `lr` | `float` | The learning rate. |
| `momentum` | `float` | Momentum factor. |
| `dampening` | `float` | Dampening for momentum. |
| `weight_decay` | `float` | Weight decay (L2 penalty). |
| `nesterov` | `bool` | Enables Nesterov momentum. |
| `maximize` | `bool` | Whether to maximize the params based on the objective. |
build(params)
Builds fused SGD with the configured parameters.
StochasticAdamWOptimizerConfig
Bases: BaseAutoOptimizerConfig
Configuration for the Stochastic AdamW optimizer.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `Literal['stochastic_adamw']` | Discriminator tag. |
| `lr` | `float` | The learning rate. |
| `betas` | `tuple[float, float]` | Coefficients used for computing running averages of the gradient and its square. |
| `eps` | `float` | Term added to the denominator to improve numerical stability. |
| `weight_decay` | `float` | Weight decay coefficient. |
| `state_dtype` | `str` | Data type to use for the optimizer states. |
build(params)
Builds StochasticAdamW with the configured parameters.
Interface
If you need a custom optimizer, implement the OptimizerProvider protocol.
d9d.loop.control.optimizer_provider
InitializeOptimizerStageContext
dataclass
Context data required to initialize an optimizer.
Attributes:

| Name | Type | Description |
|---|---|---|
| `dist_context` | `DistributedContext` | The distributed context. |
| `model` | `Module` | The model instance for which parameters will be optimized. |
OptimizerProvider
Bases: Protocol
Protocol for defining how optimizers are created for model pipeline stages.
__call__(context)
abstractmethod
Initializes the optimizer for a specific training stage.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `InitializeOptimizerStageContext` | Context for this operation. | *required* |

Returns:

| Type | Description |
|---|---|
| `Optimizer` | The instantiated PyTorch optimizer. |
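A minimal custom provider might look like the sketch below. It assumes only the documented shape of the protocol (a callable taking a context with a `model` field and returning a torch Optimizer); the `FakeContext` dataclass and `TwoGroupAdamProvider` name are stand-ins invented for this example so it runs without d9d, with `dist_context` omitted for brevity:

```python
from dataclasses import dataclass

import torch
from torch import nn


@dataclass
class FakeContext:
    """Stand-in for InitializeOptimizerStageContext (dist_context omitted)."""
    model: nn.Module


class TwoGroupAdamProvider:
    """Custom provider: applies weight decay everywhere except biases."""

    def __init__(self, lr: float = 1e-3, weight_decay: float = 0.01) -> None:
        self.lr = lr
        self.weight_decay = weight_decay

    def __call__(self, context) -> torch.optim.Optimizer:
        decay, no_decay = [], []
        for name, param in context.model.named_parameters():
            (no_decay if name.endswith("bias") else decay).append(param)
        return torch.optim.AdamW(
            [
                {"params": decay, "weight_decay": self.weight_decay},
                {"params": no_decay, "weight_decay": 0.0},
            ],
            lr=self.lr,
        )


provider = TwoGroupAdamProvider(lr=3e-4)
optimizer = provider(FakeContext(model=nn.Linear(8, 4)))
```

Because OptimizerProvider is a Protocol, no explicit inheritance is needed; any object with a matching `__call__` satisfies it.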