Experiment Tracker Integration
Internal API Warning
If you use the standard d9d training infrastructure, you do not need to call these functions manually. The framework handles tracking automatically based on configuration. This package is primarily intended for users extending d9d.
About
The d9d.tracker package provides a unified, configuration-driven interface for logging metrics, hyperparameters, and distributions during training.
It abstracts the specific backend (such as Aim or simple console logging) behind a common API. Coupled with a Pydantic configuration system, this lets users switch logging backends via configuration files without changing a single line of training loop code.
Crucially, the tracker is State Aware. It implements the PyTorch Stateful protocol, ensuring that if a training job is interrupted and resumed, the tracker automatically re-attaches to the existing experiment run rather than creating a fragmented new one.
Architecture
Separation of Concerns
The module splits tracking logic into two distinct phases:
- The Tracker (Factory/Manager): Represented by BaseTracker. This object persists throughout the lifecycle of the application. It holds configuration (where to save logs) and state (the ID of the current run). It is responsible for creating "Runs".
- The Run (Session): Represented by BaseTrackerRun. This is a context-managed object active only during the actual training loop. It handles the set_step, scalar, and bins operations.
There is also a factory function, tracker_from_config, that creates a BaseTracker object from a Pydantic configuration.
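The split between a long-lived tracker and a short-lived run can be sketched as follows. This is a minimal illustration using stand-in classes (SketchTracker and SketchRun are hypothetical names, not part of d9d), and open() takes a plain name here, whereas the documented method takes a RunConfig.

```python
from contextlib import contextmanager
from typing import Iterator


class SketchRun:
    """Stand-in for a BaseTrackerRun: only meant to be used inside the `with` block."""

    def __init__(self, log: list[str]) -> None:
        self._log = log
        self._step = 0

    def set_step(self, step: int) -> None:
        self._step = step

    def scalar(self, name: str, value: float) -> None:
        self._log.append(f"step={self._step} {name}={value}")


class SketchTracker:
    """Stand-in for a BaseTracker: long-lived, creates runs on demand."""

    def __init__(self) -> None:
        self.log: list[str] = []

    @contextmanager
    def open(self, run_name: str) -> Iterator[SketchRun]:
        self.log.append(f"open {run_name}")
        try:
            yield SketchRun(self.log)
        finally:
            # The run is closed even if the training loop raises.
            self.log.append(f"close {run_name}")


tracker = SketchTracker()          # lives for the whole application
with tracker.open("demo") as run:  # lives only for the training loop
    for step in range(2):
        run.set_step(step)
        run.scalar("loss", 1.0 / (step + 1))
```

Keeping the run behind a context manager means backend resources are released in one place, regardless of how the loop exits.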
Adding a New Tracker
To support a new logging backend (e.g., Weights & Biases, MLflow), you need to implement three components and register them in the factory.
The Configuration
Create a Pydantic model for your tracker's settings. It must contain a provider field typed as a Literal, which acts as the discriminator for polymorphic deserialization.
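A minimal sketch of such a configuration, assuming Pydantic v2. The class names, the "my_tracker" and "console" provider values, and the project and log_dir fields are hypothetical and chosen only for illustration; the union below is a local stand-in, not the real AnyTrackerConfig.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class ConsoleTrackerConfig(BaseModel):
    # Hypothetical second backend, included only so the union has two members.
    provider: Literal["console"] = "console"


class MyTrackerConfig(BaseModel):
    # The literal value identifies this backend in configuration files.
    provider: Literal["my_tracker"] = "my_tracker"
    # Hypothetical backend-specific settings.
    project: str
    log_dir: str = "./runs"


# A discriminated union: Pydantic inspects `provider` to pick the model.
SketchTrackerConfig = Annotated[
    Union[ConsoleTrackerConfig, MyTrackerConfig],
    Field(discriminator="provider"),
]

adapter = TypeAdapter(SketchTrackerConfig)
parsed = adapter.validate_python({"provider": "my_tracker", "project": "demo"})
```

Because the discriminator is a plain string in the config file, switching backends amounts to changing the provider value and its accompanying settings.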
The Run Handler
Implement BaseTrackerRun. This class maps d9d calls (scalar, bins) to the specific calls of your backend SDK.
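The sketch below shows one way a run handler might forward calls to a backend. It does not import d9d: RunInterface is a local stand-in mirroring the documented method names, FakeBackend is a hypothetical SDK client, and bins accepts a plain sequence instead of a Tensor. That set_context replaces the global context and that per-event keys override global ones are assumptions, not documented behavior.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional, Sequence

Context = Optional[dict[str, str]]


class RunInterface(ABC):
    """Local stand-in that mirrors the documented BaseTrackerRun methods."""

    @abstractmethod
    def set_step(self, step: int) -> None: ...

    @abstractmethod
    def set_context(self, context: dict[str, str]) -> None: ...

    @abstractmethod
    def scalar(self, name: str, value: float, context: Context = None) -> None: ...

    @abstractmethod
    def bins(self, name: str, values: Sequence[float], context: Context = None) -> None: ...


class FakeBackend:
    """Hypothetical SDK client that records calls instead of sending them."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    def track(self, **record: Any) -> None:
        self.records.append(record)


class MyTrackerRun(RunInterface):
    def __init__(self, backend: FakeBackend) -> None:
        self._backend = backend
        self._step = 0
        self._context: dict[str, str] = {}

    def set_step(self, step: int) -> None:
        self._step = step

    def set_context(self, context: dict[str, str]) -> None:
        # Assumption: a new global context replaces the previous one.
        self._context = dict(context)

    def _merged(self, context: Context) -> dict[str, str]:
        # Assumption: per-event keys override global keys on conflict.
        return {**self._context, **(context or {})}

    def scalar(self, name: str, value: float, context: Context = None) -> None:
        self._backend.track(kind="scalar", name=name, value=float(value),
                            step=self._step, context=self._merged(context))

    def bins(self, name: str, values: Sequence[float], context: Context = None) -> None:
        # The documented signature takes a Tensor; a plain sequence is used here.
        self._backend.track(kind="bins", name=name, values=[float(v) for v in values],
                            step=self._step, context=self._merged(context))


backend = FakeBackend()
run = MyTrackerRun(backend)
run.set_context({"subset": "train"})
run.set_step(3)
run.scalar("loss", 0.25, context={"rank": "0"})
run.bins("grad_norm", [0.1, 0.2, 0.4])
```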
The Tracker Factory
Implement BaseTracker. This handles initialization and state persistence (resuming).
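State persistence can be sketched with the two methods of the PyTorch Stateful protocol, state_dict and load_state_dict. The class below is a stand-in that imports neither d9d nor torch; the run_id key, the dict-based config, and yielding the run ID instead of a BaseTrackerRun are simplifications made for illustration.

```python
import uuid
from contextlib import contextmanager
from typing import Any, Iterator, Optional


class MyTracker:
    """Stand-in for a BaseTracker subclass."""

    def __init__(self, log_dir: str) -> None:
        self._log_dir = log_dir
        # Unknown until a run is opened or a checkpoint is restored.
        self._run_id: Optional[str] = None

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "MyTracker":
        # The documented method receives a config object; a dict is used here.
        return cls(log_dir=config["log_dir"])

    @contextmanager
    def open(self, run_name: str) -> Iterator[str]:
        # Reuse a restored run ID, otherwise start a fresh run.
        if self._run_id is None:
            self._run_id = uuid.uuid4().hex
        # A real tracker would yield a BaseTrackerRun bound to this ID.
        yield self._run_id

    def state_dict(self) -> dict[str, Any]:
        return {"run_id": self._run_id}

    def load_state_dict(self, state_dict: dict[str, Any]) -> None:
        self._run_id = state_dict["run_id"]


first = MyTracker.from_config({"log_dir": "./runs"})
with first.open("demo") as first_id:
    checkpoint = first.state_dict()  # would be saved with the training checkpoint

# After a restart: build a new tracker, restore its state, then reopen.
resumed = MyTracker.from_config({"log_dir": "./runs"})
resumed.load_state_dict(checkpoint)
with resumed.open("demo") as resumed_id:
    pass
```

Restoring the state before calling open is what lets the resumed job attach to the existing run instead of starting a new one.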
Registration
To make tracker_from_config recognize your new tracker, you must modify d9d/tracker/factory.py.
Add your config to the AnyTrackerConfig type alias.
Register the mapping in _MAP. Wrapping imports in try/except is recommended if the SDK is an optional dependency.
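The two registration steps might look roughly like the sketch below. The actual contents of d9d/tracker/factory.py are not shown in this document, so the shape of the mapping (config class to tracker class), the stand-in classes, and the hypothetical_tracker_sdk module name are all assumptions.

```python
from typing import Union


class NullTrackerConfig:
    provider = "null"


class NullTracker:
    @classmethod
    def from_config(cls, config: NullTrackerConfig) -> "NullTracker":
        return cls()


class MyTrackerConfig:
    provider = "my_tracker"


# Step 1: extend the union of accepted config types.
AnyTrackerConfigSketch = Union[NullTrackerConfig, MyTrackerConfig]

# Step 2: map each config class to its tracker class.
_MAP: dict[type, type] = {NullTrackerConfig: NullTracker}

try:
    # Hypothetical optional SDK; the import fails unless it is installed.
    import hypothetical_tracker_sdk  # noqa: F401

    class MyTracker:
        @classmethod
        def from_config(cls, config: MyTrackerConfig) -> "MyTracker":
            return cls()

    _MAP[MyTrackerConfig] = MyTracker
except ImportError:
    pass  # the backend simply stays unregistered


def tracker_from_config_sketch(config: AnyTrackerConfigSketch):
    tracker_cls = _MAP.get(type(config))
    if tracker_cls is None:
        raise ImportError(
            f"Dependencies for provider '{config.provider}' are not installed."
        )
    return tracker_cls.from_config(config)
```

With this layout, a missing SDK only surfaces as an error when a configuration actually requests that provider.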
d9d.tracker
Package providing a unified interface for experiment tracking and logging.
BaseTracker
Bases: ABC, Stateful, Generic[TConfig]
Abstract base class for a tracker backend factory.
This class manages the lifecycle of runs and integration with the distributed checkpointing system to ensure experiment continuity (e.g., resuming the same run hash after a restart).
from_config(config)
abstractmethod
classmethod
Factory method to create a tracker instance from a configuration object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | TConfig | The backend-specific configuration object. | required |
Returns:
| Type | Description |
|---|---|
| Self | An initialized instance of the tracker. |
open(properties)
abstractmethod
Context manager that initiates and manages an experiment run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| properties | RunConfig | Configuration metadata for the run. | required |
Yields:
| Type | Description |
|---|---|
| BaseTrackerRun | An active BaseTrackerRun instance for logging metrics. |
BaseTrackerRun
Bases: ABC
Abstract base class representing an active tracking session (run).
This object is responsible for the actual logging of metrics and parameters during a training or inference run.
bins(name, values, context=None)
abstractmethod
Logs a distribution/histogram of values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the metric. | required |
| values | Tensor | A tensor containing the population of values to bin. | required |
| context | dict[str, str] \| None | Optional ephemeral context specific to this metric event. Merged with global context if present. | None |
scalar(name, value, context=None)
abstractmethod
set_context(context)
abstractmethod
set_step(step)
abstractmethod
Updates the global step counter for subsequent logs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| step | int | The current step index (e.g., iteration number). | required |
RunConfig
Bases: BaseModel
Configuration for initializing a specific logged run.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The display name of the experiment. |
| description | str \| None | An optional description of the experiment. |
| hparams | dict[str, Any] | A dictionary of hyperparameters to log at the start of the run. |
tracker_from_config(config)
Instantiates a specific tracker implementation based on the configuration.
Based on the 'provider' field in the config, this function selects the appropriate backend (e.g., Aim, Null). It handles checking for missing dependencies for optional backends.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | AnyTrackerConfig | A specific tracker configuration object. | required |
Returns:
| Type | Description |
|---|---|
| BaseTracker | An initialized BaseTracker instance. |
Raises:
| Type | Description |
|---|---|
| ImportError | If the dependencies for the requested provider are not installed. |