About
Warning:
If you are utilizing the standard d9d training infrastructure, you do not need to call these functions manually. The framework automatically handles tracking based on configuration. This package is primarily intended for users extending d9d.
The d9d.tracker package provides a unified, configuration-driven interface for logging metrics, hyperparameters, and distributions during training.
It abstracts the specific backend (such as Aim or simple console logging) behind a common API. Coupled with a Pydantic configuration system, this lets users switch logging backends via configuration files without changing a single line of training loop code.
Crucially, the tracker is State Aware. It implements the PyTorch Stateful protocol, ensuring that if a training job is interrupted and resumed, the tracker automatically re-attaches to the existing experiment run rather than creating a fragmented new one.
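To make the resume behaviour concrete, here is a minimal, self-contained sketch. `ToyTracker` and its `open_run` method are hypothetical stand-ins, not d9d classes. The only thing they share with a real tracker is the `state_dict` / `load_state_dict` pair that the Stateful protocol expects.

```python
import uuid


class ToyTracker:
    """Hypothetical stand-in for a tracker; not part of d9d."""

    def __init__(self) -> None:
        self.run_id = None  # the only state worth checkpointing

    def open_run(self) -> str:
        # Reuse the stored run ID if there is one, otherwise start a new run.
        if self.run_id is None:
            self.run_id = uuid.uuid4().hex
        return self.run_id

    def state_dict(self) -> dict:
        return {"run_id": self.run_id}

    def load_state_dict(self, state_dict: dict) -> None:
        self.run_id = state_dict.get("run_id")


# First job: starts a run, then its state is written to a checkpoint.
first = ToyTracker()
original_run = first.open_run()
checkpoint = first.state_dict()

# Restarted job: a fresh tracker restores the state and re-attaches.
resumed = ToyTracker()
resumed.load_state_dict(checkpoint)
resumed_run = resumed.open_run()
```

In this sketch the restarted job ends up with the same run ID as the first one, which is the property the real tracker relies on to avoid fragmented experiments.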
Architecture
Separation of Concerns
The module splits tracking logic into two distinct phases:
- The Tracker (Factory/Manager): Represented by BaseTracker. This object persists throughout the lifecycle of the application. It holds configuration (where to save logs) and state (the ID of the current run). It is responsible for creating "Runs".
- The Run (Session): Represented by BaseTrackerRun. This is a context-managed object active only during the actual training loop. It handles the set_step, scalar, and bins operations.
There is also a factory function, tracker_from_config, that creates a BaseTracker object from a Pydantic configuration.
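The split between a long-lived tracker and a short-lived run can be sketched with stand-in classes. `InMemoryTracker` and `InMemoryRun` below are hypothetical and exist only for illustration. They mirror the `open`, `set_step`, and `scalar` names from this page but store metrics in a list instead of talking to a backend, and `open` takes a plain name here rather than a `RunConfig`.

```python
from contextlib import contextmanager


class InMemoryRun:
    """Hypothetical run: active only inside the `with` block."""

    def __init__(self, sink: list) -> None:
        self._sink = sink
        self._step = 0

    def set_step(self, step: int) -> None:
        self._step = step

    def scalar(self, name: str, value: float) -> None:
        # Every record is stamped with the most recently set step.
        self._sink.append((self._step, name, value))


class InMemoryTracker:
    """Hypothetical tracker: outlives the run and owns the storage."""

    def __init__(self) -> None:
        self.records: list = []
        self.active = False

    @contextmanager
    def open(self, name: str):
        self.active = True
        try:
            yield InMemoryRun(self.records)
        finally:
            self.active = False  # the run ends, the tracker stays usable


tracker = InMemoryTracker()
with tracker.open("demo") as run:
    for step, loss in enumerate([0.9, 0.5, 0.3]):
        run.set_step(step)
        run.scalar("loss", loss)
```

After the `with` block exits, the run object is finished, while the tracker still holds everything that was logged.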
Adding a New Tracker
To support a new logging backend (e.g., Weights & Biases, MLflow), you need to implement three components and register them in the factory.
The Configuration
Create a Pydantic model for your tracker's settings. Functionally, it must contain a provider literal field which acts as the discriminator for the polymorphic deserialization.
from typing import Literal

from pydantic import BaseModel


class WandbConfig(BaseModel):
    provider: Literal['wandb'] = 'wandb'
    project: str
    entity: str | None = None
The Run Handler
Implement BaseTrackerRun. This class maps d9d calls (scalar, bins) to the specific calls of your backend SDK.
from d9d.tracker import BaseTrackerRun


class WandbRun(BaseTrackerRun):
    def __init__(self, run_obj):
        self._run = run_obj
        self._step = 0

    def set_step(self, step: int):
        self._step = step

    # ... implement scalar(), bins(), etc. to call self._run.log()
The Tracker Factory
Implement BaseTracker. This handles initialization and state persistence (resuming).
from contextlib import contextmanager
import wandb
from d9d.tracker import BaseTracker, RunConfig

class WandbTracker(BaseTracker[WandbConfig]):
    def __init__(self, config: WandbConfig):
        self.config = config
        self.run_id = None  # State to persist

    @classmethod
    def from_config(cls, config: WandbConfig):
        return cls(config)

    def state_dict(self):
        # This is saved to the checkpoint
        return {"run_id": self.run_id}

    def load_state_dict(self, state_dict):
        # This is restored from the checkpoint
        self.run_id = state_dict.get("run_id")

    @contextmanager
    def open(self, properties: RunConfig):
        # Re-attach to the saved run if one exists, otherwise start a new one
        run = wandb.init(
            project=self.config.project, entity=self.config.entity,
            name=properties.name, config=properties.hparams,
            id=self.run_id, resume="allow",
        )
        self.run_id = run.id
        try:
            yield WandbRun(run)
        finally:
            run.finish()  # cleanup
Registration
To make tracker_from_config recognize your new tracker, you must modify d9d/tracker/factory.py.
Add your config to the AnyTrackerConfig type alias:
AnyTrackerConfig = Annotated[
    AimConfig | NullTrackerConfig | WandbConfig,  # <--- Add here
    Field(discriminator='provider')
]
Register the mapping in _MAP (wrapping imports in try/except is recommended if the SDK is an optional dependency):
try:
    from .provider.wandb.tracker import WandbTracker
    _MAP[WandbConfig] = WandbTracker
except ImportError as e:
    _MAP[WandbConfig] = _TrackerImportFailed('wandb', e)
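One way to read the try/except pattern: the registry always gets an entry, but for a missing SDK that entry is a placeholder that raises only when someone actually requests that backend. The sketch below shows the idea with stand-in names. `DemoConfig`, `ImportFailedPlaceholder`, `REGISTRY`, and `build_tracker` are hypothetical, and the real `_TrackerImportFailed` and `_MAP` may be implemented differently.

```python
class DemoConfig:
    """Hypothetical config type used as the registry key."""


class ImportFailedPlaceholder:
    """Hypothetical stand-in: defers the ImportError until the backend is used."""

    def __init__(self, provider: str, error: ImportError) -> None:
        self._provider = provider
        self._error = error

    def from_config(self, config):
        raise ImportError(
            f"Tracker backend '{self._provider}' is not installed"
        ) from self._error


REGISTRY: dict = {}

try:
    import sdk_that_is_not_installed_d9d_demo  # hypothetical optional dependency
    REGISTRY[DemoConfig] = object  # would be the real tracker class
except ImportError as e:
    REGISTRY[DemoConfig] = ImportFailedPlaceholder("demo", e)


def build_tracker(config):
    # Look up by config type; the placeholder raises here, not at import time.
    return REGISTRY[type(config)].from_config(config)


try:
    build_tracker(DemoConfig())
    outcome = "created"
except ImportError as err:
    outcome = str(err)
```

The benefit of this design is that importing the factory module never fails because of an optional dependency. Users see an error only if their configuration selects a backend whose SDK is missing.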
d9d.tracker
Package providing a unified interface for experiment tracking and logging.
BaseTracker
Bases: ABC, Stateful, Generic[TConfig]
Abstract base class for a tracker backend factory.
This class manages the lifecycle of runs and integration with the distributed checkpointing system to ensure experiment continuity (e.g., resuming the same run hash after a restart).
Source code in d9d/tracker/base.py
from_config(config)
abstractmethod
classmethod
Factory method to create a tracker instance from a configuration object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | TConfig | The backend-specific configuration object. | required |
Returns:
| Type | Description |
|---|---|
| Self | An initialized instance of the tracker. |
Source code in d9d/tracker/base.py
open(properties)
abstractmethod
Context manager that initiates and manages an experiment run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| properties | RunConfig | Configuration metadata for the run. | required |
Yields:
| Type | Description |
|---|---|
| BaseTrackerRun | An active BaseTrackerRun instance for logging metrics. |
Source code in d9d/tracker/base.py
BaseTrackerRun
Bases: ABC
Abstract base class representing an active tracking session (run).
This object is responsible for the actual logging of metrics and parameters during a training or inference run.
Source code in d9d/tracker/base.py
bins(name, values, context=None)
abstractmethod
Logs a distribution/histogram of values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the metric. | required |
| values | Tensor | A tensor containing the population of values to bin. | required |
| context | dict[str, str] \| None | Optional ephemeral context specific to this metric event. Merged with global context if present. | None |
Source code in d9d/tracker/base.py
scalar(name, value, context=None)
abstractmethod
Logs a scalar value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the metric. | required |
| value | float | The scalar value to log. | required |
| context | dict[str, str] \| None | Optional ephemeral context specific to this metric event. Merged with global context if present. | None |
Source code in d9d/tracker/base.py
set_context(context)
abstractmethod
Sets a persistent context dictionary for subsequent logs.
These context values (tags) will be attached to every metric logged until changed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | dict[str, str] | A dictionary of tag names and values. | required |
Source code in d9d/tracker/base.py
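The interaction between the persistent context from set_context and the per-call context argument of scalar and bins can be sketched as a dictionary merge. `ListRun` is a hypothetical in-memory handler, and letting the per-call value win on a key collision is an assumption of this sketch. The reference above only states that the two are merged.

```python
class ListRun:
    """Hypothetical in-memory run handler; not a d9d class."""

    def __init__(self) -> None:
        self.events = []
        self._context = {}

    def set_context(self, context: dict) -> None:
        # Persistent tags: attached to every later metric until replaced.
        self._context = dict(context)

    def scalar(self, name: str, value: float, context: dict = None) -> None:
        # Assumption: per-call context overrides the persistent one on conflicts.
        merged = {**self._context, **(context or {})}
        self.events.append({"name": name, "value": value, "context": merged})


run = ListRun()
run.set_context({"subset": "train"})
run.scalar("loss", 0.42)
run.scalar("loss", 0.57, context={"subset": "val", "epoch": "1"})
run.scalar("lr", 0.001)
```

In this sketch the per-call context affects only the event it is passed to, so the third call falls back to the persistent `{"subset": "train"}` tags.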
set_step(step)
abstractmethod
Updates the global step counter for subsequent logs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| step | int | The current step index (e.g., iteration number). | required |
Source code in d9d/tracker/base.py
RunConfig
Bases: BaseModel
Configuration for initializing a specific logged run.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The display name of the experiment. |
| description | str \| None | An optional description of the experiment. |
| hparams | dict[str, Any] | A dictionary of hyperparameters to log at the start of the run. |
Source code in d9d/tracker/base.py
tracker_from_config(config)
Instantiates a specific tracker implementation based on the configuration.
Based on the 'provider' field in the config, this function selects the appropriate backend (e.g., Aim, Null). It also checks for missing dependencies of optional backends.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | AnyTrackerConfig | A specific tracker configuration object. | required |
Returns:
| Type | Description |
|---|---|
| BaseTracker | An initialized BaseTracker instance. |
Raises:
| Type | Description |
|---|---|
| ImportError | If the dependencies for the requested provider are not installed. |
Source code in d9d/tracker/factory.py