About

Warning:

If you use the standard d9d training infrastructure, you do not need to call these functions manually; the framework handles tracking automatically based on configuration. This package is primarily intended for users extending d9d.

The d9d.tracker package provides a unified, configuration-driven interface for logging metrics, hyperparameters, and distributions during training.

It abstracts the specific backend (such as Aim or simple console logging) behind a common API. Combined with a Pydantic configuration system, this lets users switch logging backends through configuration files without changing a single line of training loop code.

Crucially, the tracker is State Aware. It implements the PyTorch Stateful protocol, ensuring that if a training job is interrupted and resumed, the tracker automatically re-attaches to the existing experiment run rather than creating a fragmented new one.
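The resume behavior can be illustrated with a minimal standard-library sketch. ToyTracker and open_run are hypothetical stand-ins, not d9d classes or methods; only the state_dict / load_state_dict pair mirrors the Stateful protocol, and the checkpointing system is reduced to passing a plain dict between two instances.

```python
import uuid


class ToyTracker:
    """Hypothetical stand-in for a tracker; not part of d9d."""

    def __init__(self):
        self.run_id = None  # unknown until the first run is opened

    def open_run(self) -> str:
        # Reuse a restored ID if there is one, otherwise start a new run.
        if self.run_id is None:
            self.run_id = uuid.uuid4().hex
        return self.run_id

    def state_dict(self) -> dict:
        # Saved alongside the training checkpoint.
        return {"run_id": self.run_id}

    def load_state_dict(self, state_dict: dict) -> None:
        # Restored before the run is re-opened.
        self.run_id = state_dict.get("run_id")


# First launch: a fresh run is created and its ID is checkpointed.
first = ToyTracker()
original_id = first.open_run()
checkpoint = first.state_dict()

# Restart: a new process restores the state and re-attaches to the same run.
resumed = ToyTracker()
resumed.load_state_dict(checkpoint)
resumed_id = resumed.open_run()
```

Without the load_state_dict call, the second instance would generate a new ID, which corresponds to the fragmented run the tracker is designed to avoid.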

Architecture

Separation of Concerns

The module splits tracking logic between two objects with distinct lifetimes:

  1. The Tracker (Factory/Manager): Represented by BaseTracker. This object persists throughout the lifecycle of the application. It holds configuration (where to save logs) and state (the ID of the current run). It is responsible for creating "Runs".
  2. The Run (Session): Represented by BaseTrackerRun. This is a context-managed object active only during the actual training loop. It handles the set_step, set_context, scalar, and bins operations.

There is also a factory function, tracker_from_config, that creates a BaseTracker object from a Pydantic configuration.
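The split between the long-lived tracker and the short-lived run can be sketched as follows. ToyTracker and ToyRun are hypothetical stand-ins that only mirror the shape of BaseTracker.open and BaseTrackerRun; in particular, this open takes a plain name instead of a RunConfig so the sketch stays self-contained.

```python
from contextlib import contextmanager


class ToyRun:
    """Hypothetical session object; mirrors set_step() and scalar() only."""

    def __init__(self):
        self.step = 0
        self.records = []
        self.closed = False

    def set_step(self, step: int) -> None:
        self.step = step

    def scalar(self, name: str, value: float) -> None:
        self.records.append((self.step, name, value))


class ToyTracker:
    """Hypothetical long-lived factory; mirrors open() only."""

    def __init__(self):
        self.runs_opened = 0

    @contextmanager
    def open(self, name: str):
        self.runs_opened += 1
        run = ToyRun()
        try:
            yield run
        finally:
            run.closed = True  # a real backend would flush and close here


tracker = ToyTracker()  # lives for the whole application

with tracker.open("demo") as run:  # lives only for the training loop
    for step in range(3):
        run.set_step(step)
        run.scalar("loss", 1.0 / (step + 1))
```

Because the run is a context manager, cleanup happens even if the training loop raises, while the tracker object survives and can be checkpointed.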

Adding a New Tracker

To support a new logging backend (e.g., Weights & Biases, MLflow), you need to implement three components and register them in the factory.

The Configuration

Create a Pydantic model for your tracker's settings. It must contain a provider field typed as a Literal, which acts as the discriminator for polymorphic deserialization.

from typing import Literal
from pydantic import BaseModel

class WandbConfig(BaseModel):
    provider: Literal['wandb'] = 'wandb'
    project: str
    entity: str | None = None

The Run Handler

Implement BaseTrackerRun. This class maps d9d calls (scalar, bins) to the specific calls of your backend SDK.

from d9d.tracker import BaseTrackerRun

class WandbRun(BaseTrackerRun):
    def __init__(self, run_obj):
        self._run = run_obj
        self._step = 0

    def set_step(self, step: int):
        self._step = step

    # ... implement set_context(), scalar(), and bins() to call self._run.log()
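A run handler also has to combine the persistent context from set_context with the per-call context argument. The sketch that follows records events in memory instead of calling a backend SDK. RecordingRun is a hypothetical class, a list of floats stands in for the torch.Tensor that bins receives, and the rule that the per-call context wins on a key clash is an assumption, since the source only states that the two are merged.

```python
class RecordingRun:
    """Hypothetical in-memory run; mirrors the BaseTrackerRun method names."""

    def __init__(self):
        self._step = 0
        self._context: dict[str, str] = {}
        self.events: list[dict] = []

    def set_step(self, step: int) -> None:
        self._step = step

    def set_context(self, context: dict[str, str]) -> None:
        # Copy so later changes to the caller's dict do not leak in.
        self._context = dict(context)

    def _emit(self, kind: str, name: str, payload, context) -> None:
        # Assumption: on a key clash the per-call context wins.
        merged = {**self._context, **(context or {})}
        self.events.append(
            {"kind": kind, "name": name, "step": self._step,
             "payload": payload, "context": merged}
        )

    def scalar(self, name: str, value: float, context=None) -> None:
        self._emit("scalar", name, float(value), context)

    def bins(self, name: str, values, context=None) -> None:
        # The real method receives a torch.Tensor; a list of floats stands in here.
        self._emit("bins", name, [float(v) for v in values], context)


run = RecordingRun()
run.set_context({"split": "train"})
run.set_step(10)
run.scalar("loss", 0.5)
run.scalar("loss", 0.7, context={"split": "val"})
run.bins("grad_norm", [0.1, 0.2, 0.4], context={"layer": "0"})
```

A real handler would replace the body of _emit with the backend's logging call and keep the step and context bookkeeping unchanged.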

The Tracker Factory

Implement BaseTracker. This handles initialization and state persistence (resuming).

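from contextlib import contextmanager

import wandb
from d9d.tracker import BaseTracker, RunConfig

class WandbTracker(BaseTracker[WandbConfig]):
    def __init__(self, config: WandbConfig):
        self.config = config
        self.run_id = None  # State to persist

    @classmethod
    def from_config(cls, config: WandbConfig) -> "WandbTracker":
        # Required: BaseTracker declares from_config as abstract
        return cls(config)

    def state_dict(self):
        # This is saved to the checkpoint
        return {"run_id": self.run_id}

    def load_state_dict(self, state_dict):
        # This is restored from the checkpoint
        self.run_id = state_dict.get("run_id")

    @contextmanager
    def open(self, properties: RunConfig):
        # One possible mapping of RunConfig onto wandb.init arguments.
        # Passing the saved ID with resume="allow" re-attaches to an existing run.
        run = wandb.init(
            project=self.config.project,
            entity=self.config.entity,
            name=properties.name,
            notes=properties.description,
            config=properties.hparams,
            id=self.run_id,
            resume="allow",
        )
        self.run_id = run.id  # remember the ID for the next checkpoint
        try:
            yield WandbRun(run)
        finally:
            run.finish()  # cleanup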

Registration

To make tracker_from_config recognize your new tracker, you must modify d9d/tracker/factory.py.

Add your config to the AnyTrackerConfig type alias:

AnyTrackerConfig = Annotated[
    AimConfig | NullTrackerConfig | WandbConfig, # <--- Add here
    Field(discriminator='provider')
]

Register the mapping in _MAP (wrapping imports in try/except is recommended if the SDK is an optional dependency):

try:
    from .provider.wandb.tracker import WandbTracker
    _MAP[WandbConfig] = WandbTracker
except ImportError as e:
    _MAP[WandbConfig] = _TrackerImportFailed('wandb', e)
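The optional-dependency pattern can be sketched with the standard library alone. Every name here (ImportFailed, REGISTRY, build, FancyConfig, NullConfig, NullTracker, and the module hypothetical_backend_sdk_not_installed) is hypothetical; the sketch only mirrors the idea of storing a placeholder in the map and raising later, when that provider is actually requested.

```python
from dataclasses import dataclass


@dataclass
class ImportFailed:
    """Placeholder stored in the registry when a backend SDK is missing."""
    dependency: str
    exception: ImportError


class NullConfig:
    provider = "null"


class FancyConfig:
    provider = "fancy"


class NullTracker:
    @classmethod
    def from_config(cls, config):
        return cls()


REGISTRY = {NullConfig: NullTracker}

try:
    import hypothetical_backend_sdk_not_installed  # noqa: F401
    REGISTRY[FancyConfig] = NullTracker  # a real tracker class would go here
except ImportError as e:
    # Importing this module still succeeds; the failure is only recorded.
    REGISTRY[FancyConfig] = ImportFailed("hypothetical_backend_sdk_not_installed", e)


def build(config):
    entry = REGISTRY[type(config)]
    if isinstance(entry, ImportFailed):
        raise ImportError(
            f"The tracker configuration {config.provider} could not be loaded - "
            f"ensure these dependencies are installed: {entry.dependency}"
        ) from entry.exception
    return entry.from_config(config)


null_tracker = build(NullConfig())
```

Deferring the error this way means users who never select the optional backend are not forced to install its SDK, while those who do select it get a message naming the missing dependency.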

d9d.tracker

Package providing a unified interface for experiment tracking and logging.

BaseTracker

Bases: ABC, Stateful, Generic[TConfig]

Abstract base class for a tracker backend factory.

This class manages the lifecycle of runs and integration with the distributed checkpointing system to ensure experiment continuity (e.g., resuming the same run hash after a restart).

Source code in d9d/tracker/base.py
class BaseTracker(abc.ABC, Stateful, Generic[TConfig]):
    """
    Abstract base class for a tracker backend factory.

    This class manages the lifecycle of runs and integration with the
    distributed checkpointing system to ensure experiment continuity
    (e.g., resuming the same run hash after a restart).
    """

    @contextmanager
    @abc.abstractmethod
    def open(self, properties: RunConfig) -> Generator[BaseTrackerRun, None, None]:
        """
        Context manager that initiates and manages an experiment run.

        Args:
            properties: Configuration metadata for the run.

        Yields:
            An active BaseTrackerRun instance for logging metrics.
        """

        ...

    @classmethod
    @abc.abstractmethod
    def from_config(cls, config: TConfig) -> Self:
        """
        Factory method to create a tracker instance from a configuration object.

        Args:
            config: The backend-specific configuration object.

        Returns:
            An initialized instance of the tracker.
        """

        ...

from_config(config) abstractmethod classmethod

Factory method to create a tracker instance from a configuration object.

Parameters:

Name    Type     Description                                 Default
config  TConfig  The backend-specific configuration object.  required

Returns:

Type  Description
Self  An initialized instance of the tracker.

Source code in d9d/tracker/base.py
@classmethod
@abc.abstractmethod
def from_config(cls, config: TConfig) -> Self:
    """
    Factory method to create a tracker instance from a configuration object.

    Args:
        config: The backend-specific configuration object.

    Returns:
        An initialized instance of the tracker.
    """

    ...

open(properties) abstractmethod

Context manager that initiates and manages an experiment run.

Parameters:

Name        Type       Description                          Default
properties  RunConfig  Configuration metadata for the run.  required

Yields:

Type            Description
BaseTrackerRun  An active BaseTrackerRun instance for logging metrics.

Source code in d9d/tracker/base.py
@contextmanager
@abc.abstractmethod
def open(self, properties: RunConfig) -> Generator[BaseTrackerRun, None, None]:
    """
    Context manager that initiates and manages an experiment run.

    Args:
        properties: Configuration metadata for the run.

    Yields:
        An active BaseTrackerRun instance for logging metrics.
    """

    ...

BaseTrackerRun

Bases: ABC

Abstract base class representing an active tracking session (run).

This object is responsible for the actual logging of metrics and parameters during a training or inference run.

Source code in d9d/tracker/base.py
class BaseTrackerRun(abc.ABC):
    """
    Abstract base class representing an active tracking session (run).

    This object is responsible for the actual logging of metrics, parameters,
    during train or inference run.
    """

    @abc.abstractmethod
    def set_step(self, step: int):
        """
        Updates the global step counter for subsequent logs.

        Args:
            step: The current step index (e.g., iteration number).
        """
        ...

    @abc.abstractmethod
    def set_context(self, context: dict[str, str]):
        """
        Sets a persistent context dictionary for subsequent logs.

        These context values (tags) will be attached to every metric logged
        until changed.

        Args:
            context: A dictionary of tag names and values.
        """
        ...

    @abc.abstractmethod
    def scalar(self, name: str, value: float, context: dict[str, str] | None = None):
        """
        Logs a scalar value.

        Args:
            name: The name of the metric.
            value: The scalar value to log.
            context: Optional ephemeral context specific to this metric event.
                Merged with global context if present.
        """
        ...

    @abc.abstractmethod
    def bins(self, name: str, values: torch.Tensor, context: dict[str, str] | None = None):
        """
        Logs a distribution/histogram of values.

        Args:
            name: The name of the metric.
            values: A tensor containing the population of values to bin.
            context: Optional ephemeral context specific to this metric event.
                Merged with global context if present.
        """
        ...

bins(name, values, context=None) abstractmethod

Logs a distribution/histogram of values.

Parameters:

Name     Type                   Description                                                                                       Default
name     str                    The name of the metric.                                                                           required
values   Tensor                 A tensor containing the population of values to bin.                                              required
context  dict[str, str] | None  Optional ephemeral context specific to this metric event. Merged with global context if present.  None

Source code in d9d/tracker/base.py
@abc.abstractmethod
def bins(self, name: str, values: torch.Tensor, context: dict[str, str] | None = None):
    """
    Logs a distribution/histogram of values.

    Args:
        name: The name of the metric.
        values: A tensor containing the population of values to bin.
        context: Optional ephemeral context specific to this metric event.
            Merged with global context if present.
    """
    ...

scalar(name, value, context=None) abstractmethod

Logs a scalar value.

Parameters:

Name     Type                   Description                                                                                       Default
name     str                    The name of the metric.                                                                           required
value    float                  The scalar value to log.                                                                          required
context  dict[str, str] | None  Optional ephemeral context specific to this metric event. Merged with global context if present.  None

Source code in d9d/tracker/base.py
@abc.abstractmethod
def scalar(self, name: str, value: float, context: dict[str, str] | None = None):
    """
    Logs a scalar value.

    Args:
        name: The name of the metric.
        value: The scalar value to log.
        context: Optional ephemeral context specific to this metric event.
            Merged with global context if present.
    """
    ...

set_context(context) abstractmethod

Sets a persistent context dictionary for subsequent logs.

These context values (tags) will be attached to every metric logged until changed.

Parameters:

Name     Type            Description                            Default
context  dict[str, str]  A dictionary of tag names and values.  required

Source code in d9d/tracker/base.py
@abc.abstractmethod
def set_context(self, context: dict[str, str]):
    """
    Sets a persistent context dictionary for subsequent logs.

    These context values (tags) will be attached to every metric logged
    until changed.

    Args:
        context: A dictionary of tag names and values.
    """
    ...

set_step(step) abstractmethod

Updates the global step counter for subsequent logs.

Parameters:

Name  Type  Description                                       Default
step  int   The current step index (e.g., iteration number).  required

Source code in d9d/tracker/base.py
@abc.abstractmethod
def set_step(self, step: int):
    """
    Updates the global step counter for subsequent logs.

    Args:
        step: The current step index (e.g., iteration number).
    """
    ...

RunConfig

Bases: BaseModel

Configuration for initializing a specific logged run.

Attributes:

Name         Type            Description
name         str             The display name of the experiment.
description  str | None      An optional description of the experiment.
hparams      dict[str, Any]  A dictionary of hyperparameters to log at the start of the run.

Source code in d9d/tracker/base.py
class RunConfig(BaseModel):
    """
    Configuration for initializing a specific logged run.

    Attributes:
        name: The display name of the experiment.
        description: An optional description of the experiment.
        hparams: A dictionary of hyperparameters to log at the start of the run.
    """

    name: str
    description: str | None
    hparams: dict[str, Any] = Field(default_factory=dict)

tracker_from_config(config)

Instantiates a specific tracker implementation based on the configuration.

Based on the 'provider' field in the config, this function selects the appropriate backend (e.g., Aim, Null). It handles checking for missing dependencies for optional backends.

Parameters:

Name    Type              Description                               Default
config  AnyTrackerConfig  A specific tracker configuration object.  required

Returns:

Type         Description
BaseTracker  An initialized BaseTracker instance.

Raises:

Type         Description
ImportError  If the dependencies for the requested provider are not installed.

Source code in d9d/tracker/factory.py
def tracker_from_config(config: AnyTrackerConfig) -> BaseTracker:
    """
    Instantiates a specific tracker implementation based on the configuration.

    Based on the 'provider' field in the config, this function selects the
    appropriate backend (e.g., Aim, Null). It handles checking for missing
    dependencies for optional backends.

    Args:
        config: A specific tracker configuration object.

    Returns:
        An initialized BaseTracker instance.

    Raises:
        ImportError: If the dependencies for the requested provider are not installed.
    """

    tracker_type = _MAP[type(config)]

    if isinstance(tracker_type, _TrackerImportFailed):
        raise ImportError(
            f"The tracker configuration {config.provider} could not be loaded - "
            f"ensure these dependencies are installed: {tracker_type.dependency}"
        ) from tracker_type.exception

    return tracker_type.from_config(config)