Table of Contents
🌐 Distributed Core
The foundational primitives managing the cluster.
- Distributed Context: The Source of Truth for topology. Understanding `DeviceMesh` domains (dense, expert, batch).
- Distributed Operations: Utilities for gathering variable-length tensors and objects.
- PyTree Sharding: Utilities for splitting complex nested structures across ranks.
- Typing Extensions: Python type annotations for common objects and structures.
🚀 Execution Engine
How to configure and run jobs.
- Training Loop: The lifecycle of the `Trainer`, dependency injection, and execution flow.
- Inference Loop: The lifecycle of distributed `Inference` and forward-only execution.
- Configuration: Pydantic schemas for configuring jobs, batching, and logging.
- Interfaces (Providers & Tasks): How to inject your custom Model, Dataset, and Step logic (Train & Infer).
💾 Data & State
Managing data loading and model checkpoints.
- Model State Mapper: The graph-based transformation engine for checkpoints (transform architectures on-the-fly).
- Model State I/O: Streaming reader/writers for checkpoints.
- Datasets: Distributed-aware dataset wrappers and smart bucketing.
🧠 Modeling & Architecture
Building blocks for modern LLMs.
- Model Catalogue: Models available directly in d9d.
- Model Design: Principles for creating compatible models.
- Modules: Building blocks for implementing compatible models.
⚡ Parallelism
Strategies for distributing computations.
- Horizontal Parallelism: Data Parallelism, Fully-Sharded Data Parallelism, Expert Parallelism, Tensor Parallelism.
- Pipeline Parallelism: Vertical scaling, schedules (1F1B, ZeroBubble), and cross-stage communication.
🔧 Fine-Tuning (PEFT)
Parameter-Efficient Fine-Tuning framework.
- Overview: Injection lifecycle and state mapping.
- Methods: LoRA, Full Tune, and Method Stacking.
📈 Optimization & Metrics
- Metrics: Distributed-aware statistic accumulation.
- Metric Catalogue: Ready-to-use metric implementations.
- Custom Metrics: Implementing custom metrics.
- Experiment Tracking: Integration with logging backends (WandB, Aim).
- Piecewise Scheduler: Composable LR schedules and visualization.
- Stochastic Optimizers: Low-precision training using stochastic rounding.
⚙️ Internals
Deep dive into the engine room.
- AutoGrad Extensions: How we do split-backward for Pipeline Parallelism.
- Pipelining Internals: How the VM and Schedules work.
- Gradient Sync: Custom backward hooks for overlapping comms.
- Gradient Norm & Clipping: Correct global norm calculation across hybrid meshes.
- Metric Collection: Custom overlapped metric synchronization & computation.
- Pipeline State: Context switching between Global and Microbatch scopes.
- Determinism.
- Profiling.