d9d

  • Home
Core
  • Autograd Extensions
  • Distributed Context
  • Distributed Operations
  • PyTree Sharding
  • Typing Extensions
Dataset
  • Datasets
Internals
  • Determinism
  • Gradient Synchronization
  • Pipeline State Management
  • Pipelining Internals
  • Distributed Profiling
  • Experiment Tracking
LR Scheduler
  • Piecewise Scheduler
  • Visualization
Metric
  • Metrics
Model states
  • Model State I/O
  • Model State Mapper
Models
  • Model Design
  • Horizontal Parallelism
  • Pipeline Parallelism
  • Qwen3 MoE
Modules
  • Attention Layers
  • Embeddings
  • Feed Forward Networks (FFN)
  • Model Heads
  • Hidden States Aggregation
  • Mixture of Experts (MoE)
  • Positional Embeddings
Optimizer
  • Stochastic Optimizers
PEFT
  • PEFT Overview
  • Full Fine-Tuning
  • LoRA
  • Method Stacking

Home
