d9d
Model Catalogue
Qwen3 Mixture of Experts
Qwen3 Dense