About
Warning:
If you are using the standard d9d training infrastructure, you do not need to call these functions manually. The framework automatically handles profiling based on configuration. This package is primarily intended for users extending d9d.
The d9d.internals.profiling package provides a distributed-aware wrapper around the standard PyTorch Profiler.
In large-scale distributed training, profiling often becomes difficult due to:
- File Naming: Thousands of ranks writing to the same filename causes race conditions.
- Storage Space: Raw Chrome tracing JSON files can grow to gigabytes very quickly.
- Synchronization: Ensuring that all ranks profile the same steps typically requires manual coordination.
The Profiler class solves these issues by automatically handling file naming based on the DeviceMesh coordinates, compressing traces into .tar.gz archives on the fly, and managing the profiling schedule (wait/warmup/active).
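To make the schedule concrete, here is a minimal sketch (not the d9d implementation) of the raw torch.profiler setup that the wrapper manages on your behalf; the step counts and the loop are illustrative placeholders.

```python
from torch.profiler import ProfilerActivity, profile, schedule

# One profiling cycle spans wait + warmup + active steps;
# repeat=0 keeps the cycle repeating until profiling stops.
prof_schedule = schedule(wait=7, warmup=2, active=1, repeat=0)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=prof_schedule,
    record_shapes=True,
    with_stack=True,
) as prof:
    for step in range(100):
        ...  # one training step
        prof.step()  # advance the profiler schedule
```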
d9d.internals.profiling
Exposes the internal distributed profiler.
Profiler
Manages distributed performance profiling using PyTorch Profiler.
This class wraps torch.profiler to provide automatic trace exporting,
compression, and file naming consistent with the distributed DeviceMesh
topology. It configures the schedule to repeat periodically based on
the provided step counts.
Source code in d9d/internals/profiling/profile.py
__init__(save_dir, period_steps, warmup_steps, active_steps, dist_context)
Constructs a Profiler object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `save_dir` | `Path` | Directory where trace files will be saved. | required |
| `period_steps` | `int` | Total length of a profiling cycle (wait + warmup + active). | required |
| `warmup_steps` | `int` | Number of steps to ignore before recording, to allow for warm-up. | required |
| `active_steps` | `int` | Number of steps to actively record traces. | required |
| `dist_context` | `DistributedContext` | The distributed context object. | required |
Source code in d9d/internals/profiling/profile.py
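A hedged construction sketch based on the signature above; the save directory and step counts are placeholder values, `dist_context` is assumed to be an already-initialized DistributedContext, and the import path assumes Profiler is exposed from d9d.internals.profiling as documented.

```python
from pathlib import Path

from d9d.internals.profiling import Profiler

profiler = Profiler(
    save_dir=Path("artifacts/traces"),  # placeholder output directory
    period_steps=50,                    # full cycle: wait + warmup + active
    warmup_steps=3,                     # steps ignored before recording (warm-up)
    active_steps=2,                     # steps whose traces are exported
    dist_context=dist_context,          # an existing DistributedContext
)
```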
open(start_step)
Opens a context manager for profiling execution.
This sets up the torch.profiler.profile with a schedule derived from
the initialization parameters. It captures both CPU and CUDA activities,
records shapes, and tracks stack traces.
When the schedule triggers on_trace_ready, the trace is automatically
exported to the save_dir, compressed into a .tar.gz file, and the
raw JSON is removed to save space.
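A rough sketch, not the actual implementation, of what such a trace handler can look like; the helper name and filename pattern are invented for illustration.

```python
import tarfile
from pathlib import Path


def compressing_trace_handler(save_dir: Path, trace_name: str):
    """Illustrative on_trace_ready callback: export the trace, archive it, delete the raw JSON."""
    def handler(prof):
        save_dir.mkdir(parents=True, exist_ok=True)
        json_path = save_dir / f"{trace_name}_step{prof.step_num}.json"
        prof.export_chrome_trace(str(json_path))
        with tarfile.open(json_path.with_suffix(".tar.gz"), "w:gz") as tar:
            tar.add(json_path, arcname=json_path.name)
        json_path.unlink()  # remove the uncompressed trace to save space
    return handler
```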
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `start_step` | `int` | The current global step number to initialize the profiler state. | required |

Yields:

| Type | Description |
|---|---|
| | The configured torch profiler instance. |
Source code in d9d/internals/profiling/profile.py
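A hedged usage sketch, assuming `profiler` is the instance constructed above and that the training loop owns the global step counter; `num_steps` and `train_one_step()` are hypothetical stand-ins.

```python
global_step = 0
with profiler.open(start_step=global_step) as prof:
    for _ in range(num_steps):
        train_one_step()  # hypothetical single training step
        prof.step()       # advance the wait/warmup/active schedule
        global_step += 1
```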