
Avoid redundant Megatron empty_cache and GC cleanup #730

Merged
FurtherAI merged 3 commits into main from austin/megatron/optimize_empty_cache on Jun 17, 2026

@FurtherAI

Collaborator

Summary

  • Remove redundant gc.collect() and torch.cuda.empty_cache() calls from Megatron service, job, and step cleanup paths.
  • Keep CUDA cache clearing centralized for colocated weight offload only, after weights are offloaded.
  • Avoid dedicated-mode cleanup syncs between training jobs.
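The cleanup policy described above could be sketched roughly as follows. This is an illustration only, not the code from this PR: `Mode`, `finish_job`, and the `offload_weights` callback are hypothetical names, and the sketch assumes that dedicated mode performs no weight offload at all. The one ordering constraint it relies on is standard PyTorch behavior: `torch.cuda.empty_cache()` only releases cached blocks that are no longer in use, so it has to run after the weights have left the GPU.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    # Hypothetical mode names, mirroring the wording of the summary.
    COLOCATED = "colocated"   # training shares the GPU with another workload
    DEDICATED = "dedicated"   # training owns the GPU between jobs


def _default_empty_cache() -> None:
    """Release unused cached CUDA blocks; a no-op without PyTorch or CUDA."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def finish_job(
    mode: Mode,
    offload_weights: Callable[[], None],
    empty_cache: Callable[[], None] = _default_empty_cache,
) -> bool:
    """Single cleanup point for a job. Returns True if the cache was cleared."""
    if mode is not Mode.COLOCATED:
        # Assumption: dedicated mode keeps the allocator cache warm for the
        # next job, so no gc.collect(), empty_cache(), or sync happens here.
        return False
    offload_weights()  # first move the weights off the GPU
    empty_cache()      # then release the blocks the offload just freed
    return True
```

Passing `empty_cache` as a parameter keeps the single call site easy to test without a GPU, for example by substituting a function that records when it was called.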

Tests

  • Benchmarked Megatron throughput at max sequence length with CP4/EP4 Qwen3.5; completed without OOM.
  • Throughput improved over previous results.

@FurtherAI FurtherAI marked this pull request as ready for review June 17, 2026 20:57
@FurtherAI FurtherAI merged commit 81d8f9e into main Jun 17, 2026
5 checks passed
