High memory usage on Llama3-70B full finetune during checkpoint save #1092

ebsmothers · 2024-06-14T16:45:07Z

See the discussion on #993. @andyl98 has reported that when running on 8x A100 with 1 TB of DRAM, they hit a CPU OOM during checkpoint save. They also point out that they do not see the OOM when FullOptimStateDictConfig is not used.

In an ideal world I think we should only need (model params) + (optimizer states) = (70B * 2) + (70B * 2 * 2) = 420 GB in bf16, so it seems like the unsharding is being done inefficiently (at least wrt CPU RAM)?
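The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name and its parameters are illustrative, and it assumes 2 bytes per value (bf16) for both weights and optimizer state, with two optimizer state tensors per parameter (as in Adam-style optimizers), which is what the `2 * 2` factor in the issue implies.

```python
GB = 10**9  # decimal gigabytes, matching the "70B * 2 = 140 GB" style of estimate


def full_state_dict_gb(num_params, bytes_per_value=2, optim_states_per_param=2):
    """Return (model_gb, optim_gb, total_gb) for a fully unsharded checkpoint.

    Assumptions (not taken from any library): every value is stored in
    `bytes_per_value` bytes, and the optimizer keeps `optim_states_per_param`
    tensors of the same shape as each parameter.
    """
    model_bytes = num_params * bytes_per_value
    optim_bytes = num_params * bytes_per_value * optim_states_per_param
    return model_bytes / GB, optim_bytes / GB, (model_bytes + optim_bytes) / GB


model_gb, optim_gb, total_gb = full_state_dict_gb(70_000_000_000)
print(f"model: {model_gb:.0f} GB, optimizer: {optim_gb:.0f} GB, total: {total_gb:.0f} GB")
```

Under these assumptions a single unsharded copy comes to 420 GB, well below the 1 TB of DRAM reported, which is why the OOM suggests that more than one full copy is being held in CPU memory at some point during the save.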


RdoubleA · 2024-06-15T18:18:30Z

Why not use DCP checkpointing, which doesn't require unsharding and prevents CPU OOMs? cc @LucasLLC
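For reference, a sharded save with DCP (`torch.distributed.checkpoint`) might look roughly like the sketch below. It assumes a recent PyTorch release (2.3 or later) where `dcp.save` and `get_state_dict` are available; the helper name `save_sharded_checkpoint` and its arguments are hypothetical and are not part of torchtune. The imports are done lazily inside the function so the snippet can be defined without PyTorch installed.

```python
def save_sharded_checkpoint(model, optimizer, checkpoint_dir):
    """Save model and optimizer state with DCP, one set of shards per rank.

    Hypothetical helper: `model` is assumed to be an (FSDP-wrapped) nn.Module,
    `optimizer` its optimizer, and `checkpoint_dir` a path visible to all ranks.
    """
    # Lazy imports: only needed when the function is actually called.
    import torch.distributed.checkpoint as dcp
    from torch.distributed.checkpoint.state_dict import get_state_dict

    # With default options this returns sharded state dicts, so no rank has to
    # gather the full unsharded weights or optimizer state.
    model_state, optim_state = get_state_dict(model, optimizer)

    # Each rank writes the shards it owns into checkpoint_dir.
    dcp.save({"model": model_state, "optim": optim_state}, checkpoint_id=checkpoint_dir)
```

The resulting checkpoint is a directory of per-rank shard files rather than a single file, so anything downstream that expects one consolidated checkpoint would need a separate conversion step.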

ebsmothers self-assigned this Jun 14, 2024

ebsmothers mentioned this issue Jun 14, 2024 in Llama3-70b: Full Finetune w/CPU offload + fused optimizer #993 (merged)
