does FSDP support AMSP (a new DP shard strategy) #128706

Open
guoyejun opened this issue Jun 14, 2024 · 2 comments
Labels
oncall: distributed (add this issue/PR to distributed oncall triage queue), triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

guoyejun (Contributor) commented Jun 14, 2024

🚀 The feature, motivation and pitch

There's a new DP sharding strategy that is more flexible and general; see the details in https://arxiv.org/abs/2311.00257, "AMSP: Reducing Communication Overhead of ZeRO for Efficient LLM Training".

Does FSDP support a similar feature? If not, is there any plan to support it? Thanks.
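
For context, a minimal sketch of the sharding strategies FSDP exposes today through torch.distributed.fsdp.ShardingStrategy (the tiny model and the CUDA/process-group setup are placeholder assumptions, and the ZeRO mapping in the comments is only my rough understanding):

```python
# Minimal sketch: FSDP's current sharding strategies.
# Assumes torch.distributed is already initialized (e.g. via torchrun)
# and a GPU is available; MyModel is just a placeholder module.
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(1024, 1024)

    def forward(self, x):
        return self.net(x)

# Rough mapping of the available strategies:
# FULL_SHARD    ~ ZeRO-3: shard parameters, gradients, and optimizer states
# SHARD_GRAD_OP ~ ZeRO-2: shard gradients and optimizer states; parameters
#                         stay unsharded between forward and backward
# NO_SHARD      ~ DDP:    replicate everything
# HYBRID_SHARD  ~ HSDP:   FULL_SHARD within a node, replicate across nodes
model = FSDP(
    MyModel().cuda(),
    sharding_strategy=ShardingStrategy.FULL_SHARD,
)
```

Whichever strategy is picked applies to parameters, gradients, and optimizer states together, which is exactly the coupling AMSP relaxes.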

Alternatives

No response

Additional context

No response

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k

awgu (Contributor) commented Jun 14, 2024

I do not think FSDP supports this currently. In my high-level understanding, the flexibility introduced in AMSP is mainly useful when doing microbatching / gradient accumulation?
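
For the gradient-accumulation angle, FSDP's existing no_sync() context is the relevant knob today; a minimal sketch, assuming model is already FSDP-wrapped and microbatches / loss_fn are placeholders:

```python
# Minimal sketch of gradient accumulation with FSDP's no_sync() context;
# model, optimizer, microbatches (a list of (input, target) pairs), and
# loss_fn are assumed to exist. Inside no_sync(), gradient communication
# is skipped and gradients accumulate locally; the reduce-scatter only
# happens on the last microbatch.
import contextlib

def accumulation_step(model, optimizer, microbatches, loss_fn):
    for i, (x, y) in enumerate(microbatches):
        last = i == len(microbatches) - 1
        ctx = contextlib.nullcontext() if last else model.no_sync()
        with ctx:
            loss_fn(model(x), y).backward()
    optimizer.step()
    optimizer.zero_grad()
```

With FULL_SHARD, no_sync() keeps unsharded gradients resident across microbatches, which is presumably where a decoupled gradient-sharding setting like AMSP's would help.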

@malfet added the oncall: distributed label Jun 14, 2024
@weifengpy added the triaged label Jun 17, 2024
guoyejun (Contributor, Author) commented:

My understanding is that the flexibility comes from decoupling the sharding strategies for the parameters, gradients, and optimizer states, which can each be different. By nature this covers many sharding strategies, including DDP, ZeRO-1, ZeRO-2, ZeRO-3, HSDP and MiCS, and many more. For a given cluster and a given model, we may then find a better sharding strategy, such as in Table IV of the paper (copied below).
[screenshot of Table IV from the AMSP paper]

Another point is that the sharding strategy is expressed in two dimensions, one for the number of nodes and one for the number of GPUs per node, which makes it clearer.

It is general because all of these sharding strategies can be obtained just by changing the configuration values. We could even loosen some of the constraints in the paper while keeping its key idea, if possible.
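
To make that concrete, a purely hypothetical configuration sketch (none of these names exist in FSDP; the shard degree of each state is written in the paper's two dimensions, nodes × GPUs per node):

```python
# Purely hypothetical sketch of an AMSP-style config: each state gets its
# own two-dimensional shard degree (nodes, gpus_per_node). None of these
# names exist in FSDP today; this only illustrates the decoupling.
from dataclasses import dataclass

@dataclass
class ShardDegree:
    nodes: int          # number of nodes the state is sharded across
    gpus_per_node: int  # number of GPUs per node it is sharded across

@dataclass
class AMSPStyleConfig:
    params: ShardDegree
    grads: ShardDegree
    optim_states: ShardDegree

# Example on a 4-node x 8-GPU cluster: shard parameters and gradients
# within a node but replicate them across nodes (HSDP-like), while
# sharding optimizer states across the whole cluster.
cfg = AMSPStyleConfig(
    params=ShardDegree(nodes=1, gpus_per_node=8),
    grads=ShardDegree(nodes=1, gpus_per_node=8),
    optim_states=ShardDegree(nodes=4, gpus_per_node=8),
)
```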
