
Gradients are None after booster.backward #5792

Open · ArnaudFickinger opened this issue Jun 11, 2024 · 10 comments
Labels
bug: Something isn't working

Comments

ArnaudFickinger commented Jun 11, 2024

After calling booster.backward(loss=loss, optimizer=optimizer), all gradients of model.module are None. Is there a way to access the gradients?
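For context, here is a minimal sketch of the kind of script that hits this (the model, sizes, and plugin choice are placeholders, not from the original report; run with torchrun):

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()  # older versions take a config={} argument

model = torch.nn.Linear(16, 16).cuda()  # placeholder model
optimizer = torch.optim.Adam(model.parameters())

booster = Booster(plugin=LowLevelZeroPlugin(stage=2))
model, optimizer, *_ = booster.boost(model, optimizer)

loss = model(torch.randn(4, 16).cuda()).sum()
booster.backward(loss=loss, optimizer=optimizer)

# every .grad on the wrapped module is None at this point
for name, p in model.module.named_parameters():
    print(name, p.grad)
```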

ArnaudFickinger added the bug label Jun 11, 2024
B-Soul commented Jun 11, 2024

I'm having the same problem. Have you found a solution?

botbw (Contributor) commented Jun 11, 2024

Hey @ArnaudFickinger @B-Soul, could you please share the settings of your scripts?

B-Soul commented Jun 11, 2024

My code is part of my own ongoing research, so it isn't convenient to share. But when I switched the distributed framework to Hugging Face Accelerate, the gradients were not None. So I think there is a bug in the ColossalAI framework.

botbw (Contributor) commented Jun 11, 2024

> My code is part of my own ongoing research, so it isn't convenient to share. But when I switched the distributed framework to Hugging Face Accelerate, the gradients were not None. So I think there is a bug in the ColossalAI framework.

Hi @B-Soul, a snippet of your optimizer/plugin settings would help. Also, the gradient-access API may differ because of internal optimizations: if you are using LowLevelZeroOptimizer or GeminiOptimizer, you can check these tests for examples of gradient access: Gemini and low-level.
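For instance, with LowLevelZeroOptimizer the gradients live in an internal gradient store rather than on param.grad. A sketch of reading them out, following the internal attribute names referenced later in this thread (these are private internals and may vary across ColossalAI versions):

```python
# after booster.backward(...), with a LowLevelZeroOptimizer:
for group_id in range(len(optimizer.param_groups)):
    # grads are stored per param group, keyed by id(param) of the
    # working parameters; each value is a list of grad partitions
    grads_by_id = optimizer._grad_store._grads_of_params.get(group_id, {})
    for param_id, grads in grads_by_id.items():
        print(group_id, param_id, [g.shape for g in grads])
```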

ArnaudFickinger (Author) commented Jun 11, 2024

@botbw Thank you, the low-level snippet works! By the way, which of Gemini or low-level should I use for the best performance with 1 to 8 A100 GPUs and 500M to 2B trainable parameters?

botbw (Contributor) commented Jun 12, 2024

> @botbw Thank you, the low-level snippet works! By the way, which of Gemini or low-level should I use for the best performance with 1 to 8 A100 GPUs and 500M to 2B trainable parameters?

@ArnaudFickinger Glad to hear that! We may also work on the API to make it more intuitive.

Regarding performance: LowLevelZeroOptimizer implements ZeRO-1 and ZeRO-2, while GeminiOptimizer implements ZeRO-3 together with contiguous memory optimization (i.e. memory locality; you may check this doc for more information) to reduce communication cost.

Generally speaking, you should choose the plugin by the ZeRO stage you intend to use; real-world performance is case-by-case and depends on the trade-off between computation and communication.
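For example, switching between the two is just a different plugin handed to the Booster (a sketch; the stage value shown is illustrative):

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin

# ZeRO-1/2: shard optimizer states (and gradients at stage 2)
plugin = LowLevelZeroPlugin(stage=2)
# ZeRO-3: shard parameters too, with Gemini's memory management
# plugin = GeminiPlugin()

booster = Booster(plugin=plugin)
```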

Do let us know if you have further doubts :p

ArnaudFickinger (Author) commented:

@botbw When I define 2 param_groups, the id() of the parameters in the second group does not match any key of optimizer._grad_store._grads_of_params[1].

botbw (Contributor) commented Jun 15, 2024

> @botbw When I define 2 param_groups, the id() of the parameters in the second group does not match any key of optimizer._grad_store._grads_of_params[1].

@ArnaudFickinger That's unexpected, since each group is handled separately in the same way (like iterations of a for loop). Would you mind sharing the version (or commit) you are using, and a minimal repro if possible?

ArnaudFickinger (Author) commented:

@botbw I have written a min repro with a simple network, and in that case the keys actually match! I will take a closer look at my code and get back to you if I still believe the issue might be ColossalAI-related.
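For reference, the min repro had roughly this shape (a toy network with two param groups; purely illustrative):

```python
import torch

model = torch.nn.Linear(16, 16)
param_groups = [
    {"params": [model.weight], "lr": 1e-3},
    {"params": [model.bias], "lr": 1e-4},
]
optimizer = torch.optim.Adam(param_groups)
# ... boost with the booster, run booster.backward(...), then check that
# {id(p) for p in param_groups[1]["params"]} appears among the keys of
# optimizer._grad_store._grads_of_params[1]
```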

botbw (Contributor) commented Jun 15, 2024

> @botbw I have written a min repro with a simple network, and in that case the keys actually match! I will take a closer look at my code and get back to you if I still believe the issue might be ColossalAI-related.

@ArnaudFickinger Sure, feel free to ask here or raise a new issue.
