[QUESTION] questions about Collective Communication Group Initialization Optimization in the paper #40

siddharthaOnRoad · 2024-06-24T15:52:44Z

Hi, I'm interested in the Collective Communication Group Initialization part of the paper, which greatly reduces the initialization time of a training task (from 1047s to under 5s).

The paper says that initialization is slow because of the global barrier after every process group creation. I noticed that since PyTorch 2.1, the store-based barrier that runs after process group initialization is controlled by the TORCH_DIST_INIT_BARRIER environment variable (see the PyTorch release notes). The variable defaults to "0", so by default no barrier is performed after a process group is initialized.
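To make the behaviour in question concrete, here is a minimal sketch of the check as described above. It assumes the barrier is enabled only when the variable is exactly "1"; the helper names `init_barrier_enabled` and `time_group_creation` are hypothetical and are not part of PyTorch. The timing helper is one way to measure whether group creation is still slow with the barrier off, and it expects `torch.distributed` to be initialized already.

```python
import os
import time


def init_barrier_enabled(env=None):
    """Return True if the post-init store-based barrier would run.

    Assumption: the barrier is enabled only when
    TORCH_DIST_INIT_BARRIER is set to "1"; unset means "0".
    """
    env = os.environ if env is None else env
    return env.get("TORCH_DIST_INIT_BARRIER", "0") == "1"


def time_group_creation(rank_lists):
    """Time creation of one process group per entry in rank_lists.

    Hypothetical helper: requires torch and an already initialized
    default process group. Every rank must call it with the same lists.
    """
    import torch.distributed as dist  # imported lazily so the check above runs without torch

    start = time.perf_counter()
    groups = [dist.new_group(ranks=ranks) for ranks in rank_lists]
    elapsed = time.perf_counter() - start
    return groups, elapsed
```

Running `time_group_creation` once with `TORCH_DIST_INIT_BARRIER=0` and once with `TORCH_DIST_INIT_BARRIER=1` on the same job would show how much of the initialization time the barrier itself accounts for.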

My questions are:
(1) With the global barrier removed, is communication group initialization in PyTorch still slow? Does the optimization described in the paper still bring a large benefit?
(2) Will the source code for this part be made available in the future?

I really appreciate your awesome work. Looking forward to your reply.

liwenchangbdbz added the question label Jun 28, 2024
