Training CUDA Out Of Memory error #98

Kev1MSL · 2024-05-30T16:54:12Z

Hi! I am trying to train the InstantMesh model, but I am getting a CUDA out of memory error just before backpropagation. Have you faced a similar issue when training, and how did you solve it? I am training on 8 GPUs with the same memory as the H800, as described in the paper.
Thanks!

sumanttyagi · 2024-05-31T03:23:22Z

Please check your CUDA devices.

gaodalii · 2024-05-31T04:54:20Z

I am using a single A800 (80G), but I can only train with batch_size=1; if I set batch_size=2, I also get a CUDA out of memory error.

Kev1MSL · 2024-05-31T11:58:33Z

Yes, same thing: it works with batch_size=1 but not with batch_size=2. However, I am only short by a few GB (~2 GB), so I was wondering whether there is a way to optimize this. Also, what happens if I distribute the training across multiple GPUs: with batch_size=1, is it one batch per GPU, or is the single batch split across the GPUs?

Because if the batch size is 1, wouldn't we have trouble converging?
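For readers with the same two questions, here is a minimal sketch, not taken from the InstantMesh code. It assumes standard data-parallel training (for example PyTorch DistributedDataParallel), where each GPU process loads its own batch, so the configured batch_size applies per GPU. It also shows, on a toy 1-D model, why gradient accumulation can reproduce the gradient of a larger batch without holding that batch in memory. The names `effective_batch_size`, `grad_mse`, and `accumulated_grad` are hypothetical; `accumulate_grad_batches` is named after the PyTorch Lightning Trainer argument, and whether this repository's training config exposes it is an assumption.

```python
# Illustrative sketch in pure Python; this is not InstantMesh code.

def effective_batch_size(per_gpu_batch, num_gpus, accumulate_grad_batches=1):
    """Number of samples contributing to one optimizer step,
    assuming each GPU process loads its own batch (data parallelism)."""
    return per_gpu_batch * num_gpus * accumulate_grad_batches


def grad_mse(w, batch):
    """Gradient of the mean squared error w.r.t. w for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Average the gradients of equally sized micro-batches,
    which is what gradient accumulation computes before the optimizer step."""
    return sum(grad_mse(w, mb) for mb in micro_batches) / len(micro_batches)


data = [(1.0, 2.0), (2.0, 3.0)]  # two (x, y) samples

# One batch of size 2 versus two accumulated micro-batches of size 1.
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, [[data[0]], [data[1]]])

print(effective_batch_size(per_gpu_batch=1, num_gpus=8))
print(full, accum)
```

Under these assumptions, batch_size=1 on 8 GPUs already averages gradients over 8 samples per step, and accumulating over 2 steps would bring that to 16. The equivalence shown here holds for losses averaged over samples; layers whose behavior depends on per-batch statistics, such as batch normalization, would not match exactly.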

Mrguanglei · 2024-05-31T13:36:25Z

@Kev1MSL Hello, I encountered several problems in the training process. My dataset is structured as shown in the picture, but I cannot work out how to write my training configuration. I would like to ask for your help. Thank you very much for your reply.

throb081 · 2024-06-04T07:08:04Z

@Kev1MSL Hello, I am trying to run the training process, but I don't know how to construct the dataset. Can I have a look at the structure of your dataset? Thank you very much for your reply.
