Hi, I am currently testing the Lumina-T2V model on a single A100 (40 GB). May I ask which GPU type you used for training the T2V model, and how many frames you trained on?
My implementation follows these steps:
Following the paper, I added flatten and unflatten operations along the frame dimension.
To save time, I ran the preprocessing (the LLaMA text encoding and the VAE latents) separately before starting training. However, since the VAE is identical to the one used in T2I, I worry it may not capture enough temporal consistency.
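A minimal sketch of the flatten/unflatten step described above, assuming a plain reshape that folds the frame axis into the batch axis (the function names and exact layout are my assumptions, not the repo's actual code):

```python
import torch

def flatten_frames(x: torch.Tensor) -> torch.Tensor:
    # Merge the frame axis into the batch axis so per-frame (spatial)
    # modules treat each frame independently: (b, f, c, h, w) -> (b*f, c, h, w)
    b, f, c, h, w = x.shape
    return x.reshape(b * f, c, h, w)

def unflatten_frames(x: torch.Tensor, f: int) -> torch.Tensor:
    # Inverse operation: (b*f, c, h, w) -> (b, f, c, h, w)
    bf, c, h, w = x.shape
    return x.reshape(bf // f, f, c, h, w)

x = torch.randn(4, 8, 4, 32, 32)   # b=4, f=8, c=4, h=32, w=32
y = flatten_frames(x)              # shape (32, 4, 32, 32)
z = unflatten_frames(y, f=8)       # back to (4, 8, 4, 32, 32)
```

This round-trips losslessly as long as the tensor is contiguous in the (b, f) order before reshaping.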
In my testing, the video tensor runs out of memory at b=4, f=8, c=4, h=32, w=32 (after embedding). So it seems nearly impossible to run even small-scale tests to verify your temporal-spatial merging method.
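For reference, the OOM at that shape is dominated by the attention scores once frames and spatial patches are merged into one token sequence, not by the latent itself. A back-of-envelope estimate, where the patch size, head count, and dtype are my assumptions rather than the model's actual configuration:

```python
# Rough attention-memory estimate for merged spatio-temporal attention.
# patch size, head count, and fp16 storage are assumed values.
b, f, c, h, w = 4, 8, 4, 32, 32
patch = 2            # assumed patch size
heads = 16           # assumed number of attention heads
bytes_per = 2        # fp16

tokens = f * (h // patch) * (w // patch)        # 8 * 16 * 16 = 2048 tokens
attn_entries = tokens * tokens                  # one full attention matrix
attn_bytes = b * heads * attn_entries * bytes_per

print(tokens)                 # 2048
print(attn_bytes / 2**20)     # 512.0 MiB of raw scores per layer
```

Under these assumptions, a single layer's attention scores alone cost ~512 MiB before counting activations kept for backward, which explains why even b=4 exhausts a 40 GB card across many layers.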
I am really interested in reading your training details, and in a comparison between the temporal-spatial dividing and merging strategies. Your insights would be greatly appreciated.
HashimotoPatrickMu changed the title to "Training details about the t2v model." on Jun 14, 2024.