ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

⭐️ Works in our series: [MMStar] [ShareGPT4V] [ShareGPT4Omni]


🚀🚀🚀 Official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.

Here is a short video introducing ShareGPT4Video:

demo_clip_v2.mp4

💡 Highlights

  • 🔥 A large-scale, highly descriptive video-text dataset: 40K GPT4-Vision-generated video captions plus around 400K implicit video split captions.
  • 🔥 A general video captioner for various video durations, resolutions, and aspect ratios, approaching GPT4-Vision's captioning capability, featuring two inference modes targeted at quality and efficiency, respectively.
  • 🔥 A superior large video-language model, ShareGPT4Video-8B, trained in just 5 hours on 8xA100 GPUs.
  • 🔥 Improving Text-to-Video performance with high-quality video captions generated by our ShareCaptioner-Video. Thanks to Open-Sora-Plan.

📜 News

[2024/6/11] The web demo and local demo of ShareCaptioner-Video are available now!

[2024/6/11] The web demo and local demo of ShareGPT4Video-8B are available now!

[2024/6/7] Our paper was featured in HuggingFace Daily Papers and ranked 1st on June 7.

[2024/5/27] The ShareGPT4Video-8B model is released!

[2024/5/26] The ShareGPT4Video dataset and project page are released!

👨‍💻 Todo

  • Training and evaluation code for ShareGPT4Video-8B
  • Batch inference code for ShareCaptioner-Video
  • Web demo and local demo of ShareCaptioner-Video
  • Web demo and local demo of ShareGPT4Video-8B
  • Checkpoints of ShareGPT4Video-8B

Quick Usage

You can directly chat with our ShareGPT4Video model about your own video with the following command:

python run.py --model-path Lin-Chen/sharegpt4video-8b --video examples/yoga.mp4 --query "Describe this video in detail."
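To caption several clips with the same prompt, the documented CLI can be wrapped in a small script. The sketch below only assembles the command line from the flags shown above; the `build_command` helper and the `examples/` glob are our own illustrative additions, not part of the repo, and the actual `subprocess.run` call is left commented out.

```python
import subprocess
from pathlib import Path

def build_command(video: str, query: str,
                  model_path: str = "Lin-Chen/sharegpt4video-8b") -> list[str]:
    """Build the run.py invocation documented in the README (hypothetical wrapper)."""
    return [
        "python", "run.py",
        "--model-path", model_path,
        "--video", video,
        "--query", query,
    ]

if __name__ == "__main__":
    # Caption every mp4 under examples/ with the same prompt.
    for clip in sorted(Path("examples").glob("*.mp4")):
        cmd = build_command(str(clip), "Describe this video in detail.")
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually run each job
```

Passing the query as a single list element avoids the shell-quoting issues that arise when the prompt contains spaces.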

Or you can launch a local demo of ShareGPT4Video-8B with the following command:

python app.py

You can launch a local demo of ShareCaptioner-Video with the following commands:

cd captioner

python app.py

Install

git clone https://github.com/ShareGPT4Omni/ShareGPT4Video
conda create -n share4video python=3.10 -y
conda activate share4video

cd ShareGPT4Video
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
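After installing, a quick sanity check can confirm that the key packages are importable; flash-attn in particular often fails to build on unsupported GPU/toolchain combinations. This generic check is our own addition, not part of the repo:

```python
import importlib.util

def check(pkg: str) -> bool:
    """Return True if the package is importable in the current environment."""
    return importlib.util.find_spec(pkg) is not None

if __name__ == "__main__":
    # Packages the install steps above should provide; flash_attn is the one
    # most likely to be missing if the optional build step failed.
    for pkg in ("torch", "transformers", "flash_attn"):
        status = "ok" if check(pkg) else "MISSING"
        print(f"{pkg}: {status}")
```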

✒️ Citation

If you find our work helpful for your research, please consider giving a star ⭐ and a citation 📝.

@article{chen2024sharegpt4video,
  title={ShareGPT4Video: Improving Video Understanding and Generation with Better Captions},
  author={Chen, Lin and Wei, Xilin and Li, Jinsong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Chen, Zehui and Duan, Haodong and Lin, Bin and Tang, Zhenyu and others},
  journal={arXiv preprint arXiv:2406.04325},
  year={2024}
}

❤️ Acknowledgments

  • LLaVA: the codebase we built upon. Thanks for their wonderful work.
  • Open-Sora-Plan: an excellent open-source codebase for Sora-like text-to-video implementation. Thanks for their wonderful work.
  • Open-LLaVA-NeXT: an open-source codebase for reproducing the training procedure of the LLaVA-NeXT series.

Star History

Star History Chart