CNVid-3.5M: Build, Filter, and Pre-train the Large-scale Public Chinese Video-text Dataset

This repository contains the dataset, codebase, and benchmarks for our paper "CNVid-3.5M: Build, Filter, and Pre-train the Large-scale Public Chinese Video-text Dataset", accepted at CVPR 2023.


News

  • The codebase is now available at AntMMF-CNVid-VTP.
  • MediaCrawler is another project that may help you download the original videos in the CNVid-3.5M dataset.
  • If you have any questions about CNVid-3.5M, please open an issue in the AntMMF project.

IMPORTANT!

Please first check TERMS.md, LEGAL.md, and LICENSE. You must not use the content in this dataset if you do not agree to the terms, legal disclaimer, and license outlined in these files.

Note that we do not own the copyright to any of the collected data. The distribution of identities and activities in the CNVid-3.5M dataset may not be representative of the global human population or the diversity of society. Please be mindful of unintended societal, gender, racial, and other biases when training or deploying models on this data.

What is CNVid-3.5M?

CNVid-3.5M is a large-scale public cross-modal dataset containing over 3.5 million Chinese video-text pairs. We summarize our contributions with three verbs, i.e., "Build", "Filter", and "Pre-train": 1) To build a public Chinese video-text dataset, we collect over 4.5M videos from Chinese websites. 2) To improve data quality, we propose a novel method that filters out 1M weakly-paired videos, yielding the CNVid-3.5M dataset. 3) To verify the dataset's effectiveness, we pre-train video-text models on CNVid-3.5M and evaluate them on several benchmarks.
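The filtering idea can be illustrated with a minimal sketch: score each video-text pair with a cross-modal similarity model and drop weakly-paired examples below a threshold. The similarity values, field names, and threshold below are illustrative assumptions for demonstration, not the paper's actual filtering method (see the paper for the proposed approach).

```python
# Minimal sketch of similarity-based filtering for video-text pairs.
# The similarity scores here are placeholders; a real pipeline would
# compute them with a pre-trained cross-modal model.

def filter_pairs(pairs, threshold=0.3):
    """Keep only video-text pairs whose similarity meets the threshold."""
    return [p for p in pairs if p["similarity"] >= threshold]

pairs = [
    {"video_id": "v001", "text": "a cooking tutorial", "similarity": 0.82},
    {"video_id": "v002", "text": "unrelated caption",  "similarity": 0.11},
    {"video_id": "v003", "text": "a travel vlog",      "similarity": 0.47},
]

kept = filter_pairs(pairs, threshold=0.3)
print([p["video_id"] for p in kept])  # → ['v001', 'v003']
```

In this sketch, raising the threshold trades dataset size for pairing quality, which mirrors the 4.5M-to-3.5M reduction described above.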

Dataset

Check DATASET.md for instructions on downloading and preprocessing the CNVid-3.5M dataset.

Codebase

The codebase is now available at AntMMF-CNVid-VTP.

Benchmark

Check BENCHMARK.md for instructions on downloading the CNVid-3.5M benchmarks and fine-tuning models.

We have already prepared the benchmarks, but we still need some time to obtain external disclosure authorization from our group. All benchmarks are planned for release in January 2024.

Citation

If you find CNVid-3.5M useful, please consider citing the following paper:

@inproceedings{gan2023cnvid,
  title={CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset},
  author={Gan, Tian and Wang, Qing and Dong, Xingning and Ren, Xiangyuan and Nie, Liqiang and Guo, Qingpei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14815--14824},
  year={2023}
}

The word cloud for Top-200 TOPICs in CNVid-3.5M.


The word cloud for Top-200 KEYWORDs in CNVid-3.5M.

