The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Unofficial implementation for Sigmoid Loss for Language Image Pre-Training
An open-source API for visual question answering, built on FastAPI
Related papers about Referring Image Segmentation (RIS)
My solutions to CS231N CNN assignments
PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models".
Arabic WordNet matches for synsets in ImageNet
Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"
An end-to-end masked contrastive video-and-language pre-training framework
A comprehensive hub for updates on generative AI research, including interviews, notebooks, and additional resources.
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
Vision-Controllable Natural Language Generation
Counting dataset for Vision & Language models. Introduced in the paper "Seeing Past Words: Testing the Cross-Modal Capabilities of Pretrained V&L Models". https://arxiv.org/abs/2012.12352
VinVL+L: Enriching Visual Representation with Location Context in Visual Question Answering (VQA)
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.
An end-to-end vision and language model incorporating explicit knowledge graphs and OOD-detection.
Under review. [IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Reading group for Vision and Language research
A list of research papers on knowledge-enhanced multimodal learning