Vision-Language Web Demo

A chatbot demo with image input.

Supported Models

Extract the LLM from the Hugging Face model (InternLM-XComposer):

python extract_xcomposer_llm.py
# The LLM part will be saved to the internlm_model folder.

Launch the demo with InternLM-XComposer:

python app.py --model-name internlm-xcomposer-7b --llm-ckpt internlm_model

Launch the demo with Qwen-VL-Chat:

python app.py --model-name qwen-vl-chat --hf-ckpt Qwen/Qwen-VL-Chat

This demo uses the code from each model's own repository to extract image features, which may not be very efficient.
This demo only covers the chat function. To use the localization ability of Qwen-VL-Chat or the article generation function of InternLM-XComposer, you need to implement the corresponding pre- and post-processing yourself. The difference from chat lies in how the prompts are built and how the model output is used.
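
As an illustration of what such pre/post-processing could look like for localization, here is a minimal sketch. It is not part of this demo. It assumes the model returns boxes as `<ref>label</ref><box>(x1,y1),(x2,y2)</box>` with coordinates on a 0-1000 grid, which is how Qwen-VL describes its grounding output; check the model card before relying on it. The names `build_grounding_prompt`, `parse_boxes`, and `Box`, as well as the prompt wording, are hypothetical.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    """A labeled bounding box in pixel coordinates (hypothetical helper type)."""
    label: str
    x1: int
    y1: int
    x2: int
    y2: int


# Assumed output format: <ref>label</ref><box>(x1,y1),(x2,y2)</box>
_BOX_PATTERN = re.compile(
    r"<ref>(.*?)</ref>\s*<box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
)


def build_grounding_prompt(phrase: str) -> str:
    """Pre-process: turn a phrase into a localization request.

    The wording is an assumption, not a prompt required by the model.
    """
    return f"Locate {phrase} in the image and output its bounding box."


def parse_boxes(text: str, width: int, height: int, scale: int = 1000) -> List[Box]:
    """Post-process: extract boxes from the model output and map them to pixels.

    Coordinates are assumed to be normalized to the range [0, scale).
    """
    boxes = []
    for label, x1, y1, x2, y2 in _BOX_PATTERN.findall(text):
        boxes.append(
            Box(
                label.strip(),
                int(x1) * width // scale,
                int(y1) * height // scale,
                int(x2) * width // scale,
                int(y2) * height // scale,
            )
        )
    return boxes
```

In a chat loop, the prompt from `build_grounding_prompt` would be sent together with the image, and `parse_boxes` would be applied to the reply using the original image size before drawing the boxes.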