# Supported models and datasets

## Table of Contents

- Models

## Models

The table below introduces all models supported by SWIFT:

- Model Type: The `model_type` registered in SWIFT.
- Model ID: The ModelScope model ID.
- Default Lora Target Modules: The default `lora_target_modules` used by the model.
- Default Template: The default template used by the model.
- Support Flash Attn: Whether the model supports flash attention to accelerate SFT and inference.
- Support VLLM: Whether the model supports vLLM to accelerate inference and deployment.
- Requires: Extra dependencies required by the model.
- Tags: Tags associated with the model.
- HF Model ID: The corresponding Hugging Face model ID.
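For illustration, the columns above can be read as a record per model. The following is a minimal, hypothetical sketch (not the SWIFT API itself): the row values are copied from the `qwen1half-7b-chat` entry in the table, while the dict layout and variable names are assumptions made for this example.

```python
# One row of the table, represented as a plain dict (illustrative only;
# SWIFT registers this information internally via model_type).
ROW = {
    "model_type": "qwen1half-7b-chat",           # model_type registered in SWIFT
    "model_id": "qwen/Qwen1.5-7B-Chat",          # ModelScope model ID
    "lora_target_modules": ["q_proj", "k_proj", "v_proj"],
    "default_template": "qwen",
    "requires": ["transformers>=4.37"],          # extra pip requirements
}

# Derive the extra-dependency install command from the "Requires" column.
pip_cmd = "pip install " + " ".join(f"'{req}'" for req in ROW["requires"])

# Because LoRA target modules and template have defaults, a fine-tuning
# invocation typically only needs the model_type; the rest falls back to
# the values in this table.
sft_args = ["--model_type", ROW["model_type"]]

print(pip_cmd)
print(" ".join(sft_args))
```

The takeaway is that `model_type` is the single key a user supplies; SWIFT resolves the model ID, LoRA targets, and chat template from the registry shown in this table.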

### LLM

Model Type Model ID Default Lora Target Modules Default Template Support Flash Attn Support VLLM Requires Tags HF Model ID
qwen-1_8b qwen/Qwen-1_8B c_attn default-generation - Qwen/Qwen-1_8B
qwen-1_8b-chat qwen/Qwen-1_8B-Chat c_attn qwen - Qwen/Qwen-1_8B-Chat
qwen-1_8b-chat-int4 qwen/Qwen-1_8B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-1_8B-Chat-Int4
qwen-1_8b-chat-int8 qwen/Qwen-1_8B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-1_8B-Chat-Int8
qwen-7b qwen/Qwen-7B c_attn default-generation - Qwen/Qwen-7B
qwen-7b-chat qwen/Qwen-7B-Chat c_attn qwen - Qwen/Qwen-7B-Chat
qwen-7b-chat-int4 qwen/Qwen-7B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-7B-Chat-Int4
qwen-7b-chat-int8 qwen/Qwen-7B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-7B-Chat-Int8
qwen-14b qwen/Qwen-14B c_attn default-generation - Qwen/Qwen-14B
qwen-14b-chat qwen/Qwen-14B-Chat c_attn qwen - Qwen/Qwen-14B-Chat
qwen-14b-chat-int4 qwen/Qwen-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-14B-Chat-Int4
qwen-14b-chat-int8 qwen/Qwen-14B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-14B-Chat-Int8
qwen-72b qwen/Qwen-72B c_attn default-generation - Qwen/Qwen-72B
qwen-72b-chat qwen/Qwen-72B-Chat c_attn qwen - Qwen/Qwen-72B-Chat
qwen-72b-chat-int4 qwen/Qwen-72B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-72B-Chat-Int4
qwen-72b-chat-int8 qwen/Qwen-72B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-72B-Chat-Int8
modelscope-agent-7b iic/ModelScope-Agent-7B c_attn modelscope-agent - -
modelscope-agent-14b iic/ModelScope-Agent-14B c_attn modelscope-agent - -
qwen1half-0_5b qwen/Qwen1.5-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-0.5B
qwen1half-1_8b qwen/Qwen1.5-1.8B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-1.8B
qwen1half-4b qwen/Qwen1.5-4B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-4B
qwen1half-7b qwen/Qwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-7B
qwen1half-14b qwen/Qwen1.5-14B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-14B
qwen1half-32b qwen/Qwen1.5-32B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-32B
qwen1half-72b qwen/Qwen1.5-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-72B
qwen1half-110b qwen/Qwen1.5-110B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-110B
codeqwen1half-7b qwen/CodeQwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/CodeQwen1.5-7B
qwen1half-moe-a2_7b qwen/Qwen1.5-MoE-A2.7B q_proj, k_proj, v_proj default-generation transformers>=4.40 - Qwen/Qwen1.5-MoE-A2.7B
qwen1half-0_5b-chat qwen/Qwen1.5-0.5B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat
qwen1half-1_8b-chat qwen/Qwen1.5-1.8B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat
qwen1half-4b-chat qwen/Qwen1.5-4B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-4B-Chat
qwen1half-7b-chat qwen/Qwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-7B-Chat
qwen1half-14b-chat qwen/Qwen1.5-14B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-14B-Chat
qwen1half-32b-chat qwen/Qwen1.5-32B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-32B-Chat
qwen1half-72b-chat qwen/Qwen1.5-72B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-72B-Chat
qwen1half-110b-chat qwen/Qwen1.5-110B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-110B-Chat
qwen1half-moe-a2_7b-chat qwen/Qwen1.5-MoE-A2.7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.40 - Qwen/Qwen1.5-MoE-A2.7B-Chat
codeqwen1half-7b-chat qwen/CodeQwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/CodeQwen1.5-7B-Chat
qwen1half-0_5b-chat-int4 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4
qwen1half-1_8b-chat-int4 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4
qwen1half-4b-chat-int4 qwen/Qwen1.5-4B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-4B-Chat-GPTQ-Int4
qwen1half-7b-chat-int4 qwen/Qwen1.5-7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-7B-Chat-GPTQ-Int4
qwen1half-14b-chat-int4 qwen/Qwen1.5-14B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-14B-Chat-GPTQ-Int4
qwen1half-32b-chat-int4 qwen/Qwen1.5-32B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-32B-Chat-GPTQ-Int4
qwen1half-72b-chat-int4 qwen/Qwen1.5-72B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
qwen1half-110b-chat-int4 qwen/Qwen1.5-110B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-110B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-int8 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8
qwen1half-1_8b-chat-int8 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8
qwen1half-4b-chat-int8 qwen/Qwen1.5-4B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-4B-Chat-GPTQ-Int8
qwen1half-7b-chat-int8 qwen/Qwen1.5-7B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-7B-Chat-GPTQ-Int8
qwen1half-14b-chat-int8 qwen/Qwen1.5-14B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-14B-Chat-GPTQ-Int8
qwen1half-72b-chat-int8 qwen/Qwen1.5-72B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-72B-Chat-GPTQ-Int8
qwen1half-moe-a2_7b-chat-int4 qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.40 - Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-awq qwen/Qwen1.5-0.5B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-0.5B-Chat-AWQ
qwen1half-1_8b-chat-awq qwen/Qwen1.5-1.8B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-1.8B-Chat-AWQ
qwen1half-4b-chat-awq qwen/Qwen1.5-4B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-4B-Chat-AWQ
qwen1half-7b-chat-awq qwen/Qwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-7B-Chat-AWQ
qwen1half-14b-chat-awq qwen/Qwen1.5-14B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-14B-Chat-AWQ
qwen1half-32b-chat-awq qwen/Qwen1.5-32B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-32B-Chat-AWQ
qwen1half-72b-chat-awq qwen/Qwen1.5-72B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-72B-Chat-AWQ
qwen1half-110b-chat-awq qwen/Qwen1.5-110B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-110B-Chat-AWQ
codeqwen1half-7b-chat-awq qwen/CodeQwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/CodeQwen1.5-7B-Chat-AWQ
qwen2-0_5b qwen/Qwen2-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-0.5B
qwen2-0_5b-instruct qwen/Qwen2-0.5B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct
qwen2-0_5b-instruct-int4 qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4
qwen2-0_5b-instruct-int8 qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8
qwen2-0_5b-instruct-awq qwen/Qwen2-0.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-0.5B-Instruct-AWQ
qwen2-1_5b qwen/Qwen2-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-1.5B
qwen2-1_5b-instruct qwen/Qwen2-1.5B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct
qwen2-1_5b-instruct-int4 qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4
qwen2-1_5b-instruct-int8 qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8
qwen2-1_5b-instruct-awq qwen/Qwen2-1.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-1.5B-Instruct-AWQ
qwen2-7b qwen/Qwen2-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-7B
qwen2-7b-instruct qwen/Qwen2-7B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-7B-Instruct
qwen2-7b-instruct-int4 qwen/Qwen2-7B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-7B-Instruct-GPTQ-Int4
qwen2-7b-instruct-int8 qwen/Qwen2-7B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-7B-Instruct-GPTQ-Int8
qwen2-7b-instruct-awq qwen/Qwen2-7B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-7B-Instruct-AWQ
qwen2-72b qwen/Qwen2-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-72B
qwen2-72b-instruct qwen/Qwen2-72B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-72B-Instruct
qwen2-72b-instruct-int4 qwen/Qwen2-72B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-72B-Instruct-GPTQ-Int4
qwen2-72b-instruct-int8 qwen/Qwen2-72B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-72B-Instruct-GPTQ-Int8
qwen2-72b-instruct-awq qwen/Qwen2-72B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-72B-Instruct-AWQ
qwen2-57b-a14b qwen/Qwen2-57B-A14B q_proj, k_proj, v_proj default-generation transformers>=4.40 - Qwen/Qwen2-57B-A14B
qwen2-57b-a14b-instruct qwen/Qwen2-57B-A14B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.40 - Qwen/Qwen2-57B-A14B-Instruct
qwen2-57b-a14b-instruct-int4 qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.40 - Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4
chatglm2-6b ZhipuAI/chatglm2-6b query_key_value chatglm2 - THUDM/chatglm2-6b
chatglm2-6b-32k ZhipuAI/chatglm2-6b-32k query_key_value chatglm2 - THUDM/chatglm2-6b-32k
chatglm3-6b-base ZhipuAI/chatglm3-6b-base query_key_value chatglm-generation - THUDM/chatglm3-6b-base
chatglm3-6b ZhipuAI/chatglm3-6b query_key_value chatglm3 - THUDM/chatglm3-6b
chatglm3-6b-32k ZhipuAI/chatglm3-6b-32k query_key_value chatglm3 - THUDM/chatglm3-6b-32k
chatglm3-6b-128k ZhipuAI/chatglm3-6b-128k query_key_value chatglm3 - THUDM/chatglm3-6b-128k
codegeex2-6b ZhipuAI/codegeex2-6b query_key_value chatglm-generation transformers<4.34 coding THUDM/codegeex2-6b
glm4-9b ZhipuAI/glm-4-9b query_key_value chatglm-generation - THUDM/glm-4-9b
glm4-9b-chat ZhipuAI/glm-4-9b-chat query_key_value chatglm3 - THUDM/glm-4-9b-chat
glm4-9b-chat-1m ZhipuAI/glm-4-9b-chat-1m query_key_value chatglm3 - THUDM/glm-4-9b-chat-1m
llama2-7b modelscope/Llama-2-7b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-7b-hf
llama2-7b-chat modelscope/Llama-2-7b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-7b-chat-hf
llama2-13b modelscope/Llama-2-13b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-13b-hf
llama2-13b-chat modelscope/Llama-2-13b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-13b-chat-hf
llama2-70b modelscope/Llama-2-70b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-70b-hf
llama2-70b-chat modelscope/Llama-2-70b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-70b-chat-hf
llama2-7b-aqlm-2bit-1x16 AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation transformers>=4.38, aqlm, torch>=2.2.0 - ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf
llama3-8b LLM-Research/Meta-Llama-3-8B q_proj, k_proj, v_proj default-generation - meta-llama/Meta-Llama-3-8B
llama3-8b-instruct LLM-Research/Meta-Llama-3-8B-Instruct q_proj, k_proj, v_proj llama3 - meta-llama/Meta-Llama-3-8B-Instruct
llama3-8b-instruct-int4 huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4
llama3-8b-instruct-int8 huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8
llama3-8b-instruct-awq huangjintao/Meta-Llama-3-8B-Instruct-AWQ q_proj, k_proj, v_proj llama3 autoawq - study-hjt/Meta-Llama-3-8B-Instruct-AWQ
llama3-70b LLM-Research/Meta-Llama-3-70B q_proj, k_proj, v_proj default-generation - meta-llama/Meta-Llama-3-70B
llama3-70b-instruct LLM-Research/Meta-Llama-3-70B-Instruct q_proj, k_proj, v_proj llama3 - meta-llama/Meta-Llama-3-70B-Instruct
llama3-70b-instruct-int4 huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4
llama3-70b-instruct-int8 huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8
llama3-70b-instruct-awq huangjintao/Meta-Llama-3-70B-Instruct-AWQ q_proj, k_proj, v_proj llama3 autoawq - study-hjt/Meta-Llama-3-70B-Instruct-AWQ
chinese-llama-2-1_3b AI-ModelScope/chinese-llama-2-1.3b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-1.3b
chinese-llama-2-7b AI-ModelScope/chinese-llama-2-7b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b
chinese-llama-2-7b-16k AI-ModelScope/chinese-llama-2-7b-16k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b-16k
chinese-llama-2-7b-64k AI-ModelScope/chinese-llama-2-7b-64k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b-64k
chinese-llama-2-13b AI-ModelScope/chinese-llama-2-13b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-13b
chinese-llama-2-13b-16k AI-ModelScope/chinese-llama-2-13b-16k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-13b-16k
chinese-alpaca-2-1_3b AI-ModelScope/chinese-alpaca-2-1.3b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-1.3b
chinese-alpaca-2-7b AI-ModelScope/chinese-alpaca-2-7b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b
chinese-alpaca-2-7b-16k AI-ModelScope/chinese-alpaca-2-7b-16k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b-16k
chinese-alpaca-2-7b-64k AI-ModelScope/chinese-alpaca-2-7b-64k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b-64k
chinese-alpaca-2-13b AI-ModelScope/chinese-alpaca-2-13b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-13b
chinese-alpaca-2-13b-16k AI-ModelScope/chinese-alpaca-2-13b-16k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-13b-16k
llama-3-chinese-8b ChineseAlpacaGroup/llama-3-chinese-8b q_proj, k_proj, v_proj default-generation - hfl/llama-3-chinese-8b
llama-3-chinese-8b-instruct ChineseAlpacaGroup/llama-3-chinese-8b-instruct q_proj, k_proj, v_proj llama3 - hfl/llama-3-chinese-8b-instruct
atom-7b FlagAlpha/Atom-7B q_proj, k_proj, v_proj default-generation - FlagAlpha/Atom-7B
atom-7b-chat FlagAlpha/Atom-7B-Chat q_proj, k_proj, v_proj atom - FlagAlpha/Atom-7B-Chat
yi-6b 01ai/Yi-6B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-6B
yi-6b-200k 01ai/Yi-6B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-6B-200K
yi-6b-chat 01ai/Yi-6B-Chat q_proj, k_proj, v_proj yi - 01-ai/Yi-6B-Chat
yi-6b-chat-awq 01ai/Yi-6B-Chat-4bits q_proj, k_proj, v_proj yi autoawq - 01-ai/Yi-6B-Chat-4bits
yi-6b-chat-int8 01ai/Yi-6B-Chat-8bits q_proj, k_proj, v_proj yi auto_gptq - 01-ai/Yi-6B-Chat-8bits
yi-9b 01ai/Yi-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-9B
yi-9b-200k 01ai/Yi-9B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-9B-200K
yi-34b 01ai/Yi-34B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-34B
yi-34b-200k 01ai/Yi-34B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-34B-200K
yi-34b-chat 01ai/Yi-34B-Chat q_proj, k_proj, v_proj yi - 01-ai/Yi-34B-Chat
yi-34b-chat-awq 01ai/Yi-34B-Chat-4bits q_proj, k_proj, v_proj yi autoawq - 01-ai/Yi-34B-Chat-4bits
yi-34b-chat-int8 01ai/Yi-34B-Chat-8bits q_proj, k_proj, v_proj yi auto_gptq - 01-ai/Yi-34B-Chat-8bits
yi-1_5-6b 01ai/Yi-1.5-6B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-6B
yi-1_5-6b-chat 01ai/Yi-1.5-6B-Chat q_proj, k_proj, v_proj yi1_5 - 01-ai/Yi-1.5-6B-Chat
yi-1_5-9b 01ai/Yi-1.5-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-9B
yi-1_5-9b-chat 01ai/Yi-1.5-9B-Chat q_proj, k_proj, v_proj yi1_5 - 01-ai/Yi-1.5-9B-Chat
yi-1_5-9b-chat-16k 01ai/Yi-1.5-9B-Chat-16K q_proj, k_proj, v_proj yi1_5 - 01-ai/Yi-1.5-9B-Chat-16K
yi-1_5-34b 01ai/Yi-1.5-34B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-34B
yi-1_5-34b-chat 01ai/Yi-1.5-34B-Chat q_proj, k_proj, v_proj yi1_5 - 01-ai/Yi-1.5-34B-Chat
yi-1_5-34b-chat-16k 01ai/Yi-1.5-34B-Chat-16K q_proj, k_proj, v_proj yi1_5 - 01-ai/Yi-1.5-34B-Chat-16K
yi-1_5-6b-chat-awq-int4 AI-ModelScope/Yi-1.5-6B-Chat-AWQ q_proj, k_proj, v_proj yi1_5 autoawq - modelscope/Yi-1.5-6B-Chat-AWQ
yi-1_5-6b-chat-gptq-int4 AI-ModelScope/Yi-1.5-6B-Chat-GPTQ q_proj, k_proj, v_proj yi1_5 auto_gptq>=0.5 - modelscope/Yi-1.5-6B-Chat-GPTQ
yi-1_5-9b-chat-awq-int4 AI-ModelScope/Yi-1.5-9B-Chat-AWQ q_proj, k_proj, v_proj yi1_5 autoawq - modelscope/Yi-1.5-9B-Chat-AWQ
yi-1_5-9b-chat-gptq-int4 AI-ModelScope/Yi-1.5-9B-Chat-GPTQ q_proj, k_proj, v_proj yi1_5 auto_gptq>=0.5 - modelscope/Yi-1.5-9B-Chat-GPTQ
yi-1_5-34b-chat-awq-int4 AI-ModelScope/Yi-1.5-34B-Chat-AWQ q_proj, k_proj, v_proj yi1_5 autoawq - modelscope/Yi-1.5-34B-Chat-AWQ
yi-1_5-34b-chat-gptq-int4 AI-ModelScope/Yi-1.5-34B-Chat-GPTQ q_proj, k_proj, v_proj yi1_5 auto_gptq>=0.5 - modelscope/Yi-1.5-34B-Chat-GPTQ
internlm-7b Shanghai_AI_Laboratory/internlm-7b q_proj, k_proj, v_proj default-generation - internlm/internlm-7b
internlm-7b-chat Shanghai_AI_Laboratory/internlm-chat-7b q_proj, k_proj, v_proj internlm - internlm/internlm-chat-7b
internlm-7b-chat-8k Shanghai_AI_Laboratory/internlm-chat-7b-8k q_proj, k_proj, v_proj internlm - -
internlm-20b Shanghai_AI_Laboratory/internlm-20b q_proj, k_proj, v_proj default-generation - internlm/internlm-20b
internlm-20b-chat Shanghai_AI_Laboratory/internlm-chat-20b q_proj, k_proj, v_proj internlm - internlm/internlm-chat-20b
internlm2-1_8b Shanghai_AI_Laboratory/internlm2-1_8b wqkv default-generation transformers>=4.35 - internlm/internlm2-1_8b
internlm2-1_8b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft wqkv internlm2 transformers>=4.35 - internlm/internlm2-chat-1_8b-sft
internlm2-1_8b-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b wqkv internlm2 transformers>=4.35 - internlm/internlm2-chat-1_8b
internlm2-7b-base Shanghai_AI_Laboratory/internlm2-base-7b wqkv default-generation transformers>=4.35 - internlm/internlm2-base-7b
internlm2-7b Shanghai_AI_Laboratory/internlm2-7b wqkv default-generation transformers>=4.35 - internlm/internlm2-7b
internlm2-7b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-7b-sft wqkv internlm2 transformers>=4.35 - internlm/internlm2-chat-7b-sft
internlm2-7b-chat Shanghai_AI_Laboratory/internlm2-chat-7b wqkv internlm2 transformers>=4.35 - internlm/internlm2-chat-7b
internlm2-20b-base Shanghai_AI_Laboratory/internlm2-base-20b wqkv default-generation transformers>=4.35 - internlm/internlm2-base-20b
internlm2-20b Shanghai_AI_Laboratory/internlm2-20b wqkv default-generation transformers>=4.35 - internlm/internlm2-20b
internlm2-20b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-20b-sft wqkv internlm2 transformers>=4.35 - internlm/internlm2-chat-20b-sft
internlm2-20b-chat Shanghai_AI_Laboratory/internlm2-chat-20b wqkv internlm2 transformers>=4.35 - internlm/internlm2-chat-20b
internlm2-math-7b Shanghai_AI_Laboratory/internlm2-math-base-7b wqkv default-generation transformers>=4.35 math internlm/internlm2-math-base-7b
internlm2-math-7b-chat Shanghai_AI_Laboratory/internlm2-math-7b wqkv internlm2 transformers>=4.35 math internlm/internlm2-math-7b
internlm2-math-20b Shanghai_AI_Laboratory/internlm2-math-base-20b wqkv default-generation transformers>=4.35 math internlm/internlm2-math-base-20b
internlm2-math-20b-chat Shanghai_AI_Laboratory/internlm2-math-20b wqkv internlm2 transformers>=4.35 math internlm/internlm2-math-20b
deepseek-7b deepseek-ai/deepseek-llm-7b-base q_proj, k_proj, v_proj default-generation - deepseek-ai/deepseek-llm-7b-base
deepseek-7b-chat deepseek-ai/deepseek-llm-7b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-llm-7b-chat
deepseek-moe-16b deepseek-ai/deepseek-moe-16b-base q_proj, k_proj, v_proj default-generation - deepseek-ai/deepseek-moe-16b-base
deepseek-moe-16b-chat deepseek-ai/deepseek-moe-16b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-moe-16b-chat
deepseek-67b deepseek-ai/deepseek-llm-67b-base q_proj, k_proj, v_proj default-generation - deepseek-ai/deepseek-llm-67b-base
deepseek-67b-chat deepseek-ai/deepseek-llm-67b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-llm-67b-chat
deepseek-coder-1_3b deepseek-ai/deepseek-coder-1.3b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-1.3b-base
deepseek-coder-1_3b-instruct deepseek-ai/deepseek-coder-1.3b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-1.3b-instruct
deepseek-coder-6_7b deepseek-ai/deepseek-coder-6.7b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-6.7b-base
deepseek-coder-6_7b-instruct deepseek-ai/deepseek-coder-6.7b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-6.7b-instruct
deepseek-coder-33b deepseek-ai/deepseek-coder-33b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-33b-base
deepseek-coder-33b-instruct deepseek-ai/deepseek-coder-33b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-33b-instruct
deepseek-coder-v2-instruct deepseek-ai/DeepSeek-Coder-V2-Instruct q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 coding deepseek-ai/DeepSeek-Coder-V2-Instruct
deepseek-coder-v2-lite-instruct deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 coding deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
deepseek-math-7b deepseek-ai/deepseek-math-7b-base q_proj, k_proj, v_proj default-generation math deepseek-ai/deepseek-math-7b-base
deepseek-math-7b-instruct deepseek-ai/deepseek-math-7b-instruct q_proj, k_proj, v_proj deepseek math deepseek-ai/deepseek-math-7b-instruct
deepseek-math-7b-chat deepseek-ai/deepseek-math-7b-rl q_proj, k_proj, v_proj deepseek math deepseek-ai/deepseek-math-7b-rl
deepseek-v2-chat deepseek-ai/DeepSeek-V2-Chat q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 - deepseek-ai/DeepSeek-V2-Chat
deepseek-v2-lite deepseek-ai/DeepSeek-V2-Lite q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 - deepseek-ai/DeepSeek-V2-Lite
deepseek-v2-lite-chat deepseek-ai/DeepSeek-V2-Lite-Chat q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 - deepseek-ai/DeepSeek-V2-Lite-Chat
gemma-2b AI-ModelScope/gemma-2b q_proj, k_proj, v_proj default-generation transformers>=4.38 - google/gemma-2b
gemma-7b AI-ModelScope/gemma-7b q_proj, k_proj, v_proj default-generation transformers>=4.38 - google/gemma-7b
gemma-2b-instruct AI-ModelScope/gemma-2b-it q_proj, k_proj, v_proj gemma transformers>=4.38 - google/gemma-2b-it
gemma-7b-instruct AI-ModelScope/gemma-7b-it q_proj, k_proj, v_proj gemma transformers>=4.38 - google/gemma-7b-it
minicpm-1b-sft-chat OpenBMB/MiniCPM-1B-sft-bf16 q_proj, k_proj, v_proj minicpm transformers>=4.36.0 - openbmb/MiniCPM-1B-sft-bf16
minicpm-2b-sft-chat OpenBMB/MiniCPM-2B-sft-fp32 q_proj, k_proj, v_proj minicpm - openbmb/MiniCPM-2B-sft-fp32
minicpm-2b-chat OpenBMB/MiniCPM-2B-dpo-fp32 q_proj, k_proj, v_proj minicpm - openbmb/MiniCPM-2B-dpo-fp32
minicpm-2b-128k OpenBMB/MiniCPM-2B-128k q_proj, k_proj, v_proj chatml transformers>=4.36.0 - openbmb/MiniCPM-2B-128k
minicpm-moe-8x2b OpenBMB/MiniCPM-MoE-8x2B q_proj, k_proj, v_proj minicpm transformers>=4.36.0 - openbmb/MiniCPM-MoE-8x2B
openbuddy-llama-65b-chat OpenBuddy/openbuddy-llama-65b-v8-bf16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama-65b-v8-bf16
openbuddy-llama2-13b-chat OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama2-13b-v8.1-fp16
openbuddy-llama2-70b-chat OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
openbuddy-llama3-8b-chat OpenBuddy/openbuddy-llama3-8b-v21.1-8k q_proj, k_proj, v_proj openbuddy2 - OpenBuddy/openbuddy-llama3-8b-v21.1-8k
openbuddy-llama3-70b-chat OpenBuddy/openbuddy-llama3-70b-v21.1-8k q_proj, k_proj, v_proj openbuddy2 - OpenBuddy/openbuddy-llama3-70b-v21.1-8k
openbuddy-mistral-7b-chat OpenBuddy/openbuddy-mistral-7b-v17.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.34 - OpenBuddy/openbuddy-mistral-7b-v17.1-32k
openbuddy-zephyr-7b-chat OpenBuddy/openbuddy-zephyr-7b-v14.1 q_proj, k_proj, v_proj openbuddy transformers>=4.34 - OpenBuddy/openbuddy-zephyr-7b-v14.1
openbuddy-deepseek-67b-chat OpenBuddy/openbuddy-deepseek-67b-v15.2 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-deepseek-67b-v15.2
openbuddy-mixtral-moe-7b-chat OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.36 - OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k
mistral-7b AI-ModelScope/Mistral-7B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.34 - mistralai/Mistral-7B-v0.1
mistral-7b-v2 AI-ModelScope/Mistral-7B-v0.2-hf q_proj, k_proj, v_proj default-generation transformers>=4.34 - alpindale/Mistral-7B-v0.2-hf
mistral-7b-instruct AI-ModelScope/Mistral-7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.1
mistral-7b-instruct-v2 AI-ModelScope/Mistral-7B-Instruct-v0.2 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.2
mixtral-moe-7b AI-ModelScope/Mixtral-8x7B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.36 - mistralai/Mixtral-8x7B-v0.1
mixtral-moe-7b-instruct AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.36 - mistralai/Mixtral-8x7B-Instruct-v0.1
mixtral-moe-7b-aqlm-2bit-1x16 AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation transformers>=4.38, aqlm, torch>=2.2.0 - ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf
mixtral-moe-8x22b-v1 AI-ModelScope/Mixtral-8x22B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.36 - mistral-community/Mixtral-8x22B-v0.1
wizardlm2-7b-awq AI-ModelScope/WizardLM-2-7B-AWQ q_proj, k_proj, v_proj wizardlm2-awq transformers>=4.34 - MaziyarPanahi/WizardLM-2-7B-AWQ
wizardlm2-8x22b AI-ModelScope/WizardLM-2-8x22B q_proj, k_proj, v_proj wizardlm2 transformers>=4.36 - alpindale/WizardLM-2-8x22B
baichuan-7b baichuan-inc/baichuan-7B W_pack default-generation transformers<4.34 - baichuan-inc/Baichuan-7B
baichuan-13b baichuan-inc/Baichuan-13B-Base W_pack default-generation transformers<4.34 - baichuan-inc/Baichuan-13B-Base
baichuan-13b-chat baichuan-inc/Baichuan-13B-Chat W_pack baichuan transformers<4.34 - baichuan-inc/Baichuan-13B-Chat
baichuan2-7b baichuan-inc/Baichuan2-7B-Base W_pack default-generation - baichuan-inc/Baichuan2-7B-Base
baichuan2-7b-chat baichuan-inc/Baichuan2-7B-Chat W_pack baichuan - baichuan-inc/Baichuan2-7B-Chat
baichuan2-7b-chat-int4 baichuan-inc/Baichuan2-7B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 - baichuan-inc/Baichuan2-7B-Chat-4bits
baichuan2-13b baichuan-inc/Baichuan2-13B-Base W_pack default-generation - baichuan-inc/Baichuan2-13B-Base
baichuan2-13b-chat baichuan-inc/Baichuan2-13B-Chat W_pack baichuan - baichuan-inc/Baichuan2-13B-Chat
baichuan2-13b-chat-int4 baichuan-inc/Baichuan2-13B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 - baichuan-inc/Baichuan2-13B-Chat-4bits
yuan2-2b-instruct YuanLLM/Yuan2.0-2B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-2B-hf
yuan2-2b-janus-instruct YuanLLM/Yuan2-2B-Janus-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-2B-Janus-hf
yuan2-51b-instruct YuanLLM/Yuan2.0-51B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-51B-hf
yuan2-102b-instruct YuanLLM/Yuan2.0-102B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-102B-hf
yuan2-m32 YuanLLM/Yuan2-M32-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-M32-hf
xverse-7b xverse/XVERSE-7B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-7B
xverse-7b-chat xverse/XVERSE-7B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-7B-Chat
xverse-13b xverse/XVERSE-13B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-13B
xverse-13b-chat xverse/XVERSE-13B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-13B-Chat
xverse-65b xverse/XVERSE-65B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-65B
xverse-65b-v2 xverse/XVERSE-65B-2 q_proj, k_proj, v_proj default-generation - xverse/XVERSE-65B-2
xverse-65b-chat xverse/XVERSE-65B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-65B-Chat
xverse-13b-256k xverse/XVERSE-13B-256K q_proj, k_proj, v_proj default-generation - xverse/XVERSE-13B-256K
xverse-moe-a4_2b xverse/XVERSE-MoE-A4.2B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-MoE-A4.2B
orion-14b OrionStarAI/Orion-14B-Base q_proj, k_proj, v_proj default-generation - OrionStarAI/Orion-14B-Base
orion-14b-chat OrionStarAI/Orion-14B-Chat q_proj, k_proj, v_proj orion - OrionStarAI/Orion-14B-Chat
bluelm-7b vivo-ai/BlueLM-7B-Base q_proj, k_proj, v_proj default-generation - vivo-ai/BlueLM-7B-Base
bluelm-7b-32k vivo-ai/BlueLM-7B-Base-32K q_proj, k_proj, v_proj default-generation - vivo-ai/BlueLM-7B-Base-32K
bluelm-7b-chat vivo-ai/BlueLM-7B-Chat q_proj, k_proj, v_proj bluelm - vivo-ai/BlueLM-7B-Chat
bluelm-7b-chat-32k vivo-ai/BlueLM-7B-Chat-32K q_proj, k_proj, v_proj bluelm - vivo-ai/BlueLM-7B-Chat-32K
ziya2-13b Fengshenbang/Ziya2-13B-Base q_proj, k_proj, v_proj default-generation - IDEA-CCNL/Ziya2-13B-Base
ziya2-13b-chat Fengshenbang/Ziya2-13B-Chat q_proj, k_proj, v_proj ziya - IDEA-CCNL/Ziya2-13B-Chat
skywork-13b skywork/Skywork-13B-base q_proj, k_proj, v_proj default-generation - Skywork/Skywork-13B-base
skywork-13b-chat skywork/Skywork-13B-chat q_proj, k_proj, v_proj skywork - -
zephyr-7b-beta-chat modelscope/zephyr-7b-beta q_proj, k_proj, v_proj zephyr transformers>=4.34 - HuggingFaceH4/zephyr-7b-beta
polylm-13b damo/nlp_polylm_13b_text_generation c_attn default-generation - DAMO-NLP-MT/polylm-13b
seqgpt-560m damo/nlp_seqgpt-560m query_key_value default-generation - DAMO-NLP/SeqGPT-560M
sus-34b-chat SUSTC/SUS-Chat-34B q_proj, k_proj, v_proj sus - SUSTech/SUS-Chat-34B
tongyi-finance-14b TongyiFinance/Tongyi-Finance-14B c_attn default-generation financial -
tongyi-finance-14b-chat TongyiFinance/Tongyi-Finance-14B-Chat c_attn qwen financial jxy/Tongyi-Finance-14B-Chat
tongyi-finance-14b-chat-int4 TongyiFinance/Tongyi-Finance-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 financial jxy/Tongyi-Finance-14B-Chat-Int4
codefuse-codellama-34b-chat codefuse-ai/CodeFuse-CodeLlama-34B q_proj, k_proj, v_proj codefuse-codellama coding codefuse-ai/CodeFuse-CodeLlama-34B
codefuse-codegeex2-6b-chat codefuse-ai/CodeFuse-CodeGeeX2-6B query_key_value codefuse transformers<4.34 coding codefuse-ai/CodeFuse-CodeGeeX2-6B
codefuse-qwen-14b-chat codefuse-ai/CodeFuse-QWen-14B c_attn codefuse coding codefuse-ai/CodeFuse-QWen-14B
phi2-3b AI-ModelScope/phi-2 Wqkv default-generation coding microsoft/phi-2
phi3-4b-4k-instruct LLM-Research/Phi-3-mini-4k-instruct qkv_proj phi3 transformers>=4.36 general microsoft/Phi-3-mini-4k-instruct
phi3-4b-128k-instruct LLM-Research/Phi-3-mini-128k-instruct qkv_proj phi3 transformers>=4.36 general microsoft/Phi-3-mini-128k-instruct
phi3-small-128k-instruct LLM-Research/Phi-3-small-128k-instruct query_key_value phi3 transformers>=4.36 general microsoft/Phi-3-small-128k-instruct
phi3-medium-128k-instruct LLM-Research/Phi-3-medium-128k-instruct qkv_proj phi3 transformers>=4.36 general microsoft/Phi-3-medium-128k-instruct
mamba-130m AI-ModelScope/mamba-130m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-130m-hf
mamba-370m AI-ModelScope/mamba-370m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-370m-hf
mamba-390m AI-ModelScope/mamba-390m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-390m-hf
mamba-790m AI-ModelScope/mamba-790m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-790m-hf
mamba-1.4b AI-ModelScope/mamba-1.4b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-1.4b-hf
mamba-2.8b AI-ModelScope/mamba-2.8b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-2.8b-hf
telechat-7b TeleAI/TeleChat-7B key_value, query telechat - Tele-AI/telechat-7B
telechat-12b TeleAI/TeleChat-12B key_value, query telechat - Tele-AI/TeleChat-12B
telechat-12b-v2 TeleAI/TeleChat-12B-v2 key_value, query telechat-v2 - Tele-AI/TeleChat-12B-v2
telechat-12b-v2-gptq-int4 swift/TeleChat-12B-V2-GPTQ-Int4 key_value, query telechat-v2 auto_gptq>=0.5 - -
grok-1 colossalai/grok-1-pytorch q_proj, k_proj, v_proj default-generation - hpcai-tech/grok-1
dbrx-instruct AI-ModelScope/dbrx-instruct attn.Wqkv dbrx transformers>=4.36 - databricks/dbrx-instruct
dbrx-base AI-ModelScope/dbrx-base attn.Wqkv dbrx transformers>=4.36 - databricks/dbrx-base
mengzi3-13b-base langboat/Mengzi3-13B-Base q_proj, k_proj, v_proj mengzi - Langboat/Mengzi3-13B-Base
c4ai-command-r-v01 AI-ModelScope/c4ai-command-r-v01 q_proj, k_proj, v_proj c4ai transformers>=4.39.1 - CohereForAI/c4ai-command-r-v01
c4ai-command-r-plus AI-ModelScope/c4ai-command-r-plus q_proj, k_proj, v_proj c4ai transformers>4.39 - CohereForAI/c4ai-command-r-plus
codestral-22b huangjintao/Codestral-22B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.34 - mistralai/Codestral-22B-v0.1
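As an illustrative sketch of how the registrations above are used (flag names follow the standard swift CLI; verify against your installed version):

```shell
# Fine-tune with a registered model_type; SWIFT resolves the ModelScope
# model ID, default LoRA target modules, and default template from the table.
swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-zh \
    --sft_type lora

# Inference with the same registration; add --infer_backend vllm
# when the "Support VLLM" column applies to the model.
swift infer --model_type qwen-7b-chat
```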

MLLM

Model Type Model ID Default Lora Target Modules Default Template Support Flash Attn Support VLLM Requires Tags HF Model ID
qwen-vl qwen/Qwen-VL c_attn default-generation vision Qwen/Qwen-VL
qwen-vl-chat qwen/Qwen-VL-Chat c_attn qwenvl vision Qwen/Qwen-VL-Chat
qwen-vl-chat-int4 qwen/Qwen-VL-Chat-Int4 c_attn qwenvl auto_gptq>=0.5 vision Qwen/Qwen-VL-Chat-Int4
qwen-audio qwen/Qwen-Audio c_attn qwen-audio-generation audio Qwen/Qwen-Audio
qwen-audio-chat qwen/Qwen-Audio-Chat c_attn qwen-audio audio Qwen/Qwen-Audio-Chat
glm4v-9b-chat ZhipuAI/glm-4v-9b self_attention.query_key_value glm4v vision THUDM/glm-4v-9b
llava1_5-7b-chat huangjintao/llava-1.5-7b-hf q_proj, k_proj, v_proj llava1_5 transformers>=4.36 vision llava-hf/llava-1.5-7b-hf
llava1_6-mistral-7b-instruct AI-ModelScope/llava-v1.6-mistral-7b q_proj, k_proj, v_proj llava-mistral-instruct transformers>=4.34 vision liuhaotian/llava-v1.6-mistral-7b
llava1_6-yi-34b-instruct AI-ModelScope/llava-v1.6-34b q_proj, k_proj, v_proj llava-yi-instruct vision liuhaotian/llava-v1.6-34b
llama3-llava-next-8b AI-Modelscope/llama3-llava-next-8b q_proj, k_proj, v_proj llama-llava-next vision lmms-lab/llama3-llava-next-8b
llava-next-72b AI-Modelscope/llava-next-72b q_proj, k_proj, v_proj llava-qwen-instruct vision lmms-lab/llava-next-72b
llava-next-110b AI-Modelscope/llava-next-110b q_proj, k_proj, v_proj llava-qwen-instruct vision lmms-lab/llava-next-110b
yi-vl-6b-chat 01ai/Yi-VL-6B q_proj, k_proj, v_proj yi-vl transformers>=4.34 vision 01-ai/Yi-VL-6B
yi-vl-34b-chat 01ai/Yi-VL-34B q_proj, k_proj, v_proj yi-vl transformers>=4.34 vision 01-ai/Yi-VL-34B
llava-llama-3-8b-v1_1 AI-ModelScope/llava-llama-3-8b-v1_1-transformers q_proj, k_proj, v_proj llava-llama-instruct transformers>=4.36 vision xtuner/llava-llama-3-8b-v1_1-transformers
internlm-xcomposer2-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2-7b wqkv internlm-xcomposer2 vision internlm/internlm-xcomposer2-7b
internvl-chat-v1_5 AI-ModelScope/InternVL-Chat-V1-5 wqkv internvl transformers>=4.35, timm vision OpenGVLab/InternVL-Chat-V1-5
internvl-chat-v1_5-int8 AI-ModelScope/InternVL-Chat-V1-5-int8 wqkv internvl transformers>=4.35, timm vision OpenGVLab/InternVL-Chat-V1-5-int8
mini-internvl-chat-2b-v1_5 OpenGVLab/Mini-InternVL-Chat-2B-V1-5 wqkv internvl transformers>=4.35, timm vision OpenGVLab/Mini-InternVL-Chat-2B-V1-5
mini-internvl-chat-4b-v1_5 OpenGVLab/Mini-InternVL-Chat-4B-V1-5 qkv_proj internvl-phi3 transformers>=4.35, timm vision OpenGVLab/Mini-InternVL-Chat-4B-V1-5
deepseek-vl-1_3b-chat deepseek-ai/deepseek-vl-1.3b-chat q_proj, k_proj, v_proj deepseek-vl attrdict vision deepseek-ai/deepseek-vl-1.3b-chat
deepseek-vl-7b-chat deepseek-ai/deepseek-vl-7b-chat q_proj, k_proj, v_proj deepseek-vl attrdict vision deepseek-ai/deepseek-vl-7b-chat
paligemma-3b-pt-224 AI-ModelScope/paligemma-3b-pt-224 q_proj, k_proj, v_proj paligemma transformers>=4.41 vision google/paligemma-3b-pt-224
paligemma-3b-pt-448 AI-ModelScope/paligemma-3b-pt-448 q_proj, k_proj, v_proj paligemma transformers>=4.41 vision google/paligemma-3b-pt-448
paligemma-3b-pt-896 AI-ModelScope/paligemma-3b-pt-896 q_proj, k_proj, v_proj paligemma transformers>=4.41 vision google/paligemma-3b-pt-896
paligemma-3b-mix-224 AI-ModelScope/paligemma-3b-mix-224 q_proj, k_proj, v_proj paligemma transformers>=4.41 vision google/paligemma-3b-mix-224
paligemma-3b-mix-448 AI-ModelScope/paligemma-3b-mix-448 q_proj, k_proj, v_proj paligemma transformers>=4.41 vision google/paligemma-3b-mix-448
minicpm-v-3b-chat OpenBMB/MiniCPM-V q_proj, k_proj, v_proj minicpm-v vision openbmb/MiniCPM-V
minicpm-v-v2-chat OpenBMB/MiniCPM-V-2 q_proj, k_proj, v_proj minicpm-v timm vision openbmb/MiniCPM-V-2
minicpm-v-v2_5-chat OpenBMB/MiniCPM-Llama3-V-2_5 q_proj, k_proj, v_proj minicpm-v-v2_5 timm vision openbmb/MiniCPM-Llama3-V-2_5
mplug-owl2-chat iic/mPLUG-Owl2 q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 mplug-owl2 transformers<4.35, icecream vision MAGAer13/mplug-owl2-llama2-7b
mplug-owl2_1-chat iic/mPLUG-Owl2.1 c_attn.multiway.0, c_attn.multiway.1 mplug-owl2 transformers<4.35, icecream vision Mizukiluke/mplug_owl_2_1
phi3-vision-128k-instruct LLM-Research/Phi-3-vision-128k-instruct qkv_proj phi3-vl transformers>=4.36 vision microsoft/Phi-3-vision-128k-instruct
cogvlm-17b-chat ZhipuAI/cogvlm-chat vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense cogvlm vision THUDM/cogvlm-chat-hf
cogvlm2-19b-chat ZhipuAI/cogvlm2-llama3-chinese-chat-19B vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense cogvlm vision THUDM/cogvlm2-llama3-chinese-chat-19B
cogvlm2-en-19b-chat ZhipuAI/cogvlm2-llama3-chat-19B vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense cogvlm vision THUDM/cogvlm2-llama3-chat-19B
cogagent-18b-chat ZhipuAI/cogagent-chat vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense cogagent-chat timm vision THUDM/cogagent-chat-hf
cogagent-18b-instruct ZhipuAI/cogagent-vqa vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense cogagent-instruct timm vision THUDM/cogagent-vqa-hf

Datasets

The table below introduces the datasets supported by SWIFT:

  • Dataset Name: The dataset name registered in SWIFT.
  • Dataset ID: The dataset id in ModelScope.
  • Size: The data row count of the dataset.
  • Statistic: Dataset statistics, computed over token counts, which helps when adjusting the max_length hyperparameter. We concatenate the training and validation sets of the dataset and compute statistics over the result, using qwen's tokenizer; different tokenizers produce different statistics. If you want token statistics for another model's tokenizer, you can run the script to obtain them yourself.
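As a rough sketch of how such statistics can be computed (the helper name is our own, and a whitespace split stands in here for qwen's tokenizer):

```python
import statistics

def token_stats(texts, tokenize):
    """Return mean/std/min/max token counts for a list of samples."""
    lengths = [len(tokenize(text)) for text in texts]
    return {
        "mean": round(statistics.mean(lengths), 1),
        "std": round(statistics.pstdev(lengths), 1),
        "min": min(lengths),
        "max": max(lengths),
    }

# A whitespace split stands in for a real tokenizer; swap in e.g.
# tokenizer.encode from transformers to reproduce the table's numbers.
samples = ["hello world", "a longer training sample here", "hi"]
print(token_stats(samples, str.split))  # {'mean': 2.7, 'std': 1.7, 'min': 1, 'max': 5}
```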
Dataset Name Dataset ID Subsets Dataset Size Statistic (token) Tags HF Dataset ID
🔥ms-bench iic/ms_bench 316820 346.9±443.2, min=22, max=30960 chat, general, multi-round -
🔥alpaca-en AI-ModelScope/alpaca-gpt4-data-en 52002 176.2±125.8, min=26, max=740 chat, general vicgalle/alpaca-gpt4
🔥alpaca-zh AI-ModelScope/alpaca-gpt4-data-zh 48818 162.1±93.9, min=26, max=856 chat, general llm-wizard/alpaca-gpt4-data-zh
multi-alpaca damo/nlp_polylm_multialpaca_sft ar
de
es
fr
id
ja
ko
pt
ru
th
vi
131867 112.9±50.6, min=26, max=1226 chat, general, multilingual -
instinwild wyj123456/instinwild default
subset
103695 145.4±60.7, min=28, max=1434 - -
cot-en YorickHe/CoT 74771 122.7±64.8, min=51, max=8320 chat, general -
cot-zh YorickHe/CoT_zh 74771 117.5±70.8, min=43, max=9636 chat, general -
instruct-en wyj123456/instruct 888970 269.1±331.5, min=26, max=7254 chat, general -
firefly-zh AI-ModelScope/firefly-train-1.1M 1649399 178.1±260.4, min=26, max=12516 chat, general YeungNLP/firefly-train-1.1M
gpt4all-en wyj123456/GPT4all 806199 302.7±384.5, min=27, max=7391 chat, general -
sharegpt huangjintao/sharegpt common-zh
computer-zh
unknow-zh
common-en
computer-en
96566 933.3±864.8, min=21, max=66412 chat, general, multi-round -
tulu-v2-sft-mixture AI-ModelScope/tulu-v2-sft-mixture 5119 520.7±437.6, min=68, max=2549 chat, multilingual, general, multi-round allenai/tulu-v2-sft-mixture
wikipedia-zh AI-ModelScope/wikipedia-cn-20230720-filtered 254547 568.4±713.2, min=37, max=78678 text-generation, general, pretrained pleisto/wikipedia-cn-20230720-filtered
open-orca AI-ModelScope/OpenOrca 994896 382.3±417.4, min=31, max=8740 chat, multilingual, general -
🔥sharegpt-gpt4 AI-ModelScope/sharegpt_gpt4 default
V3_format
zh_38K_format
72684 1047.6±1313.1, min=22, max=66412 chat, multilingual, general, multi-round, gpt4 -
deepctrl-sft AI-ModelScope/deepctrl-sft-data default
en
14149024 389.8±628.6, min=21, max=626237 chat, general, sft, multi-round -
🔥coig-cqia AI-ModelScope/COIG-CQIA chinese_traditional
coig_pc
exam
finance
douban
human_value
logi_qa
ruozhiba
segmentfault
wiki
wikihow
xhs
zhihu
44694 703.8±654.2, min=33, max=19288 general -
🔥ruozhiba AI-ModelScope/ruozhiba post-annual
title-good
title-norm
85658 39.9±13.1, min=21, max=559 pretrain -
long-alpaca-12k AI-ModelScope/LongAlpaca-12k 11998 9619.0±8295.8, min=36, max=78925 longlora, QA Yukang/LongAlpaca-12k
🔥ms-agent iic/ms_agent 26336 650.9±217.2, min=209, max=2740 chat, agent, multi-round -
🔥ms-agent-for-agentfabric AI-ModelScope/ms_agent_for_agentfabric default
addition
30000 617.8±199.1, min=251, max=2657 chat, agent, multi-round -
ms-agent-multirole iic/MSAgent-MultiRole 9500 447.6±84.9, min=145, max=1101 chat, agent, multi-round, role-play, multi-agent -
🔥toolbench-for-alpha-umi shenweizhou/alpha-umi-toolbench-processed-v2 backbone
caller
planner
summarizer
1448337 1439.7±853.9, min=123, max=18467 chat, agent -
damo-agent-zh damo/MSAgent-Bench 386984 956.5±407.3, min=326, max=19001 chat, agent, multi-round -
damo-agent-zh-mini damo/MSAgent-Bench 20845 1326.4±329.6, min=571, max=4304 chat, agent, multi-round -
agent-instruct-all-en huangjintao/AgentInstruct_copy alfworld
db
kg
mind2web
os
webshop
1866 1144.3±635.5, min=206, max=6412 chat, agent, multi-round -
code-alpaca-en wyj123456/code_alpaca_en 20016 100.2±60.1, min=29, max=1776 - sahil2801/CodeAlpaca-20k
🔥leetcode-python-en AI-ModelScope/leetcode-solutions-python 2359 727.1±235.9, min=259, max=2146 chat, coding -
🔥codefuse-python-en codefuse-ai/CodeExercise-Python-27k 27224 483.6±193.9, min=45, max=3082 chat, coding -
🔥codefuse-evol-instruction-zh codefuse-ai/Evol-instruction-66k 66862 439.6±206.3, min=37, max=2983 chat, coding -
medical-en huangjintao/medical_zh en 117617 257.4±89.1, min=36, max=2564 chat, medical -
medical-zh huangjintao/medical_zh zh 1950972 167.2±219.7, min=26, max=27351 chat, medical -
🔥disc-med-sft-zh AI-ModelScope/DISC-Med-SFT 441767 354.1±193.1, min=25, max=2231 chat, medical Flmc/DISC-Med-SFT
lawyer-llama-zh AI-ModelScope/lawyer_llama_data 21476 194.4±91.7, min=27, max=924 chat, law Skepsun/lawyer_llama_data
tigerbot-law-zh AI-ModelScope/tigerbot-law-plugin 55895 109.9±126.4, min=37, max=18878 text-generation, law, pretrained TigerResearch/tigerbot-law-plugin
🔥disc-law-sft-zh AI-ModelScope/DISC-Law-SFT 166758 533.7±495.4, min=30, max=15169 chat, law ShengbinYue/DISC-Law-SFT
🔥blossom-math-zh AI-ModelScope/blossom-math-v2 10000 169.3±58.7, min=35, max=563 chat, math Azure99/blossom-math-v2
school-math-zh AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 chat, math, quality BelleGroup/school_math_0.25M
open-platypus-en AI-ModelScope/Open-Platypus 24926 367.9±254.8, min=30, max=3951 chat, math, quality garage-bAInd/Open-Platypus
text2sql-en AI-ModelScope/texttosqlv2_25000_v2 25000 274.6±326.4, min=38, max=1975 chat, sql Clinton/texttosqlv2_25000_v2
🔥sql-create-context-en AI-ModelScope/sql-create-context 78577 80.2±17.8, min=36, max=456 chat, sql b-mc2/sql-create-context
synthetic-text-to-sql AI-ModelScope/synthetic_text_to_sql default 100000 283.4±115.8, min=61, max=1356 nl2sql, en gretelai/synthetic_text_to_sql
🔥advertise-gen-zh lvjianjin/AdvertiseGen 98399 130.6±21.7, min=51, max=241 text-generation shibing624/AdvertiseGen
🔥dureader-robust-zh modelscope/DuReader_robust-QG 17899 241.1±137.4, min=60, max=1416 text-generation -
cmnli-zh modelscope/clue cmnli 404024 82.6±16.6, min=51, max=199 text-generation, classification clue
🔥jd-sentiment-zh DAMO_NLP/jd 50000 66.0±83.2, min=39, max=4039 text-generation, classification -
🔥hc3-zh simpleai/HC3-Chinese baike
open_qa
nlpcc_dbqa
finance
medicine
law
psychology
39781 176.8±81.5, min=57, max=3051 text-generation, classification Hello-SimpleAI/HC3-Chinese
🔥hc3-en simpleai/HC3 finance
medicine
11021 298.3±138.7, min=65, max=2267 text-generation, classification Hello-SimpleAI/HC3
dolly-15k AI-ModelScope/databricks-dolly-15k default 15011 199.2±267.8, min=22, max=8615 multi-task, en, quality databricks/databricks-dolly-15k
finance-en wyj123456/finance_en 68911 135.6±134.3, min=26, max=3525 chat, financial ssbuild/alpaca_finance_en
poetry-zh modelscope/chinese-poetry-collection 390309 55.2±9.4, min=23, max=83 text-generation, poetry -
webnovel-zh AI-ModelScope/webnovel_cn 50000 1478.9±11526.1, min=100, max=490484 chat, novel zxbsmk/webnovel_cn
generated-chat-zh AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 chat, character-dialogue BelleGroup/generated_chat_0.4M
🔥self-cognition swift/self-cognition 134 53.6±18.6, min=29, max=121 chat, self-cognition modelscope/self-cognition
cls-fudan-news-zh damo/zh_cls_fudan-news 4959 3234.4±2547.5, min=91, max=19548 chat, classification -
ner-jave-zh damo/zh_ner-JAVE 1266 118.3±45.5, min=44, max=223 chat, ner -
coco-en modelscope/coco_2014_caption coco_2014_caption 454617 299.8±2.8, min=295, max=352 chat, multi-modal, vision -
🔥coco-en-mini modelscope/coco_2014_caption coco_2014_caption 40504 299.8±2.6, min=295, max=338 chat, multi-modal, vision -
coco-en-2 modelscope/coco_2014_caption coco_2014_caption 454617 36.8±2.8, min=32, max=89 chat, multi-modal, vision -
🔥coco-en-2-mini modelscope/coco_2014_caption coco_2014_caption 40504 36.8±2.6, min=32, max=75 chat, multi-modal, vision -
capcha-images AI-ModelScope/captcha-images 8000 31.0±0.0, min=31, max=31 chat, multi-modal, vision -
aishell1-zh speech_asr/speech_asr_aishell1_trainsets 141600 152.2±36.8, min=63, max=419 chat, multi-modal, audio -
🔥aishell1-zh-mini speech_asr/speech_asr_aishell1_trainsets 14526 152.2±35.6, min=74, max=359 chat, multi-modal, audio -
hh-rlhf AI-ModelScope/hh-rlhf harmless-base
helpful-base
helpful-online
helpful-rejection-sampled
127459 245.4±190.7, min=22, max=1999 rlhf, dpo, pairwise -
🔥hh-rlhf-cn AI-ModelScope/hh_rlhf_cn hh_rlhf
harmless_base_cn
harmless_base_en
helpful_base_cn
helpful_base_en
355920 171.2±122.7, min=22, max=3078 rlhf, dpo, pairwise -
orpo-dpo-mix-40k AI-ModelScope/orpo-dpo-mix-40k default 43666 548.3±397.4, min=28, max=8483 dpo, orpo, en, quality mlabonne/orpo-dpo-mix-40k
stack-exchange-paired AI-ModelScope/stack-exchange-paired 4483004 534.5±594.6, min=31, max=56588 hfrl, dpo, pairwise lvwerra/stack-exchange-paired
shareai-llama3-dpo-zh-en-emoji hjh0119/shareAI-Llama3-DPO-zh-en-emoji default 2449 334.0±162.8, min=36, max=1801 rlhf, dpo, pairwise -
pileval huangjintao/pile-val-backup 214670 1612.3±8856.2, min=11, max=1208955 text-generation, awq mit-han-lab/pile-val-backup
mantis-instruct swift/Mantis-Instruct birds-to-words
chartqa
coinstruct
contrastive_caption
docvqa
dreamsim
dvqa
iconqa
imagecode
llava_665k_multi
lrv_multi
multi_vqa
nextqa
nlvr2
spot-the-diff
star
visual_story_telling
655351 825.7±812.5, min=284, max=13563 chat, multi-modal, vision, quality TIGER-Lab/Mantis-Instruct
llava-data-instruct swift/llava-data llava_instruct 364100 189.0±142.1, min=33, max=5183 sft, multi-modal, quality TIGER-Lab/llava-data
midefics swift/MideficsDataset 3800 201.3±70.2, min=60, max=454 medical, en, vqa WinterSchool/MideficsDataset
gqa None train_all_instructions - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, vqa, quality lmms-lab/GQA
text-caps swift/TextCaps 18145 38.2±4.4, min=31, max=73 multi-modal, en, caption, quality HuggingFaceM4/TextCaps
a-okvqa swift/A-OKVQA 18201 45.8±7.9, min=32, max=100 multi-modal, en, vqa, quality HuggingFaceM4/A-OKVQA
okvqa swift/OK-VQA_train 9009 34.4±3.3, min=28, max=59 multi-modal, en, vqa, quality Multimodal-Fatima/OK-VQA_train
ocr-vqa swift/OCR-VQA 186753 35.6±6.6, min=29, max=193 multi-modal, en, ocr-vqa howard-hou/OCR-VQA
grit swift/GRIT - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, caption-grounding, quality zzliang/GRIT
llava-instruct-mix swift/llava-instruct-mix-vsft 13640 179.8±120.2, min=30, max=962 multi-modal, en, vqa, quality HuggingFaceH4/llava-instruct-mix-vsft
lnqa swift/lnqa - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, ocr-vqa, quality vikhyatk/lnqa
science-qa swift/ScienceQA 8315 100.3±59.5, min=38, max=638 multi-modal, science, vqa, quality derek-thomas/ScienceQA
guanaco AI-ModelScope/GuanacoDataset default 31561 250.1±70.3, min=89, max=1436 chat, zh JosephusCheung/GuanacoDataset
mind2web swift/Multimodal-Mind2Web 1009 297522.4±325496.2, min=8592, max=3499715 agent, multi-modal osunlp/Multimodal-Mind2Web
sharegpt-4o-image AI-ModelScope/ShareGPT-4o image_caption 57289 638.7±157.9, min=47, max=4640 vqa, multi-modal OpenGVLab/ShareGPT-4o
m3it AI-ModelScope/M3IT coco
vqa-v2
shapes
shapes-rephrased
coco-goi-rephrased
snli-ve
snli-ve-rephrased
okvqa
a-okvqa
viquae
textcap
docvqa
science-qa
imagenet
imagenet-open-ended
imagenet-rephrased
coco-goi
clevr
clevr-rephrased
nlvr
coco-itm
coco-itm-rephrased
vsr
vsr-rephrased
mocheg
mocheg-rephrased
coco-text
fm-iqa
activitynet-qa
msrvtt
ss
coco-cn
refcoco
refcoco-rephrased
multi30k
image-paragraph-captioning
visual-dialog
visual-dialog-rephrased
iqa
vcr
visual-mrc
ivqa
msrvtt-qa
msvd-qa
gqa
text-vqa
ocr-vqa
st-vqa
flickr8k-cn
- Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
sharegpt4v AI-ModelScope/ShareGPT4V ShareGPT4V
ShareGPT4V-PT
- Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
llava-instruct-150k AI-ModelScope/LLaVA-Instruct-150K 624610 490.4±180.2, min=288, max=5438 chat, multi-modal, vision -
llava-pretrain AI-ModelScope/LLaVA-Pretrain blip_laion_cc_sbu_558k - Dataset is too huge, please click the original link to view the dataset stat. vqa, multi-modal, quality liuhaotian/LLaVA-Pretrain
RLAIF-v-dataset swift/RLAIF-V-Dataset 83132 113.7±49.7, min=30, max=540 multi-modal, rlhf, quality openbmb/RLAIF-V-Dataset
alpaca-cleaned AI-ModelScope/alpaca-cleaned 51760 177.9±126.4, min=26, max=1044 chat, general, bench, quality yahma/alpaca-cleaned
aya-collection swift/aya_collection aya_dataset 202364 494.0±6911.3, min=21, max=3044268 multi-lingual, qa CohereForAI/aya_collection
belle-generated-chat-0.4M AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 common, zh BelleGroup/generated_chat_0.4M
belle-math-0.25M AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 math, zh BelleGroup/school_math_0.25M
belle-train-0.5M-CN AI-ModelScope/train_0.5M_CN 519255 129.1±91.5, min=27, max=6507 common, zh, quality BelleGroup/train_0.5M_CN
belle-train-1M-CN AI-ModelScope/train_1M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_1M_CN
belle-train-2M-CN AI-ModelScope/train_2M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_2M_CN
belle-train-3.5M-CN swift/train_3.5M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_3.5M_CN
c4 None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/c4
chart-qa swift/ChartQA 28299 43.1±5.5, min=29, max=77 en, vqa, quality HuggingFaceM4/ChartQA
chinese-c4 None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality shjwudp/chinese-c4
cinepile swift/cinepile - Dataset is too huge, please click the original link to view the dataset stat. vqa, en, youtube, video tomg-group-umd/cinepile
codealpaca-20k AI-ModelScope/CodeAlpaca-20k 20016 100.2±60.1, min=29, max=1776 code, en HuggingFaceH4/CodeAlpaca_20K
cosmopedia None auto_math_text
khanacademy
openstax
stanford
stories
web_samples_v1
web_samples_v2
wikihow
- Dataset is too huge, please click the original link to view the dataset stat. multi-domain, en, qa HuggingFaceTB/cosmopedia
cosmopedia-100k swift/cosmopedia-100k 100000 1024.5±243.1, min=239, max=2981 multi-domain, en, qa HuggingFaceTB/cosmopedia-100k
dolma None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/dolma
dolphin swift/dolphin flan1m-alpaca-uncensored
flan5m-alpaca-uncensored
- Dataset is too huge, please click the original link to view the dataset stat. en cognitivecomputations/dolphin
evol-instruct-v2 AI-ModelScope/WizardLM_evol_instruct_V2_196k 109184 480.9±333.1, min=26, max=4942 chat, en WizardLM/WizardLM_evol_instruct_V2_196k
fineweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality HuggingFaceFW/fineweb
github-code swift/github-code - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality codeparrot/github-code
gpt4v-dataset swift/gpt4v-dataset 12356 217.9±68.3, min=35, max=596 en, caption, multi-modal, quality laion/gpt4v-dataset
guanaco-belle-merge AI-ModelScope/guanaco_belle_merge_v1.0 693987 134.2±92.0, min=24, max=6507 QA, zh Chinese-Vicuna/guanaco_belle_merge_v1.0
llava-med-zh-instruct swift/llava-med-zh-instruct-60k 56649 207.7±67.6, min=37, max=657 zh, medical, vqa BUAADreamer/llava-med-zh-instruct-60k
lmsys-chat-1m AI-ModelScope/lmsys-chat-1m - Dataset is too huge, please click the original link to view the dataset stat. chat, en lmsys/lmsys-chat-1m
math-instruct AI-ModelScope/MathInstruct 262283 254.4±183.5, min=11, max=4383 math, cot, en, quality TIGER-Lab/MathInstruct
math-plus TIGER-Lab/MATH-plus train 893929 287.1±158.7, min=24, max=2919 qa, math, en, quality TIGER-Lab/MATH-plus
moondream2-coyo-5M swift/moondream2-coyo-5M-captions - Dataset is too huge, please click the original link to view the dataset stat. caption, pretrain, quality isidentical/moondream2-coyo-5M-captions
no-robots swift/no_robots 9485 298.7±246.4, min=40, max=6739 multi-task, quality, human-annotated HuggingFaceH4/no_robots
open-hermes swift/OpenHermes-2.5 - Dataset is too huge, please click the original link to view the dataset stat. cot, en, quality teknium/OpenHermes-2.5
open-orca-chinese AI-ModelScope/OpenOrca-Chinese - Dataset is too huge, please click the original link to view the dataset stat. QA, zh, general, quality yys/OpenOrca-Chinese
orca_dpo_pairs swift/orca_dpo_pairs 12859 366.9±251.9, min=30, max=2010 rlhf, quality Intel/orca_dpo_pairs
path-vqa swift/path-vqa 19654 34.8±7.3, min=27, max=85 multi-modal, vqa, medical flaviagiammarino/path-vqa
pile AI-ModelScope/pile - Dataset is too huge, please click the original link to view the dataset stat. pretrain EleutherAI/pile
poison-mpts iic/100PoisonMpts 906 150.6±80.8, min=39, max=656 poison-management, zh -
redpajama-data-1t swift/RedPajama-Data-1T - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-1T
redpajama-data-v2 swift/RedPajama-Data-V2 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-V2
refinedweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality tiiuae/falcon-refinedweb
rwkv-pretrain-web mapjack/openwebtext_dataset - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality -
sft-nectar AI-ModelScope/SFT-Nectar 131192 396.4±272.1, min=44, max=10732 cot, en, quality AstraMindAI/SFT-Nectar
skypile AI-ModelScope/SkyPile-150B - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality, zh Skywork/SkyPile-150B
slim-orca swift/SlimOrca 517982 399.1±370.2, min=35, max=8756 quality, en Open-Orca/SlimOrca
slim-pajama-627b None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality cerebras/SlimPajama-627B
starcoder AI-ModelScope/starcoderdata - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/starcoderdata
tagengo-gpt4 swift/tagengo-gpt4 78057 472.3±292.9, min=22, max=3521 chat, multi-lingual, quality lightblue/tagengo-gpt4
the-stack AI-ModelScope/the-stack - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/the-stack
ultrachat-200k swift/ultrachat_200k 207865 1195.4±573.7, min=76, max=4470 chat, en, quality HuggingFaceH4/ultrachat_200k
vqa-v2 swift/VQAv2 443757 31.8±2.2, min=27, max=58 en, vqa, quality HuggingFaceM4/VQAv2
web-instruct-sub swift/WebInstructSub - Dataset is too huge, please click the original link to view the dataset stat. qa, en, math, quality, multi-domain, science TIGER-Lab/WebInstructSub
wikipedia swift/wikipedia - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality wikipedia
wikipedia-cn-filtered AI-ModelScope/wikipedia-cn-20230720-filtered - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality pleisto/wikipedia-cn-20230720-filtered
zhihu-rlhf AI-ModelScope/zhihu_rlhf_3k 3460 594.5±365.9, min=31, max=1716 rlhf, dpo, zh liyucheng/zhihu_rlhf_3k