Supported models and datasets

Models

The table below introcudes all models supported by SWIFT:

Model List: The model_type information registered in SWIFT.
Default Lora Target Modules: Default lora_target_modules used by the model.
Default Template: Default template used by the model.
Support Flash Attn: Whether the model supports flash attention to accelerate sft and infer.
Support VLLM: Whether the model supports vllm to accelerate infer and deployment.
Requires: The extra requirements used by the model.

LLM

Model Type	Model ID	Default Lora Target Modules	Default Template	Support Flash Attn	Support VLLM	Requires	Tags	HF Model ID
qwen-1_8b	qwen/Qwen-1_8B	c_attn	default-generation	✔	✔		-	Qwen/Qwen-1_8B
qwen-1_8b-chat	qwen/Qwen-1_8B-Chat	c_attn	qwen	✔	✔		-	Qwen/Qwen-1_8B-Chat
qwen-1_8b-chat-int4	qwen/Qwen-1_8B-Chat-Int4	c_attn	qwen	✔	✔	auto_gptq>=0.5	-	Qwen/Qwen-1_8B-Chat-Int4
qwen-1_8b-chat-int8	qwen/Qwen-1_8B-Chat-Int8	c_attn	qwen	✔	✔	auto_gptq>=0.5	-	Qwen/Qwen-1_8B-Chat-Int8
qwen-7b	qwen/Qwen-7B	c_attn	default-generation	✔	✔		-	Qwen/Qwen-7B
qwen-7b-chat	qwen/Qwen-7B-Chat	c_attn	qwen	✔	✔		-	Qwen/Qwen-7B-Chat
qwen-7b-chat-int4	qwen/Qwen-7B-Chat-Int4	c_attn	qwen	✔	✔	auto_gptq>=0.5	-	Qwen/Qwen-7B-Chat-Int4
qwen-7b-chat-int8	qwen/Qwen-7B-Chat-Int8	c_attn	qwen	✔	✔	auto_gptq>=0.5	-	Qwen/Qwen-7B-Chat-Int8
qwen-14b	qwen/Qwen-14B	c_attn	default-generation	✔	✔		-	Qwen/Qwen-14B
qwen-14b-chat	qwen/Qwen-14B-Chat	c_attn	qwen	✔	✔		-	Qwen/Qwen-14B-Chat
qwen-14b-chat-int4	qwen/Qwen-14B-Chat-Int4	c_attn	qwen	✔	✔	auto_gptq>=0.5	-	Qwen/Qwen-14B-Chat-Int4
qwen-14b-chat-int8	qwen/Qwen-14B-Chat-Int8	c_attn	qwen	✔	✔	auto_gptq>=0.5	-	Qwen/Qwen-14B-Chat-Int8
qwen-72b	qwen/Qwen-72B	c_attn	default-generation	✔	✔		-	Qwen/Qwen-72B
qwen-72b-chat	qwen/Qwen-72B-Chat	c_attn	qwen	✔	✔		-	Qwen/Qwen-72B-Chat
qwen-72b-chat-int4	qwen/Qwen-72B-Chat-Int4	c_attn	qwen	✔	✔	auto_gptq>=0.5	-	Qwen/Qwen-72B-Chat-Int4
qwen-72b-chat-int8	qwen/Qwen-72B-Chat-Int8	c_attn	qwen	✔	✔	auto_gptq>=0.5	-	Qwen/Qwen-72B-Chat-Int8
modelscope-agent-7b	iic/ModelScope-Agent-7B	c_attn	modelscope-agent	✔	✘		-	-
modelscope-agent-14b	iic/ModelScope-Agent-14B	c_attn	modelscope-agent	✔	✘		-	-
qwen1half-0_5b	qwen/Qwen1.5-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-0.5B
qwen1half-1_8b	qwen/Qwen1.5-1.8B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-1.8B
qwen1half-4b	qwen/Qwen1.5-4B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-4B
qwen1half-7b	qwen/Qwen1.5-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-7B
qwen1half-14b	qwen/Qwen1.5-14B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-14B
qwen1half-32b	qwen/Qwen1.5-32B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-32B
qwen1half-72b	qwen/Qwen1.5-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-72B
qwen1half-110b	qwen/Qwen1.5-110B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-110B
codeqwen1half-7b	qwen/CodeQwen1.5-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/CodeQwen1.5-7B
qwen1half-moe-a2_7b	qwen/Qwen1.5-MoE-A2.7B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.40	-	Qwen/Qwen1.5-MoE-A2.7B
qwen1half-0_5b-chat	qwen/Qwen1.5-0.5B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat
qwen1half-1_8b-chat	qwen/Qwen1.5-1.8B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat
qwen1half-4b-chat	qwen/Qwen1.5-4B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat
qwen1half-7b-chat	qwen/Qwen1.5-7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat
qwen1half-14b-chat	qwen/Qwen1.5-14B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat
qwen1half-32b-chat	qwen/Qwen1.5-32B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-32B-Chat
qwen1half-72b-chat	qwen/Qwen1.5-72B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat
qwen1half-110b-chat	qwen/Qwen1.5-110B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-110B-Chat
qwen1half-moe-a2_7b-chat	qwen/Qwen1.5-MoE-A2.7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.40	-	Qwen/Qwen1.5-MoE-A2.7B-Chat
codeqwen1half-7b-chat	qwen/CodeQwen1.5-7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/CodeQwen1.5-7B-Chat
qwen1half-0_5b-chat-int4	qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4
qwen1half-1_8b-chat-int4	qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4
qwen1half-4b-chat-int4	qwen/Qwen1.5-4B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat-GPTQ-Int4
qwen1half-7b-chat-int4	qwen/Qwen1.5-7B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat-GPTQ-Int4
qwen1half-14b-chat-int4	qwen/Qwen1.5-14B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat-GPTQ-Int4
qwen1half-32b-chat-int4	qwen/Qwen1.5-32B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-32B-Chat-GPTQ-Int4
qwen1half-72b-chat-int4	qwen/Qwen1.5-72B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
qwen1half-110b-chat-int4	qwen/Qwen1.5-110B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-110B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-int8	qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8
qwen1half-1_8b-chat-int8	qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8
qwen1half-4b-chat-int8	qwen/Qwen1.5-4B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat-GPTQ-Int8
qwen1half-7b-chat-int8	qwen/Qwen1.5-7B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat-GPTQ-Int8
qwen1half-14b-chat-int8	qwen/Qwen1.5-14B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat-GPTQ-Int8
qwen1half-72b-chat-int8	qwen/Qwen1.5-72B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat-GPTQ-Int8
qwen1half-moe-a2_7b-chat-int4	qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✘	auto_gptq>=0.5, transformers>=4.40	-	Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-awq	qwen/Qwen1.5-0.5B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-0.5B-Chat-AWQ
qwen1half-1_8b-chat-awq	qwen/Qwen1.5-1.8B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-1.8B-Chat-AWQ
qwen1half-4b-chat-awq	qwen/Qwen1.5-4B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-4B-Chat-AWQ
qwen1half-7b-chat-awq	qwen/Qwen1.5-7B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-7B-Chat-AWQ
qwen1half-14b-chat-awq	qwen/Qwen1.5-14B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-14B-Chat-AWQ
qwen1half-32b-chat-awq	qwen/Qwen1.5-32B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-32B-Chat-AWQ
qwen1half-72b-chat-awq	qwen/Qwen1.5-72B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-72B-Chat-AWQ
qwen1half-110b-chat-awq	qwen/Qwen1.5-110B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-110B-Chat-AWQ
codeqwen1half-7b-chat-awq	qwen/CodeQwen1.5-7B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/CodeQwen1.5-7B-Chat-AWQ
qwen2-0_5b	qwen/Qwen2-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen2-0.5B
qwen2-0_5b-instruct	qwen/Qwen2-0.5B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct
qwen2-0_5b-instruct-int4	qwen/Qwen2-0.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4
qwen2-0_5b-instruct-int8	qwen/Qwen2-0.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8
qwen2-0_5b-instruct-awq	qwen/Qwen2-0.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen2-0.5B-Instruct-AWQ
qwen2-1_5b	qwen/Qwen2-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen2-1.5B
qwen2-1_5b-instruct	qwen/Qwen2-1.5B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen2-1.5B-Instruct
qwen2-1_5b-instruct-int4	qwen/Qwen2-1.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4
qwen2-1_5b-instruct-int8	qwen/Qwen2-1.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-1_5B-Instruct-GPTQ-Int8
qwen2-1_5b-instruct-awq	qwen/Qwen2-1.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen2-1.5B-Instruct-AWQ
qwen2-7b	qwen/Qwen2-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen2-7B
qwen2-7b-instruct	qwen/Qwen2-7B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen2-7B-Instruct
qwen2-7b-instruct-int4	qwen/Qwen2-7B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-7B-Instruct-GPTQ-Int4
qwen2-7b-instruct-int8	qwen/Qwen2-7B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-7B-Instruct-GPTQ-Int8
qwen2-7b-instruct-awq	qwen/Qwen2-7B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen2-7B-Instruct-AWQ
qwen2-72b	qwen/Qwen2-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.37	-	Qwen/Qwen2-72B
qwen2-72b-instruct	qwen/Qwen2-72B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37	-	Qwen/Qwen2-72B-Instruct
qwen2-72b-instruct-int4	qwen/Qwen2-72B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-72B-Instruct-GPTQ-Int4
qwen2-72b-instruct-int8	qwen/Qwen2-72B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-72B-Instruct-GPTQ-Int8
qwen2-72b-instruct-awq	qwen/Qwen2-72B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.37, autoawq	-	Qwen/Qwen2-72B-Instruct-AWQ
qwen2-57b-a14b	qwen/Qwen2-57B-A14B	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.40	-	Qwen/Qwen2-57B-A14B
qwen2-57b-a14b-instruct	qwen/Qwen2-57B-A14B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	transformers>=4.40	-	Qwen/Qwen2-57B-A14B-Instruct
qwen2-57b-a14b-instruct-int4	qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	auto_gptq>=0.5, transformers>=4.40	-	Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4
chatglm2-6b	ZhipuAI/chatglm2-6b	query_key_value	chatglm2	✘	✔		-	THUDM/chatglm2-6b
chatglm2-6b-32k	ZhipuAI/chatglm2-6b-32k	query_key_value	chatglm2	✘	✔		-	THUDM/chatglm2-6b-32k
chatglm3-6b-base	ZhipuAI/chatglm3-6b-base	query_key_value	chatglm-generation	✘	✔		-	THUDM/chatglm3-6b-base
chatglm3-6b	ZhipuAI/chatglm3-6b	query_key_value	chatglm3	✘	✔		-	THUDM/chatglm3-6b
chatglm3-6b-32k	ZhipuAI/chatglm3-6b-32k	query_key_value	chatglm3	✘	✔		-	THUDM/chatglm3-6b-32k
chatglm3-6b-128k	ZhipuAI/chatglm3-6b-128k	query_key_value	chatglm3	✘	✔		-	THUDM/chatglm3-6b-128k
codegeex2-6b	ZhipuAI/codegeex2-6b	query_key_value	chatglm-generation	✘	✔	transformers<4.34	coding	THUDM/codegeex2-6b
glm4-9b	ZhipuAI/glm-4-9b	query_key_value	chatglm-generation	✘	✔		-	THUDM/glm-4-9b
glm4-9b-chat	ZhipuAI/glm-4-9b-chat	query_key_value	chatglm3	✘	✔		-	THUDM/glm-4-9b-chat
glm4-9b-chat-1m	ZhipuAI/glm-4-9b-chat-1m	query_key_value	chatglm3	✘	✔		-	THUDM/glm-4-9b-chat-1m
llama2-7b	modelscope/Llama-2-7b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔		-	meta-llama/Llama-2-7b-hf
llama2-7b-chat	modelscope/Llama-2-7b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔		-	meta-llama/Llama-2-7b-chat-hf
llama2-13b	modelscope/Llama-2-13b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔		-	meta-llama/Llama-2-13b-hf
llama2-13b-chat	modelscope/Llama-2-13b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔		-	meta-llama/Llama-2-13b-chat-hf
llama2-70b	modelscope/Llama-2-70b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔		-	meta-llama/Llama-2-70b-hf
llama2-70b-chat	modelscope/Llama-2-70b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔		-	meta-llama/Llama-2-70b-chat-hf
llama2-7b-aqlm-2bit-1x16	AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf	q_proj, k_proj, v_proj	default-generation	✔	✘	transformers>=4.38, aqlm, torch>=2.2.0	-	ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf
llama3-8b	LLM-Research/Meta-Llama-3-8B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	meta-llama/Meta-Llama-3-8B
llama3-8b-instruct	LLM-Research/Meta-Llama-3-8B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔		-	meta-llama/Meta-Llama-3-8B-Instruct
llama3-8b-instruct-int4	huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	llama3	✔	✔	auto_gptq	-	study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4
llama3-8b-instruct-int8	huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	llama3	✔	✔	auto_gptq	-	study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8
llama3-8b-instruct-awq	huangjintao/Meta-Llama-3-8B-Instruct-AWQ	q_proj, k_proj, v_proj	llama3	✔	✔	autoawq	-	study-hjt/Meta-Llama-3-8B-Instruct-AWQ
llama3-70b	LLM-Research/Meta-Llama-3-70B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	meta-llama/Meta-Llama-3-70B
llama3-70b-instruct	LLM-Research/Meta-Llama-3-70B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔		-	meta-llama/Meta-Llama-3-70B-Instruct
llama3-70b-instruct-int4	huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	llama3	✔	✔	auto_gptq	-	study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4
llama3-70b-instruct-int8	huangjintao/Meta-Llama-3-70b-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	llama3	✔	✔	auto_gptq	-	study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8
llama3-70b-instruct-awq	huangjintao/Meta-Llama-3-70B-Instruct-AWQ	q_proj, k_proj, v_proj	llama3	✔	✔	autoawq	-	study-hjt/Meta-Llama-3-70B-Instruct-AWQ
chinese-llama-2-1_3b	AI-ModelScope/chinese-llama-2-1.3b	q_proj, k_proj, v_proj	default-generation	✔	✔		-	hfl/chinese-llama-2-1.3b
chinese-llama-2-7b	AI-ModelScope/chinese-llama-2-7b	q_proj, k_proj, v_proj	default-generation	✔	✔		-	hfl/chinese-llama-2-7b
chinese-llama-2-7b-16k	AI-ModelScope/chinese-llama-2-7b-16k	q_proj, k_proj, v_proj	default-generation	✔	✔		-	hfl/chinese-llama-2-7b-16k
chinese-llama-2-7b-64k	AI-ModelScope/chinese-llama-2-7b-64k	q_proj, k_proj, v_proj	default-generation	✔	✔		-	hfl/chinese-llama-2-7b-64k
chinese-llama-2-13b	AI-ModelScope/chinese-llama-2-13b	q_proj, k_proj, v_proj	default-generation	✔	✔		-	hfl/chinese-llama-2-13b
chinese-llama-2-13b-16k	AI-ModelScope/chinese-llama-2-13b-16k	q_proj, k_proj, v_proj	default-generation	✔	✔		-	hfl/chinese-llama-2-13b-16k
chinese-alpaca-2-1_3b	AI-ModelScope/chinese-alpaca-2-1.3b	q_proj, k_proj, v_proj	llama	✔	✔		-	hfl/chinese-alpaca-2-1.3b
chinese-alpaca-2-7b	AI-ModelScope/chinese-alpaca-2-7b	q_proj, k_proj, v_proj	llama	✔	✔		-	hfl/chinese-alpaca-2-7b
chinese-alpaca-2-7b-16k	AI-ModelScope/chinese-alpaca-2-7b-16k	q_proj, k_proj, v_proj	llama	✔	✔		-	hfl/chinese-alpaca-2-7b-16k
chinese-alpaca-2-7b-64k	AI-ModelScope/chinese-alpaca-2-7b-64k	q_proj, k_proj, v_proj	llama	✔	✔		-	hfl/chinese-alpaca-2-7b-64k
chinese-alpaca-2-13b	AI-ModelScope/chinese-alpaca-2-13b	q_proj, k_proj, v_proj	llama	✔	✔		-	hfl/chinese-alpaca-2-13b
chinese-alpaca-2-13b-16k	AI-ModelScope/chinese-alpaca-2-13b-16k	q_proj, k_proj, v_proj	llama	✔	✔		-	hfl/chinese-alpaca-2-13b-16k
llama-3-chinese-8b	ChineseAlpacaGroup/llama-3-chinese-8b	q_proj, k_proj, v_proj	default-generation	✔	✔		-	hfl/llama-3-chinese-8b
llama-3-chinese-8b-instruct	ChineseAlpacaGroup/llama-3-chinese-8b-instruct	q_proj, k_proj, v_proj	llama3	✔	✔		-	hfl/llama-3-chinese-8b-instruct
atom-7b	FlagAlpha/Atom-7B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	FlagAlpha/Atom-7B
atom-7b-chat	FlagAlpha/Atom-7B-Chat	q_proj, k_proj, v_proj	atom	✔	✔		-	FlagAlpha/Atom-7B-Chat
yi-6b	01ai/Yi-6B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-6B
yi-6b-200k	01ai/Yi-6B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-6B-200K
yi-6b-chat	01ai/Yi-6B-Chat	q_proj, k_proj, v_proj	yi	✔	✔		-	01-ai/Yi-6B-Chat
yi-6b-chat-awq	01ai/Yi-6B-Chat-4bits	q_proj, k_proj, v_proj	yi	✔	✔	autoawq	-	01-ai/Yi-6B-Chat-4bits
yi-6b-chat-int8	01ai/Yi-6B-Chat-8bits	q_proj, k_proj, v_proj	yi	✔	✔	auto_gptq	-	01-ai/Yi-6B-Chat-8bits
yi-9b	01ai/Yi-9B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-9B
yi-9b-200k	01ai/Yi-9B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-9B-200K
yi-34b	01ai/Yi-34B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-34B
yi-34b-200k	01ai/Yi-34B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-34B-200K
yi-34b-chat	01ai/Yi-34B-Chat	q_proj, k_proj, v_proj	yi	✔	✔		-	01-ai/Yi-34B-Chat
yi-34b-chat-awq	01ai/Yi-34B-Chat-4bits	q_proj, k_proj, v_proj	yi	✔	✔	autoawq	-	01-ai/Yi-34B-Chat-4bits
yi-34b-chat-int8	01ai/Yi-34B-Chat-8bits	q_proj, k_proj, v_proj	yi	✔	✔	auto_gptq	-	01-ai/Yi-34B-Chat-8bits
yi-1_5-6b	01ai/Yi-1.5-6B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-1.5-6B
yi-1_5-6b-chat	01ai/Yi-1.5-6B-Chat	q_proj, k_proj, v_proj	yi1_5	✔	✔		-	01-ai/Yi-1.5-6B-Chat
yi-1_5-9b	01ai/Yi-1.5-9B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-1.5-9B
yi-1_5-9b-chat	01ai/Yi-1.5-9B-Chat	q_proj, k_proj, v_proj	yi1_5	✔	✔		-	01-ai/Yi-1.5-9B-Chat
yi-1_5-9b-chat-16k	01ai/Yi-1.5-9B-Chat	q_proj, k_proj, v_proj	yi1_5	✔	✔		-	01-ai/Yi-1.5-9B-Chat-16K
yi-1_5-34b	01ai/Yi-1.5-34B	q_proj, k_proj, v_proj	default-generation	✔	✔		-	01-ai/Yi-1.5-34B
yi-1_5-34b-chat	01ai/Yi-1.5-34B-Chat	q_proj, k_proj, v_proj	yi1_5	✔	✔		-	01-ai/Yi-1.5-34B-Chat
yi-1_5-34b-chat-16k	01ai/Yi-1.5-34B-Chat-16K	q_proj, k_proj, v_proj	yi1_5	✔	✔		-	01-ai/Yi-1.5-34B-Chat-16K
yi-1_5-6b-chat-awq-int4	AI-ModelScope/Yi-1.5-6B-Chat-AWQ	q_proj, k_proj, v_proj	yi1_5	✔	✔	autoawq	-	modelscope/Yi-1.5-6B-Chat-AWQ
yi-1_5-6b-chat-gptq-int4	AI-ModelScope/Yi-1.5-6B-Chat-GPTQ	q_proj, k_proj, v_proj	yi1_5	✔	✔	auto_gptq>=0.5	-	modelscope/Yi-1.5-6B-Chat-GPTQ
yi-1_5-9b-chat-awq-int4	AI-ModelScope/Yi-1.5-9B-Chat-AWQ	q_proj, k_proj, v_proj	yi1_5	✔	✔	autoawq	-	modelscope/Yi-1.5-9B-Chat-AWQ
yi-1_5-9b-chat-gptq-int4	AI-ModelScope/Yi-1.5-9B-Chat-GPTQ	q_proj, k_proj, v_proj	yi1_5	✔	✔	auto_gptq>=0.5	-	modelscope/Yi-1.5-9B-Chat-GPTQ
yi-1_5-34b-chat-awq-int4	AI-ModelScope/Yi-1.5-34B-Chat-AWQ	q_proj, k_proj, v_proj	yi1_5	✔	✔	autoawq	-	modelscope/Yi-1.5-34B-Chat-AWQ
yi-1_5-34b-chat-gptq-int4	AI-ModelScope/Yi-1.5-34B-Chat-GPTQ	q_proj, k_proj, v_proj	yi1_5	✔	✔	auto_gptq>=0.5	-	modelscope/Yi-1.5-34B-Chat-GPTQ
internlm-7b	Shanghai_AI_Laboratory/internlm-7b	q_proj, k_proj, v_proj	default-generation	✘	✔		-	internlm/internlm-7b
internlm-7b-chat	Shanghai_AI_Laboratory/internlm-chat-7b	q_proj, k_proj, v_proj	internlm	✘	✔		-	internlm/internlm-chat-7b
internlm-7b-chat-8k	Shanghai_AI_Laboratory/internlm-chat-7b-8k	q_proj, k_proj, v_proj	internlm	✘	✔		-	-
internlm-20b	Shanghai_AI_Laboratory/internlm-20b	q_proj, k_proj, v_proj	default-generation	✘	✔		-	internlm/internlm2-20b
internlm-20b-chat	Shanghai_AI_Laboratory/internlm-chat-20b	q_proj, k_proj, v_proj	internlm	✘	✔		-	internlm/internlm2-chat-20b
internlm2-1_8b	Shanghai_AI_Laboratory/internlm2-1_8b	wqkv	default-generation	✔	✔	transformers>=4.35	-	internlm/internlm2-1_8b
internlm2-1_8b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft	wqkv	internlm2	✔	✔	transformers>=4.35	-	internlm/internlm2-chat-1_8b-sft
internlm2-1_8b-chat	Shanghai_AI_Laboratory/internlm2-chat-1_8b	wqkv	internlm2	✔	✔	transformers>=4.35	-	internlm/internlm2-chat-1_8b
internlm2-7b-base	Shanghai_AI_Laboratory/internlm2-base-7b	wqkv	default-generation	✔	✔	transformers>=4.35	-	internlm/internlm2-base-7b
internlm2-7b	Shanghai_AI_Laboratory/internlm2-7b	wqkv	default-generation	✔	✔	transformers>=4.35	-	internlm/internlm2-7b
internlm2-7b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-7b-sft	wqkv	internlm2	✔	✔	transformers>=4.35	-	internlm/internlm2-chat-7b-sft
internlm2-7b-chat	Shanghai_AI_Laboratory/internlm2-chat-7b	wqkv	internlm2	✔	✔	transformers>=4.35	-	internlm/internlm2-chat-7b
internlm2-20b-base	Shanghai_AI_Laboratory/internlm2-base-20b	wqkv	default-generation	✔	✔	transformers>=4.35	-	internlm/internlm2-base-20b
internlm2-20b	Shanghai_AI_Laboratory/internlm2-20b	wqkv	default-generation	✔	✔	transformers>=4.35	-	internlm/internlm2-20b
internlm2-20b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-20b-sft	wqkv	internlm2	✔	✔	transformers>=4.35	-	internlm/internlm2-chat-20b-sft
internlm2-20b-chat	Shanghai_AI_Laboratory/internlm2-chat-20b	wqkv	internlm2	✔	✔	transformers>=4.35	-	internlm/internlm2-chat-20b
internlm2-math-7b	Shanghai_AI_Laboratory/internlm2-math-base-7b	wqkv	default-generation	✔	✔	transformers>=4.35	math	internlm/internlm2-math-base-7b
internlm2-math-7b-chat	Shanghai_AI_Laboratory/internlm2-math-7b	wqkv	internlm2	✔	✔	transformers>=4.35	math	internlm/internlm2-math-7b
internlm2-math-20b	Shanghai_AI_Laboratory/internlm2-math-base-20b	wqkv	default-generation	✔	✔	transformers>=4.35	math	internlm/internlm2-math-base-20b
internlm2-math-20b-chat	Shanghai_AI_Laboratory/internlm2-math-20b	wqkv	internlm2	✔	✔	transformers>=4.35	math	internlm/internlm2-math-20b
deepseek-7b	deepseek-ai/deepseek-llm-7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔		-	deepseek-ai/deepseek-llm-7b-base
deepseek-7b-chat	deepseek-ai/deepseek-llm-7b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔		-	deepseek-ai/deepseek-llm-7b-chat
deepseek-moe-16b	deepseek-ai/deepseek-moe-16b-base	q_proj, k_proj, v_proj	default-generation	✔	✔		-	deepseek-ai/deepseek-moe-16b-base
deepseek-moe-16b-chat	deepseek-ai/deepseek-moe-16b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔		-	deepseek-ai/deepseek-moe-16b-chat
deepseek-67b	deepseek-ai/deepseek-llm-67b-base	q_proj, k_proj, v_proj	default-generation	✔	✔		-	deepseek-ai/deepseek-llm-67b-base
deepseek-67b-chat	deepseek-ai/deepseek-llm-67b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔		-	deepseek-ai/deepseek-llm-67b-chat
deepseek-coder-1_3b	deepseek-ai/deepseek-coder-1.3b-base	q_proj, k_proj, v_proj	default-generation	✔	✔		coding	deepseek-ai/deepseek-coder-1.3b-base
deepseek-coder-1_3b-instruct	deepseek-ai/deepseek-coder-1.3b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔		coding	deepseek-ai/deepseek-coder-1.3b-instruct
deepseek-coder-6_7b	deepseek-ai/deepseek-coder-6.7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔		coding	deepseek-ai/deepseek-coder-6.7b-base
deepseek-coder-6_7b-instruct	deepseek-ai/deepseek-coder-6.7b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔		coding	deepseek-ai/deepseek-coder-6.7b-instruct
deepseek-coder-33b	deepseek-ai/deepseek-coder-33b-base	q_proj, k_proj, v_proj	default-generation	✔	✔		coding	deepseek-ai/deepseek-coder-33b-base
deepseek-coder-33b-instruct	deepseek-ai/deepseek-coder-33b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔		coding	deepseek-ai/deepseek-coder-33b-instruct
deepseek-coder-v2-instruct	deepseek-ai/DeepSeek-Coder-V2-Instruct	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	transformers>=4.39.3	coding	deepseek-ai/DeepSeek-Coder-V2-Instruct
deepseek-coder-v2-lite-instruct	deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	transformers>=4.39.3	coding	deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
deepseek-math-7b	deepseek-ai/deepseek-math-7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔		math	deepseek-ai/deepseek-math-7b-base
deepseek-math-7b-instruct	deepseek-ai/deepseek-math-7b-instruct	q_proj, k_proj, v_proj	deepseek	✔	✔		math	deepseek-ai/deepseek-math-7b-instruct
deepseek-math-7b-chat	deepseek-ai/deepseek-math-7b-rl	q_proj, k_proj, v_proj	deepseek	✔	✔		math	deepseek-ai/deepseek-math-7b-rl
deepseek-v2-chat	deepseek-ai/DeepSeek-V2-Chat	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	transformers>=4.39.3	-	deepseek-ai/DeepSeek-V2-Chat
deepseek-v2-lite	deepseek-ai/DeepSeek-V2-Lite	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	transformers>=4.39.3	-	deepseek-ai/DeepSeek-V2-Lite
deepseek-v2-lite-chat	deepseek-ai/DeepSeek-V2-Lite-Chat	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	transformers>=4.39.3	-	deepseek-ai/DeepSeek-V2-Lite-Chat
gemma-2b	AI-ModelScope/gemma-2b	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.38	-	google/gemma-2b
gemma-7b	AI-ModelScope/gemma-7b	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.38	-	google/gemma-7b
gemma-2b-instruct	AI-ModelScope/gemma-2b-it	q_proj, k_proj, v_proj	gemma	✔	✔	transformers>=4.38	-	google/gemma-2b-it
gemma-7b-instruct	AI-ModelScope/gemma-7b-it	q_proj, k_proj, v_proj	gemma	✔	✔	transformers>=4.38	-	google/gemma-7b-it
minicpm-1b-sft-chat	OpenBMB/MiniCPM-1B-sft-bf16	q_proj, k_proj, v_proj	minicpm	✔	✔	transformers>=4.36.0	-	openbmb/MiniCPM-1B-sft-bf16
minicpm-2b-sft-chat	OpenBMB/MiniCPM-2B-sft-fp32	q_proj, k_proj, v_proj	minicpm	✔	✔		-	openbmb/MiniCPM-2B-sft-fp32
minicpm-2b-chat	OpenBMB/MiniCPM-2B-dpo-fp32	q_proj, k_proj, v_proj	minicpm	✔	✔		-	openbmb/MiniCPM-2B-dpo-fp32
minicpm-2b-128k	OpenBMB/MiniCPM-2B-128k	q_proj, k_proj, v_proj	chatml	✔	✔	transformers>=4.36.0	-	openbmb/MiniCPM-2B-128k
minicpm-moe-8x2b	OpenBMB/MiniCPM-MoE-8x2B	q_proj, k_proj, v_proj	minicpm	✔	✔	transformers>=4.36.0	-	openbmb/MiniCPM-MoE-8x2B
openbuddy-llama-65b-chat	OpenBuddy/openbuddy-llama-65b-v8-bf16	q_proj, k_proj, v_proj	openbuddy	✔	✔		-	OpenBuddy/openbuddy-llama-65b-v8-bf16
openbuddy-llama2-13b-chat	OpenBuddy/openbuddy-llama2-13b-v8.1-fp16	q_proj, k_proj, v_proj	openbuddy	✔	✔		-	OpenBuddy/openbuddy-llama2-13b-v8.1-fp16
openbuddy-llama2-70b-chat	OpenBuddy/openbuddy-llama2-70b-v10.1-bf16	q_proj, k_proj, v_proj	openbuddy	✔	✔		-	OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
openbuddy-llama3-8b-chat	OpenBuddy/openbuddy-llama3-8b-v21.1-8k	q_proj, k_proj, v_proj	openbuddy2	✔	✔		-	OpenBuddy/openbuddy-llama3-8b-v21.1-8k
openbuddy-llama3-70b-chat	OpenBuddy/openbuddy-llama3-70b-v21.1-8k	q_proj, k_proj, v_proj	openbuddy2	✔	✔		-	OpenBuddy/openbuddy-llama3-70b-v21.1-8k
openbuddy-mistral-7b-chat	OpenBuddy/openbuddy-mistral-7b-v17.1-32k	q_proj, k_proj, v_proj	openbuddy	✔	✔	transformers>=4.34	-	OpenBuddy/openbuddy-mistral-7b-v17.1-32k
openbuddy-zephyr-7b-chat	OpenBuddy/openbuddy-zephyr-7b-v14.1	q_proj, k_proj, v_proj	openbuddy	✔	✔	transformers>=4.34	-	OpenBuddy/openbuddy-zephyr-7b-v14.1
openbuddy-deepseek-67b-chat	OpenBuddy/openbuddy-deepseek-67b-v15.2	q_proj, k_proj, v_proj	openbuddy	✔	✔		-	OpenBuddy/openbuddy-deepseek-67b-v15.2
openbuddy-mixtral-moe-7b-chat	OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k	q_proj, k_proj, v_proj	openbuddy	✔	✔	transformers>=4.36	-	OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k
mistral-7b	AI-ModelScope/Mistral-7B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.34	-	mistralai/Mistral-7B-v0.1
mistral-7b-v2	AI-ModelScope/Mistral-7B-v0.2-hf	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.34	-	alpindale/Mistral-7B-v0.2-hf
mistral-7b-instruct	AI-ModelScope/Mistral-7B-Instruct-v0.1	q_proj, k_proj, v_proj	llama	✔	✔	transformers>=4.34	-	mistralai/Mistral-7B-Instruct-v0.1
mistral-7b-instruct-v2	AI-ModelScope/Mistral-7B-Instruct-v0.2	q_proj, k_proj, v_proj	llama	✔	✔	transformers>=4.34	-	mistralai/Mistral-7B-Instruct-v0.2
mixtral-moe-7b	AI-ModelScope/Mixtral-8x7B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.36	-	mistralai/Mixtral-8x7B-v0.1
mixtral-moe-7b-instruct	AI-ModelScope/Mixtral-8x7B-Instruct-v0.1	q_proj, k_proj, v_proj	llama	✔	✔	transformers>=4.36	-	mistralai/Mixtral-8x7B-Instruct-v0.1
mixtral-moe-7b-aqlm-2bit-1x16	AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf	q_proj, k_proj, v_proj	default-generation	✔	✘	transformers>=4.38, aqlm, torch>=2.2.0	-	ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf
mixtral-moe-8x22b-v1	AI-ModelScope/Mixtral-8x22B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.36	-	mistral-community/Mixtral-8x22B-v0.1
wizardlm2-7b-awq	AI-ModelScope/WizardLM-2-7B-AWQ	q_proj, k_proj, v_proj	wizardlm2-awq	✔	✔	transformers>=4.34	-	MaziyarPanahi/WizardLM-2-7B-AWQ
wizardlm2-8x22b	AI-ModelScope/WizardLM-2-8x22B	q_proj, k_proj, v_proj	wizardlm2	✔	✔	transformers>=4.36	-	alpindale/WizardLM-2-8x22B
baichuan-7b	baichuan-inc/baichuan-7B	W_pack	default-generation	✘	✔	transformers<4.34	-	baichuan-inc/Baichuan-7B
baichuan-13b	baichuan-inc/Baichuan-13B-Base	W_pack	default-generation	✘	✔	transformers<4.34	-	baichuan-inc/Baichuan-13B-Base
baichuan-13b-chat	baichuan-inc/Baichuan-13B-Chat	W_pack	baichuan	✘	✔	transformers<4.34	-	baichuan-inc/Baichuan-13B-Chat
baichuan2-7b	baichuan-inc/Baichuan2-7B-Base	W_pack	default-generation	✘	✔		-	baichuan-inc/Baichuan2-7B-Base
baichuan2-7b-chat	baichuan-inc/Baichuan2-7B-Chat	W_pack	baichuan	✘	✔		-	baichuan-inc/Baichuan2-7B-Chat
baichuan2-7b-chat-int4	baichuan-inc/Baichuan2-7B-Chat-4bits	W_pack	baichuan	✘	✘	bitsandbytes<0.41.2, accelerate<0.26	-	baichuan-inc/Baichuan2-7B-Chat-4bits
baichuan2-13b	baichuan-inc/Baichuan2-13B-Base	W_pack	default-generation	✘	✔		-	baichuan-inc/Baichuan2-13B-Base
baichuan2-13b-chat	baichuan-inc/Baichuan2-13B-Chat	W_pack	baichuan	✘	✔		-	baichuan-inc/Baichuan2-13B-Chat
baichuan2-13b-chat-int4	baichuan-inc/Baichuan2-13B-Chat-4bits	W_pack	baichuan	✘	✘	bitsandbytes<0.41.2, accelerate<0.26	-	baichuan-inc/Baichuan2-13B-Chat-4bits
yuan2-2b-instruct	YuanLLM/Yuan2.0-2B-hf	q_proj, k_proj, v_proj	yuan	✔	✘		-	IEITYuan/Yuan2-2B-hf
yuan2-2b-janus-instruct	YuanLLM/Yuan2-2B-Janus-hf	q_proj, k_proj, v_proj	yuan	✔	✘		-	IEITYuan/Yuan2-2B-Janus-hf
yuan2-51b-instruct	YuanLLM/Yuan2.0-51B-hf	q_proj, k_proj, v_proj	yuan	✔	✘		-	IEITYuan/Yuan2-51B-hf
yuan2-102b-instruct	YuanLLM/Yuan2.0-102B-hf	q_proj, k_proj, v_proj	yuan	✔	✘		-	IEITYuan/Yuan2-102B-hf
yuan2-m32	YuanLLM/Yuan2-M32-hf	q_proj, k_proj, v_proj	yuan	✔	✘		-	IEITYuan/Yuan2-M32-hf
xverse-7b	xverse/XVERSE-7B	q_proj, k_proj, v_proj	default-generation	✘	✔		-	xverse/XVERSE-7B
xverse-7b-chat	xverse/XVERSE-7B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔		-	xverse/XVERSE-7B-Chat
xverse-13b	xverse/XVERSE-13B	q_proj, k_proj, v_proj	default-generation	✘	✔		-	xverse/XVERSE-13B
xverse-13b-chat	xverse/XVERSE-13B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔		-	xverse/XVERSE-13B-Chat
xverse-65b	xverse/XVERSE-65B	q_proj, k_proj, v_proj	default-generation	✘	✔		-	xverse/XVERSE-65B
xverse-65b-v2	xverse/XVERSE-65B-2	q_proj, k_proj, v_proj	default-generation	✘	✔		-	xverse/XVERSE-65B-2
xverse-65b-chat	xverse/XVERSE-65B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔		-	xverse/XVERSE-65B-Chat
xverse-13b-256k	xverse/XVERSE-13B-256K	q_proj, k_proj, v_proj	default-generation	✘	✔		-	xverse/XVERSE-13B-256K
xverse-moe-a4_2b	xverse/XVERSE-MoE-A4.2B	q_proj, k_proj, v_proj	default-generation	✘	✘		-	xverse/XVERSE-MoE-A4.2B
orion-14b	OrionStarAI/Orion-14B-Base	q_proj, k_proj, v_proj	default-generation	✔	✘		-	OrionStarAI/Orion-14B-Base
orion-14b-chat	OrionStarAI/Orion-14B-Chat	q_proj, k_proj, v_proj	orion	✔	✘		-	OrionStarAI/Orion-14B-Chat
bluelm-7b	vivo-ai/BlueLM-7B-Base	q_proj, k_proj, v_proj	default-generation	✘	✘		-	vivo-ai/BlueLM-7B-Base
bluelm-7b-32k	vivo-ai/BlueLM-7B-Base-32K	q_proj, k_proj, v_proj	default-generation	✘	✘		-	vivo-ai/BlueLM-7B-Base-32K
bluelm-7b-chat	vivo-ai/BlueLM-7B-Chat	q_proj, k_proj, v_proj	bluelm	✘	✘		-	vivo-ai/BlueLM-7B-Chat
bluelm-7b-chat-32k	vivo-ai/BlueLM-7B-Chat-32K	q_proj, k_proj, v_proj	bluelm	✘	✘		-	vivo-ai/BlueLM-7B-Chat-32K
ziya2-13b	Fengshenbang/Ziya2-13B-Base	q_proj, k_proj, v_proj	default-generation	✔	✔		-	IDEA-CCNL/Ziya2-13B-Base
ziya2-13b-chat	Fengshenbang/Ziya2-13B-Chat	q_proj, k_proj, v_proj	ziya	✔	✔		-	IDEA-CCNL/Ziya2-13B-Chat
skywork-13b	skywork/Skywork-13B-base	q_proj, k_proj, v_proj	default-generation	✘	✘		-	Skywork/Skywork-13B-base
skywork-13b-chat	skywork/Skywork-13B-chat	q_proj, k_proj, v_proj	skywork	✘	✘		-	-
zephyr-7b-beta-chat	modelscope/zephyr-7b-beta	q_proj, k_proj, v_proj	zephyr	✔	✔	transformers>=4.34	-	HuggingFaceH4/zephyr-7b-beta
polylm-13b	damo/nlp_polylm_13b_text_generation	c_attn	default-generation	✘	✘		-	DAMO-NLP-MT/polylm-13b
seqgpt-560m	damo/nlp_seqgpt-560m	query_key_value	default-generation	✘	✔		-	DAMO-NLP/SeqGPT-560M
sus-34b-chat	SUSTC/SUS-Chat-34B	q_proj, k_proj, v_proj	sus	✔	✔		-	SUSTech/SUS-Chat-34B
tongyi-finance-14b	TongyiFinance/Tongyi-Finance-14B	c_attn	default-generation	✔	✔		financial	-
tongyi-finance-14b-chat	TongyiFinance/Tongyi-Finance-14B-Chat	c_attn	qwen	✔	✔		financial	jxy/Tongyi-Finance-14B-Chat
tongyi-finance-14b-chat-int4	TongyiFinance/Tongyi-Finance-14B-Chat-Int4	c_attn	qwen	✔	✔	auto_gptq>=0.5	financial	jxy/Tongyi-Finance-14B-Chat-Int4
codefuse-codellama-34b-chat	codefuse-ai/CodeFuse-CodeLlama-34B	q_proj, k_proj, v_proj	codefuse-codellama	✔	✔		coding	codefuse-ai/CodeFuse-CodeLlama-34B
codefuse-codegeex2-6b-chat	codefuse-ai/CodeFuse-CodeGeeX2-6B	query_key_value	codefuse	✘	✔	transformers<4.34	coding	codefuse-ai/CodeFuse-CodeGeeX2-6B
codefuse-qwen-14b-chat	codefuse-ai/CodeFuse-QWen-14B	c_attn	codefuse	✔	✔		coding	codefuse-ai/CodeFuse-QWen-14B
phi2-3b	AI-ModelScope/phi-2	Wqkv	default-generation	✔	✔		coding	microsoft/phi-2
phi3-4b-4k-instruct	LLM-Research/Phi-3-mini-4k-instruct	qkv_proj	phi3	✔	✘	transformers>=4.36	general	microsoft/Phi-3-mini-4k-instruct
phi3-4b-128k-instruct	LLM-Research/Phi-3-mini-128k-instruct	qkv_proj	phi3	✔	✔	transformers>=4.36	general	microsoft/Phi-3-mini-128k-instruct
phi3-small-128k-instruct	LLM-Research/Phi-3-small-128k-instruct	query_key_value	phi3	✔	✔	transformers>=4.36	general	microsoft/Phi-3-small-128k-instruct
phi3-medium-128k-instruct	LLM-Research/Phi-3-medium-128k-instruct	qkv_proj	phi3	✔	✔	transformers>=4.36	general	microsoft/Phi-3-medium-128k-instruct
mamba-130m	AI-ModelScope/mamba-130m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-130m-hf
mamba-370m	AI-ModelScope/mamba-370m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-370m-hf
mamba-390m	AI-ModelScope/mamba-390m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-390m-hf
mamba-790m	AI-ModelScope/mamba-790m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-790m-hf
mamba-1.4b	AI-ModelScope/mamba-1.4b-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-1.4b-hf
mamba-2.8b	AI-ModelScope/mamba-2.8b-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-2.8b-hf
telechat-7b	TeleAI/TeleChat-7B	key_value, query	telechat	✔	✘		-	Tele-AI/telechat-7B
telechat-12b	TeleAI/TeleChat-12B	key_value, query	telechat	✔	✘		-	Tele-AI/TeleChat-12B
telechat-12b-v2	TeleAI/TeleChat-12B-v2	key_value, query	telechat-v2	✔	✘		-	Tele-AI/TeleChat-12B-v2
telechat-12b-v2-gptq-int4	swift/TeleChat-12B-V2-GPTQ-Int4	key_value, query	telechat-v2	✔	✘	auto_gptq>=0.5	-	-
grok-1	colossalai/grok-1-pytorch	q_proj, k_proj, v_proj	default-generation	✘	✘		-	hpcai-tech/grok-1
dbrx-instruct	AI-ModelScope/dbrx-instruct	attn.Wqkv	dbrx	✔	✔	transformers>=4.36	-	databricks/dbrx-instruct
dbrx-base	AI-ModelScope/dbrx-base	attn.Wqkv	dbrx	✔	✔	transformers>=4.36	-	databricks/dbrx-base
mengzi3-13b-base	langboat/Mengzi3-13B-Base	q_proj, k_proj, v_proj	mengzi	✔	✔		-	Langboat/Mengzi3-13B-Base
c4ai-command-r-v01	AI-ModelScope/c4ai-command-r-v01	q_proj, k_proj, v_proj	c4ai	✔	✘	transformers>=4.39.1	-	CohereForAI/c4ai-command-r-v01
c4ai-command-r-plus	AI-ModelScope/c4ai-command-r-plus	q_proj, k_proj, v_proj	c4ai	✔	✘	transformers>4.39	-	CohereForAI/c4ai-command-r-plus
codestral-22b	huangjintao/Codestral-22B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	transformers>=4.34	-	mistralai/Codestral-22B-v0.1

MLLM

Model Type	Model ID	Default Lora Target Modules	Default Template	Support Flash Attn	Support VLLM	Requires	Tags	HF Model ID
qwen-vl	qwen/Qwen-VL	c_attn	default-generation	✔	✘		vision	Qwen/Qwen-VL
qwen-vl-chat	qwen/Qwen-VL-Chat	c_attn	qwenvl	✔	✘		vision	Qwen/Qwen-VL-Chat
qwen-vl-chat-int4	qwen/Qwen-VL-Chat-Int4	c_attn	qwen	✔	✘	auto_gptq>=0.5	vision	Qwen/Qwen-VL-Chat-Int4
qwen-audio	qwen/Qwen-Audio	c_attn	qwen-audio-generation	✔	✘		audio	Qwen/Qwen-Audio
qwen-audio-chat	qwen/Qwen-Audio-Chat	c_attn	qwen-audio	✔	✘		audio	Qwen/Qwen-Audio-Chat
glm4v-9b-chat	ZhipuAI/glm-4v-9b	self_attention.query_key_value	glm4v	✘	✘		vision	THUDM/glm-4v-9b
llava1_5-7b-chat	huangjintao/llava-1.5-7b-hf	q_proj, k_proj, v_proj	llava1_5	✔	✘	transformers>=4.36	vision	llava-hf/llava-1.5-7b-hf
llava1_6-mistral-7b-instruct	AI-ModelScope/llava-v1.6-mistral-7b	q_proj, k_proj, v_proj	llava-mistral-instruct	✔	✘	transformers>=4.34	vision	liuhaotian/llava-v1.6-mistral-7b
llava1_6-yi-34b-instruct	AI-ModelScope/llava-v1.6-34b	q_proj, k_proj, v_proj	llava-yi-instruct	✔	✘		vision	liuhaotian/llava-v1.6-34b
llama3-llava-next-8b	AI-Modelscope/llama3-llava-next-8b	q_proj, k_proj, v_proj	llama-llava-next	✔	✘		vision	lmms-lab/llama3-llava-next-8b
llava-next-72b	AI-Modelscope/llava-next-72b	q_proj, k_proj, v_proj	llava-qwen-instruct	✔	✘		vision	lmms-lab/llava-next-72b
llava-next-110b	AI-Modelscope/llava-next-110b	q_proj, k_proj, v_proj	llava-qwen-instruct	✔	✘		vision	lmms-lab/llava-next-110b
yi-vl-6b-chat	01ai/Yi-VL-6B	q_proj, k_proj, v_proj	yi-vl	✔	✘	transformers>=4.34	vision	01-ai/Yi-VL-6B
yi-vl-34b-chat	01ai/Yi-VL-34B	q_proj, k_proj, v_proj	yi-vl	✔	✘	transformers>=4.34	vision	01-ai/Yi-VL-34B
llava-llama-3-8b-v1_1	AI-ModelScope/llava-llama-3-8b-v1_1-transformers	q_proj, k_proj, v_proj	llava-llama-instruct	✔	✘	transformers>=4.36	vision	xtuner/llava-llama-3-8b-v1_1-transformers
internlm-xcomposer2-7b-chat	Shanghai_AI_Laboratory/internlm-xcomposer2-7b	wqkv	internlm-xcomposer2	✔	✘		vision	internlm/internlm-xcomposer2-7b
internvl-chat-v1_5	AI-ModelScope/InternVL-Chat-V1-5	wqkv	internvl	✔	✘	transformers>=4.35, timm	vision	OpenGVLab/InternVL-Chat-V1-5
internvl-chat-v1_5-int8	AI-ModelScope/InternVL-Chat-V1-5-int8	wqkv	internvl	✔	✘	transformers>=4.35, timm	vision	OpenGVLab/InternVL-Chat-V1-5-int8
mini-internvl-chat-2b-v1_5	OpenGVLab/Mini-InternVL-Chat-2B-V1-5	wqkv	internvl	✔	✘	transformers>=4.35, timm	vision	OpenGVLab/Mini-InternVL-Chat-2B-V1-5
mini-internvl-chat-4b-v1_5	OpenGVLab/Mini-InternVL-Chat-4B-V1-5	qkv_proj	internvl-phi3	✔	✘	transformers>=4.35, timm	vision	OpenGVLab/Mini-InternVL-Chat-4B-V1-5
deepseek-vl-1_3b-chat	deepseek-ai/deepseek-vl-1.3b-chat	q_proj, k_proj, v_proj	deepseek-vl	✔	✘	attrdict	vision	deepseek-ai/deepseek-vl-1.3b-chat
deepseek-vl-7b-chat	deepseek-ai/deepseek-vl-7b-chat	q_proj, k_proj, v_proj	deepseek-vl	✔	✘	attrdict	vision	deepseek-ai/deepseek-vl-7b-chat
paligemma-3b-pt-224	AI-ModelScope/paligemma-3b-pt-224	q_proj, k_proj, v_proj	paligemma	✔	✘	transformers>=4.41	vision	google/paligemma-3b-pt-224
paligemma-3b-pt-448	AI-ModelScope/paligemma-3b-pt-448	q_proj, k_proj, v_proj	paligemma	✔	✘	transformers>=4.41	vision	google/paligemma-3b-pt-448
paligemma-3b-pt-896	AI-ModelScope/paligemma-3b-pt-896	q_proj, k_proj, v_proj	paligemma	✔	✘	transformers>=4.41	vision	google/paligemma-3b-pt-896
paligemma-3b-mix-224	AI-ModelScope/paligemma-3b-mix-224	q_proj, k_proj, v_proj	paligemma	✔	✘	transformers>=4.41	vision	google/paligemma-3b-mix-224
paligemma-3b-mix-448	AI-ModelScope/paligemma-3b-mix-448	q_proj, k_proj, v_proj	paligemma	✔	✘	transformers>=4.41	vision	google/paligemma-3b-mix-448
minicpm-v-3b-chat	OpenBMB/MiniCPM-V	q_proj, k_proj, v_proj	minicpm-v	✔	✘		vision	openbmb/MiniCPM-V
minicpm-v-v2-chat	OpenBMB/MiniCPM-V-2	q_proj, k_proj, v_proj	minicpm-v	✔	✘	timm	vision	openbmb/MiniCPM-V-2
minicpm-v-v2_5-chat	OpenBMB/MiniCPM-Llama3-V-2_5	q_proj, k_proj, v_proj	minicpm-v-v2_5	✔	✘	timm	vision	openbmb/MiniCPM-Llama3-V-2_5
mplug-owl2-chat	iic/mPLUG-Owl2	q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1	mplug-owl2	✔	✘	transformers<4.35, icecream	vision	MAGAer13/mplug-owl2-llama2-7b
mplug-owl2_1-chat	iic/mPLUG-Owl2.1	c_attn.multiway.0, c_attn.multiway.1	mplug-owl2	✔	✘	transformers<4.35, icecream	vision	Mizukiluke/mplug_owl_2_1
phi3-vision-128k-instruct	LLM-Research/Phi-3-vision-128k-instruct	qkv_proj	phi3-vl	✔	✘	transformers>=4.36	vision	microsoft/Phi-3-vision-128k-instruct
cogvlm-17b-chat	ZhipuAI/cogvlm-chat	vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense	cogvlm	✘	✘		vision	THUDM/cogvlm-chat-hf
cogvlm2-19b-chat	ZhipuAI/cogvlm2-llama3-chinese-chat-19B	vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense	cogvlm	✘	✘		vision	THUDM/cogvlm2-llama3-chinese-chat-19B
cogvlm2-en-19b-chat	ZhipuAI/cogvlm2-llama3-chat-19B	vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense	cogvlm	✘	✘		vision	THUDM/cogvlm2-llama3-chat-19B
cogagent-18b-chat	ZhipuAI/cogagent-chat	vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense	cogagent-chat	✘	✘	timm	vision	THUDM/cogagent-chat-hf
cogagent-18b-instruct	ZhipuAI/cogagent-vqa	vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense	cogagent-instruct	✘	✘	timm	vision	THUDM/cogagent-vqa-hf

Datasets

The table below introduces the datasets supported by SWIFT:

Dataset Name: The dataset name registered in SWIFT.
Dataset ID: The dataset id in ModelScope.
Size: The data row count of the dataset.
Statistic: Dataset statistics. We use the number of tokens for statistics, which helps adjust the max_length hyperparameter. We concatenate the training and validation sets of the dataset and then compute the statistics. We use qwen's tokenizer to tokenize the dataset. Different tokenizers produce different statistics. If you want to obtain token statistics for tokenizers of other models, you can use the script to get them yourself.

Dataset Name	Dataset ID	Subsets	Dataset Size	Statistic (token)	Tags	HF Dataset ID
🔥ms-bench	iic/ms_bench		316820	346.9±443.2, min=22, max=30960	chat, general, multi-round	-
🔥alpaca-en	AI-ModelScope/alpaca-gpt4-data-en		52002	176.2±125.8, min=26, max=740	chat, general	vicgalle/alpaca-gpt4
🔥alpaca-zh	AI-ModelScope/alpaca-gpt4-data-zh		48818	162.1±93.9, min=26, max=856	chat, general	llm-wizard/alpaca-gpt4-data-zh
multi-alpaca	damo/nlp_polylm_multialpaca_sft	ar de es fr id ja ko pt ru th vi	131867	112.9±50.6, min=26, max=1226	chat, general, multilingual	-
instinwild	wyj123456/instinwild	default subset	103695	145.4±60.7, min=28, max=1434	-	-
cot-en	YorickHe/CoT		74771	122.7±64.8, min=51, max=8320	chat, general	-
cot-zh	YorickHe/CoT_zh		74771	117.5±70.8, min=43, max=9636	chat, general	-
instruct-en	wyj123456/instruct		888970	269.1±331.5, min=26, max=7254	chat, general	-
firefly-zh	AI-ModelScope/firefly-train-1.1M		1649399	178.1±260.4, min=26, max=12516	chat, general	YeungNLP/firefly-train-1.1M
gpt4all-en	wyj123456/GPT4all		806199	302.7±384.5, min=27, max=7391	chat, general	-
sharegpt	huangjintao/sharegpt	common-zh computer-zh unknow-zh common-en computer-en	96566	933.3±864.8, min=21, max=66412	chat, general, multi-round	-
tulu-v2-sft-mixture	AI-ModelScope/tulu-v2-sft-mixture		5119	520.7±437.6, min=68, max=2549	chat, multilingual, general, multi-round	allenai/tulu-v2-sft-mixture
wikipedia-zh	AI-ModelScope/wikipedia-cn-20230720-filtered		254547	568.4±713.2, min=37, max=78678	text-generation, general, pretrained	pleisto/wikipedia-cn-20230720-filtered
open-orca	AI-ModelScope/OpenOrca		994896	382.3±417.4, min=31, max=8740	chat, multilingual, general	-
🔥sharegpt-gpt4	AI-ModelScope/sharegpt_gpt4	default V3_format zh_38K_format	72684	1047.6±1313.1, min=22, max=66412	chat, multilingual, general, multi-round, gpt4	-
deepctrl-sft	AI-ModelScope/deepctrl-sft-data	default en	14149024	389.8±628.6, min=21, max=626237	chat, general, sft, multi-round	-
🔥coig-cqia	AI-ModelScope/COIG-CQIA	chinese_traditional coig_pc exam finance douban human_value logi_qa ruozhiba segmentfault wiki wikihow xhs zhihu	44694	703.8±654.2, min=33, max=19288	general	-
🔥ruozhiba	AI-ModelScope/ruozhiba	post-annual title-good title-norm	85658	39.9±13.1, min=21, max=559	pretrain	-
long-alpaca-12k	AI-ModelScope/LongAlpaca-12k		11998	9619.0±8295.8, min=36, max=78925	longlora, QA	Yukang/LongAlpaca-12k
🔥ms-agent	iic/ms_agent		26336	650.9±217.2, min=209, max=2740	chat, agent, multi-round	-
🔥ms-agent-for-agentfabric	AI-ModelScope/ms_agent_for_agentfabric	default addition	30000	617.8±199.1, min=251, max=2657	chat, agent, multi-round	-
ms-agent-multirole	iic/MSAgent-MultiRole		9500	447.6±84.9, min=145, max=1101	chat, agent, multi-round, role-play, multi-agent	-
🔥toolbench-for-alpha-umi	shenweizhou/alpha-umi-toolbench-processed-v2	backbone caller planner summarizer	1448337	1439.7±853.9, min=123, max=18467	chat, agent	-
damo-agent-zh	damo/MSAgent-Bench		386984	956.5±407.3, min=326, max=19001	chat, agent, multi-round	-
damo-agent-zh-mini	damo/MSAgent-Bench		20845	1326.4±329.6, min=571, max=4304	chat, agent, multi-round	-
agent-instruct-all-en	huangjintao/AgentInstruct_copy	alfworld db kg mind2web os webshop	1866	1144.3±635.5, min=206, max=6412	chat, agent, multi-round	-
code-alpaca-en	wyj123456/code_alpaca_en		20016	100.2±60.1, min=29, max=1776	-	sahil2801/CodeAlpaca-20k
🔥leetcode-python-en	AI-ModelScope/leetcode-solutions-python		2359	727.1±235.9, min=259, max=2146	chat, coding	-
🔥codefuse-python-en	codefuse-ai/CodeExercise-Python-27k		27224	483.6±193.9, min=45, max=3082	chat, coding	-
🔥codefuse-evol-instruction-zh	codefuse-ai/Evol-instruction-66k		66862	439.6±206.3, min=37, max=2983	chat, coding	-
medical-en	huangjintao/medical_zh	en	117617	257.4±89.1, min=36, max=2564	chat, medical	-
medical-zh	huangjintao/medical_zh	zh	1950972	167.2±219.7, min=26, max=27351	chat, medical	-
🔥disc-med-sft-zh	AI-ModelScope/DISC-Med-SFT		441767	354.1±193.1, min=25, max=2231	chat, medical	Flmc/DISC-Med-SFT
lawyer-llama-zh	AI-ModelScope/lawyer_llama_data		21476	194.4±91.7, min=27, max=924	chat, law	Skepsun/lawyer_llama_data
tigerbot-law-zh	AI-ModelScope/tigerbot-law-plugin		55895	109.9±126.4, min=37, max=18878	text-generation, law, pretrained	TigerResearch/tigerbot-law-plugin
🔥disc-law-sft-zh	AI-ModelScope/DISC-Law-SFT		166758	533.7±495.4, min=30, max=15169	chat, law	ShengbinYue/DISC-Law-SFT
🔥blossom-math-zh	AI-ModelScope/blossom-math-v2		10000	169.3±58.7, min=35, max=563	chat, math	Azure99/blossom-math-v2
school-math-zh	AI-ModelScope/school_math_0.25M		248480	157.7±72.2, min=33, max=3450	chat, math, quality	BelleGroup/school_math_0.25M
open-platypus-en	AI-ModelScope/Open-Platypus		24926	367.9±254.8, min=30, max=3951	chat, math, quality	garage-bAInd/Open-Platypus
text2sql-en	AI-ModelScope/texttosqlv2_25000_v2		25000	274.6±326.4, min=38, max=1975	chat, sql	Clinton/texttosqlv2_25000_v2
🔥sql-create-context-en	AI-ModelScope/sql-create-context		78577	80.2±17.8, min=36, max=456	chat, sql	b-mc2/sql-create-context
synthetic-text-to-sql	AI-ModelScope/synthetic_text_to_sql	default	100000	283.4±115.8, min=61, max=1356	nl2sql, en	gretelai/synthetic_text_to_sql
🔥advertise-gen-zh	lvjianjin/AdvertiseGen		98399	130.6±21.7, min=51, max=241	text-generation	shibing624/AdvertiseGen
🔥dureader-robust-zh	modelscope/DuReader_robust-QG		17899	241.1±137.4, min=60, max=1416	text-generation	-
cmnli-zh	modelscope/clue	cmnli	404024	82.6±16.6, min=51, max=199	text-generation, classification	clue
🔥jd-sentiment-zh	DAMO_NLP/jd		50000	66.0±83.2, min=39, max=4039	text-generation, classification	-
🔥hc3-zh	simpleai/HC3-Chinese	baike open_qa nlpcc_dbqa finance medicine law psychology	39781	176.8±81.5, min=57, max=3051	text-generation, classification	Hello-SimpleAI/HC3-Chinese
🔥hc3-en	simpleai/HC3	finance medicine	11021	298.3±138.7, min=65, max=2267	text-generation, classification	Hello-SimpleAI/HC3
dolly-15k	AI-ModelScope/databricks-dolly-15k	default	15011	199.2±267.8, min=22, max=8615	multi-task, en, quality	databricks/databricks-dolly-15k
finance-en	wyj123456/finance_en		68911	135.6±134.3, min=26, max=3525	chat, financial	ssbuild/alpaca_finance_en
poetry-zh	modelscope/chinese-poetry-collection		390309	55.2±9.4, min=23, max=83	text-generation, poetry	-
webnovel-zh	AI-ModelScope/webnovel_cn		50000	1478.9±11526.1, min=100, max=490484	chat, novel	zxbsmk/webnovel_cn
generated-chat-zh	AI-ModelScope/generated_chat_0.4M		396004	273.3±52.0, min=32, max=873	chat, character-dialogue	BelleGroup/generated_chat_0.4M
🔥self-cognition	swift/self-cognition		134	53.6±18.6, min=29, max=121	chat, self-cognition	modelscope/self-cognition
cls-fudan-news-zh	damo/zh_cls_fudan-news		4959	3234.4±2547.5, min=91, max=19548	chat, classification	-
ner-jave-zh	damo/zh_ner-JAVE		1266	118.3±45.5, min=44, max=223	chat, ner	-
coco-en	modelscope/coco_2014_caption	coco_2014_caption	454617	299.8±2.8, min=295, max=352	chat, multi-modal, vision	-
🔥coco-en-mini	modelscope/coco_2014_caption	coco_2014_caption	40504	299.8±2.6, min=295, max=338	chat, multi-modal, vision	-
coco-en-2	modelscope/coco_2014_caption	coco_2014_caption	454617	36.8±2.8, min=32, max=89	chat, multi-modal, vision	-
🔥coco-en-2-mini	modelscope/coco_2014_caption	coco_2014_caption	40504	36.8±2.6, min=32, max=75	chat, multi-modal, vision	-
capcha-images	AI-ModelScope/captcha-images		8000	31.0±0.0, min=31, max=31	chat, multi-modal, vision	-
aishell1-zh	speech_asr/speech_asr_aishell1_trainsets		141600	152.2±36.8, min=63, max=419	chat, multi-modal, audio	-
🔥aishell1-zh-mini	speech_asr/speech_asr_aishell1_trainsets		14526	152.2±35.6, min=74, max=359	chat, multi-modal, audio	-
hh-rlhf	AI-ModelScope/hh-rlhf	harmless-base helpful-base helpful-online helpful-rejection-sampled	127459	245.4±190.7, min=22, max=1999	rlhf, dpo, pairwise	-
🔥hh-rlhf-cn	AI-ModelScope/hh_rlhf_cn	hh_rlhf harmless_base_cn harmless_base_en helpful_base_cn helpful_base_en	355920	171.2±122.7, min=22, max=3078	rlhf, dpo, pairwise	-
orpo-dpo-mix-40k	AI-ModelScope/orpo-dpo-mix-40k	default	43666	548.3±397.4, min=28, max=8483	dpo, orpo, en, quality	mlabonne/orpo-dpo-mix-40k
stack-exchange-paired	AI-ModelScope/stack-exchange-paired		4483004	534.5±594.6, min=31, max=56588	hfrl, dpo, pairwise	lvwerra/stack-exchange-paired
shareai-llama3-dpo-zh-en-emoji	hjh0119/shareAI-Llama3-DPO-zh-en-emoji	default	2449	334.0±162.8, min=36, max=1801	rlhf, dpo, pairwise	-
pileval	huangjintao/pile-val-backup		214670	1612.3±8856.2, min=11, max=1208955	text-generation, awq	mit-han-lab/pile-val-backup
mantis-instruct	swift/Mantis-Instruct	birds-to-words chartqa coinstruct contrastive_caption docvqa dreamsim dvqa iconqa imagecode llava_665k_multi lrv_multi multi_vqa nextqa nlvr2 spot-the-diff star visual_story_telling	655351	825.7±812.5, min=284, max=13563	chat, multi-modal, vision, quality	TIGER-Lab/Mantis-Instruct
llava-data-instruct	swift/llava-data	llava_instruct	364100	189.0±142.1, min=33, max=5183	sft, multi-modal, quality	TIGER-Lab/llava-data
midefics	swift/MideficsDataset		3800	201.3±70.2, min=60, max=454	medical, en, vqa	WinterSchool/MideficsDataset
gqa	None	train_all_instructions	-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, vqa, quality	lmms-lab/GQA
text-caps	swift/TextCaps		18145	38.2±4.4, min=31, max=73	multi-modal, en, caption, quality	HuggingFaceM4/TextCaps
a-okvqa	swift/A-OKVQA		18201	45.8±7.9, min=32, max=100	multi-modal, en, vqa, quality	HuggingFaceM4/A-OKVQA
okvqa	swift/OK-VQA_train		9009	34.4±3.3, min=28, max=59	multi-modal, en, vqa, quality	Multimodal-Fatima/OK-VQA_train
ocr-vqa	swift/OCR-VQA		186753	35.6±6.6, min=29, max=193	multi-modal, en, ocr-vqa	howard-hou/OCR-VQA
grit	swift/GRIT		-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, caption-grounding, quality	zzliang/GRIT
llava-instruct-mix	swift/llava-instruct-mix-vsft		13640	179.8±120.2, min=30, max=962	multi-modal, en, vqa, quality	HuggingFaceH4/llava-instruct-mix-vsft
lnqa	swift/lnqa		-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, ocr-vqa, quality	vikhyatk/lnqa
science-qa	swift/ScienceQA		8315	100.3±59.5, min=38, max=638	multi-modal, science, vqa, quality	derek-thomas/ScienceQA
guanaco	AI-ModelScope/GuanacoDataset	default	31561	250.1±70.3, min=89, max=1436	chat, zh	JosephusCheung/GuanacoDataset
mind2web	swift/Multimodal-Mind2Web		1009	297522.4±325496.2, min=8592, max=3499715	agent, multi-modal	osunlp/Multimodal-Mind2Web
sharegpt-4o-image	AI-ModelScope/ShareGPT-4o	image_caption	57289	638.7±157.9, min=47, max=4640	vqa, multi-modal	OpenGVLab/ShareGPT-4o
m3it	AI-ModelScope/M3IT	coco vqa-v2 shapes shapes-rephrased coco-goi-rephrased snli-ve snli-ve-rephrased okvqa a-okvqa viquae textcap docvqa science-qa imagenet imagenet-open-ended imagenet-rephrased coco-goi clevr clevr-rephrased nlvr coco-itm coco-itm-rephrased vsr vsr-rephrased mocheg mocheg-rephrased coco-text fm-iqa activitynet-qa msrvtt ss coco-cn refcoco refcoco-rephrased multi30k image-paragraph-captioning visual-dialog visual-dialog-rephrased iqa vcr visual-mrc ivqa msrvtt-qa msvd-qa gqa text-vqa ocr-vqa st-vqa flickr8k-cn	-	Dataset is too huge, please click the original link to view the dataset stat.	chat, multi-modal, vision	-
sharegpt4v	AI-ModelScope/ShareGPT4V	ShareGPT4V ShareGPT4V-PT	-	Dataset is too huge, please click the original link to view the dataset stat.	chat, multi-modal, vision	-
llava-instruct-150k	AI-ModelScope/LLaVA-Instruct-150K		624610	490.4±180.2, min=288, max=5438	chat, multi-modal, vision	-
llava-pretrain	AI-ModelScope/LLaVA-Pretrain	blip_laion_cc_sbu_558k	-	Dataset is too huge, please click the original link to view the dataset stat.	vqa, multi-modal, quality	liuhaotian/LLaVA-Pretrain
RLAIF-v-dataset	swift/RLAIF-V-Dataset		83132	113.7±49.7, min=30, max=540	multi-modal, rlhf, quality	openbmb/RLAIF-V-Dataset
alpaca-cleaned	AI-ModelScope/alpaca-cleaned		51760	177.9±126.4, min=26, max=1044	chat, general, bench, quality	yahma/alpaca-cleaned
aya-collection	swift/aya_collection	aya_dataset	202364	494.0±6911.3, min=21, max=3044268	multi-lingual, qa	CohereForAI/aya_collection
belle-generated-chat-0.4M	AI-ModelScope/generated_chat_0.4M		396004	273.3±52.0, min=32, max=873	common, zh	BelleGroup/generated_chat_0.4M
belle-math-0.25M	AI-ModelScope/school_math_0.25M		248480	157.7±72.2, min=33, max=3450	math, zh	BelleGroup/school_math_0.25M
belle-train-0.5M-CN	AI-ModelScope/train_0.5M_CN		519255	129.1±91.5, min=27, max=6507	common, zh, quality	BelleGroup/train_0.5M_CN
belle-train-1M-CN	AI-ModelScope/train_1M_CN		-	Dataset is too huge, please click the original link to view the dataset stat.	common, zh, quality	BelleGroup/train_1M_CN
belle-train-2M-CN	AI-ModelScope/train_2M_CN		-	Dataset is too huge, please click the original link to view the dataset stat.	common, zh, quality	BelleGroup/train_2M_CN
belle-train-3.5M-CN	swift/train_3.5M_CN		-	Dataset is too huge, please click the original link to view the dataset stat.	common, zh, quality	BelleGroup/train_3.5M_CN
c4	None		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	allenai/c4
chart-qa	swift/ChartQA		28299	43.1±5.5, min=29, max=77	en, vqa, quality	HuggingFaceM4/ChartQA
chinese-c4	None		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, zh, quality	shjwudp/chinese-c4
cinepile	swift/cinepile		-	Dataset is too huge, please click the original link to view the dataset stat.	vqa, en, youtube, video	tomg-group-umd/cinepile
codealpaca-20k	AI-ModelScope/CodeAlpaca-20k		20016	100.2±60.1, min=29, max=1776	code, en	HuggingFaceH4/CodeAlpaca_20K
cosmopedia	None	auto_math_text khanacademy openstax stanford stories web_samples_v1 web_samples_v2 wikihow	-	Dataset is too huge, please click the original link to view the dataset stat.	multi-domain, en, qa	HuggingFaceTB/cosmopedia
cosmopedia-100k	swift/cosmopedia-100k		100000	1024.5±243.1, min=239, max=2981	multi-domain, en, qa	HuggingFaceTB/cosmopedia-100k
dolma	None		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	allenai/dolma
dolphin	swift/dolphin	flan1m-alpaca-uncensored flan5m-alpaca-uncensored	-	Dataset is too huge, please click the original link to view the dataset stat.	en	cognitivecomputations/dolphin
evol-instruct-v2	AI-ModelScope/WizardLM_evol_instruct_V2_196k		109184	480.9±333.1, min=26, max=4942	chat, en	WizardLM/WizardLM_evol_instruct_V2_196k
fineweb	None		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	HuggingFaceFW/fineweb
github-code	swift/github-code		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	codeparrot/github-code
gpt4v-dataset	swift/gpt4v-dataset		12356	217.9±68.3, min=35, max=596	en, caption, multi-modal, quality	laion/gpt4v-dataset
guanaco-belle-merge	AI-ModelScope/guanaco_belle_merge_v1.0		693987	134.2±92.0, min=24, max=6507	QA, zh	Chinese-Vicuna/guanaco_belle_merge_v1.0
llava-med-zh-instruct	swift/llava-med-zh-instruct-60k		56649	207.7±67.6, min=37, max=657	zh, medical, vqa	BUAADreamer/llava-med-zh-instruct-60k
lmsys-chat-1m	AI-ModelScope/lmsys-chat-1m		-	Dataset is too huge, please click the original link to view the dataset stat.	chat, en	lmsys/lmsys-chat-1m
math-instruct	AI-ModelScope/MathInstruct		262283	254.4±183.5, min=11, max=4383	math, cot, en, quality	TIGER-Lab/MathInstruct
math-plus	TIGER-Lab/MATH-plus	train	893929	287.1±158.7, min=24, max=2919	qa, math, en, quality	TIGER-Lab/MATH-plus
moondream2-coyo-5M	swift/moondream2-coyo-5M-captions		-	Dataset is too huge, please click the original link to view the dataset stat.	caption, pretrain, quality	isidentical/moondream2-coyo-5M-captions
no-robots	swift/no_robots		9485	298.7±246.4, min=40, max=6739	multi-task, quality, human-annotated	HuggingFaceH4/no_robots
open-hermes	swift/OpenHermes-2.5		-	Dataset is too huge, please click the original link to view the dataset stat.	cot, en, quality	teknium/OpenHermes-2.5
open-orca-chinese	AI-ModelScope/OpenOrca-Chinese		-	Dataset is too huge, please click the original link to view the dataset stat.	QA, zh, general, quality	yys/OpenOrca-Chinese
orca_dpo_pairs	swift/orca_dpo_pairs		12859	366.9±251.9, min=30, max=2010	rlhf, quality	Intel/orca_dpo_pairs
path-vqa	swift/path-vqa		19654	34.8±7.3, min=27, max=85	multi-modal, vqa, medical	flaviagiammarino/path-vqa
pile	AI-ModelScope/pile		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain	EleutherAI/pile
poison-mpts	iic/100PoisonMpts		906	150.6±80.8, min=39, max=656	poison-management, zh	-
redpajama-data-1t	swift/RedPajama-Data-1T		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	togethercomputer/RedPajama-Data-1T
redpajama-data-v2	swift/RedPajama-Data-V2		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	togethercomputer/RedPajama-Data-V2
refinedweb	None		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	tiiuae/falcon-refinedweb
rwkv-pretrain-web	mapjack/openwebtext_dataset		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, zh, quality	-
sft-nectar	AI-ModelScope/SFT-Nectar		131192	396.4±272.1, min=44, max=10732	cot, en, quality	AstraMindAI/SFT-Nectar
skypile	AI-ModelScope/SkyPile-150B		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality, zh	Skywork/SkyPile-150B
slim-orca	swift/SlimOrca		517982	399.1±370.2, min=35, max=8756	quality, en	Open-Orca/SlimOrca
slim-pajama-627b	None		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	cerebras/SlimPajama-627B
starcoder	AI-ModelScope/starcoderdata		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	bigcode/starcoderdata
tagengo-gpt4	swift/tagengo-gpt4		78057	472.3±292.9, min=22, max=3521	chat, multi-lingual, quality	lightblue/tagengo-gpt4
the-stack	AI-ModelScope/the-stack		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	bigcode/the-stack
ultrachat-200k	swift/ultrachat_200k		207865	1195.4±573.7, min=76, max=4470	chat, en, quality	HuggingFaceH4/ultrachat_200k
vqa-v2	swift/VQAv2		443757	31.8±2.2, min=27, max=58	en, vqa, quality	HuggingFaceM4/VQAv2
web-instruct-sub	swift/WebInstructSub		-	Dataset is too huge, please click the original link to view the dataset stat.	qa, en, math, quality, multi-domain, science	TIGER-Lab/WebInstructSub
wikipedia	swift/wikipedia		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	wikipedia
wikipedia-cn-filtered	AI-ModelScope/wikipedia-cn-20230720-filtered		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	pleisto/wikipedia-cn-20230720-filtered
zhihu-rlhf	AI-ModelScope/zhihu_rlhf_3k		3460	594.5±365.9, min=31, max=1716	rlhf, dpo, zh	liyucheng/zhihu_rlhf_3k

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Supported-models-datasets.md

Supported-models-datasets.md

Supported models and datasets

Table of Contents

Models

LLM

MLLM

Datasets

Files

Supported-models-datasets.md

Latest commit

History

Supported-models-datasets.md

File metadata and controls

Supported models and datasets

Table of Contents

Models

LLM

MLLM

Datasets