Describe the bug
I fine-tuned MiniCPM-V with LoRA using multi-GPU data parallelism. At test time, the outputs are almost identical to those of the original, un-fine-tuned model. The training loss decreased normally, but even on training-set samples the outputs look as if the model had never been fine-tuned.
Could something be wrong with my infer command? I would appreciate it if someone could take a look.
P.S. A model trained on a single GPU produces the expected outputs.
Training command:
nproc_per_node=8
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NPROC_PER_NODE=$nproc_per_node MASTER_PORT=29500 swift sft --model_type minicpm-v-v2-chat --dataset train_minicpm_v_2_0619.jsonl --lora_target_modules ALL --train_dataset_sample -1 --num_train_epochs 8 --ddp_find_unused_parameters True \
Single-GPU training command:
CUDA_VISIBLE_DEVICES=1 swift sft --model_type minicpm-v-v2-chat --dataset train_minicpm_v_2_0619.jsonl --lora_target_modules ALL
Infer commands:
CUDA_VISIBLE_DEVICES=1 swift export --ckpt_dir output/minicpm-v-v2-chat/v3-20240619-204718/checkpoint-6200/ --merge_lora true
CUDA_VISIBLE_DEVICES=1 swift infer --ckpt_dir output/minicpm-v-v2-chat/v3-20240619-204718/checkpoint-6200-merged --load_dataset_config true --val_dataset val_minicpm_v_2_0619.jsonl --show_dataset_sample -1
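One way to narrow down a report like this is to check whether the saved adapter carries any learned update at all before looking at the infer step. In standard LoRA the B matrix is initialized to zero, so an adapter whose B matrices are still all zero contributes nothing after merging. The sketch below is only an illustration under stated assumptions: it works on a plain dict of flattened weights, the key names are hypothetical, and it assumes parameter names contain the substring `lora_B` (as in common LoRA implementations). Loading a real checkpoint into such a dict is not shown and depends on how the checkpoint was saved.

```python
from typing import Dict, Iterable, List


def find_untrained_lora_b(
    weights: Dict[str, Iterable[float]], tol: float = 0.0
) -> List[str]:
    """Return the names of LoRA B matrices whose entries are all within tol of zero."""
    untrained = []
    for name, values in weights.items():
        # Only B matrices are zero-initialized, so only they are informative here.
        if "lora_B" not in name:
            continue
        if all(abs(v) <= tol for v in values):
            untrained.append(name)
    return sorted(untrained)


# Toy stand-in for a flattened adapter state dict (hypothetical key names and values).
toy_weights = {
    "llm.layers.0.q_proj.lora_A.weight": [0.01, -0.02],
    "llm.layers.0.q_proj.lora_B.weight": [0.0, 0.0],
    "vpm.blocks.0.attn.lora_A.weight": [0.03, 0.01],
    "vpm.blocks.0.attn.lora_B.weight": [0.2, -0.1],
}

print(find_untrained_lora_b(toy_weights))
```

If every B matrix comes back as untrained, the problem is more likely in training or checkpoint saving than in the export and infer commands. If none do, comparing the merged weights against the base model would be a reasonable next step.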
Fixed in #1197.
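You will need to retrain the model.
Hello, I trained with the latest commit and ran into a new error. The versions I am currently using are as follows:
It seems this problem shows up whenever the vision encoder part is trained.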