DPO training of a supervised finetuned model #3997
Replies: 3 comments
- the warning message can be safely ignored
- Thank you very much. Thanks again
- up to you
-
Hello,
I did SFT and then wanted to run DPO training on top of the SFT model, so I chose my base model and adapter (the SFT model) and selected DPO training. The problem is that none of the inputs have requires_grad=True, so the gradient will be None. I don't know where I made a mistake or how to do this properly.
Thanks in advance
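A hedged sketch of what is likely behind the warning: when the base model's weights are frozen (as in a LoRA/adapter setup, often combined with gradient checkpointing), the embedding output has requires_grad=False, so autograd reports that no input requires gradients. The toy example below reproduces the symptom with a plain frozen embedding and shows the usual workaround of forcing the output to require grad (in transformers this is what `model.enable_input_require_grads()` does); it is an illustration of the mechanism, not the exact code path of any particular trainer.

```python
import torch

# Simulate a frozen base model: all parameters have requires_grad=False,
# as when only a LoRA adapter is trainable.
emb = torch.nn.Embedding(10, 4)
for p in emb.parameters():
    p.requires_grad = False

tokens = torch.tensor([1, 2, 3])

# With everything upstream frozen, the output tracks no gradients --
# this is the condition the "none of the inputs have requires_grad=True"
# warning refers to.
out = emb(tokens)
print(out.requires_grad)  # False

# Workaround used with gradient checkpointing + adapters: mark the
# embedding output as requiring grad so the backward pass can reach
# the trainable adapter weights further down the stack.
out2 = emb(tokens)
out2.requires_grad_(True)
print(out2.requires_grad)  # True
```

As the first reply notes, when the trainable adapter weights do receive gradients, the warning itself is harmless and can be ignored.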