
Thanks for the wonderful project! Why do I always see an apparent loss of original ability? #25

Open
hzgdeerHo opened this issue May 17, 2024 · 8 comments

Comments

@hzgdeerHo

After fine-tuning llama-3-8B-instruct with the same configuration as the code from https://github.com/hiyouga/LLaMA-Factory/tree/3df986c6793a51ec2cb5f31fd1808cd3a9883bc4/examples/extras/llama_pro, I always end up with an apparent loss of the model's original ability. I only used the "identity" training dataset. Can you help? Thanks!

@hzgdeerHo
Author

The final training loss is about 0.05–0.1, so I think it might not be caused by overfitting?

@hills-code
Collaborator

Hi! Have you tried directly fine-tuning llama-3-8B-instruct? What happens in that setting?
I did not run experiments with llama-3, so I am not very familiar with its behavior. You could also try changing the position of the added blocks (sketched below). The recent Yi tech report and some llama3-120B models suggest that keeping the first few layers fixed may be important. Hope this helps!
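A minimal sketch (not the LLaMA-Factory implementation) of LLaMA Pro-style block expansion that leaves the first few layers untouched, as suggested above. The model name, the number of skipped leading layers, and the expansion interval are assumptions for illustration.

```python
import copy

import torch
from transformers import AutoModelForCausalLM


def expand_blocks(model, num_skip_front=4, interval=4):
    """Insert a zero-initialised copy after every `interval`-th decoder layer,
    skipping the first `num_skip_front` layers entirely."""
    layers = list(model.model.layers)
    expanded, new_ids = [], []
    for i, layer in enumerate(layers):
        expanded.append(layer)
        if i >= num_skip_front and (i - num_skip_front + 1) % interval == 0:
            new_layer = copy.deepcopy(layer)
            # Zero the projections that write back into the residual stream so
            # the new block starts out as an identity mapping (as in LLaMA Pro).
            new_layer.self_attn.o_proj.weight.data.zero_()
            new_layer.mlp.down_proj.weight.data.zero_()
            new_ids.append(len(expanded))
            expanded.append(new_layer)
    model.model.layers = torch.nn.ModuleList(expanded)
    model.config.num_hidden_layers = len(expanded)
    # Freeze everything that came from the base model; train only the new blocks.
    model.requires_grad_(False)
    for idx in new_ids:
        model.model.layers[idx].requires_grad_(True)
    return model


model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = expand_blocks(model)
```

Moving `num_skip_front` up or down changes where the new blocks land, which is one way to test whether keeping the early layers untouched preserves more of the original ability.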

@hzgdeerHo
Author

OK, thanks! Could you share some links as references to help me figure out the problem?

@hills-code
Collaborator

Certainly! Here is the link to Yi-9B: https://huggingface.co/01-ai/Yi-9B and its tech report: https://arxiv.org/pdf/2403.04652
You can find the depth upscaling in Sec 7.3.
and LLaMa3-120B https://huggingface.co/alpindale/goliath-120b
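A rough sketch of the depth up-scaling described in Sec 7.3 of the Yi tech report: duplicate a contiguous span of middle layers while keeping the first and last layers as they are. The layer range below is an illustrative assumption, not the exact range used for Yi-9B.

```python
import copy

import torch


def upscale_depth(model, dup_start=12, dup_end=28):
    """Duplicate decoder layers [dup_start, dup_end) and splice the copies in
    after the originals, leaving the leading and trailing layers untouched."""
    layers = list(model.model.layers)
    upscaled = (
        layers[:dup_end]                                         # original prefix
        + [copy.deepcopy(l) for l in layers[dup_start:dup_end]]  # duplicated middle span
        + layers[dup_end:]                                       # original suffix
    )
    model.model.layers = torch.nn.ModuleList(upscaled)
    model.config.num_hidden_layers = len(upscaled)
    return model
```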

@hzgdeerHo
Author

Thanks !

@hzgdeerHo
Author

hzgdeerHo commented May 19, 2024

I have posted a new issue: hiyouga/LLaMA-Factory#3811. Could you please help explain? Thanks!

@hiyouga

hiyouga commented May 19, 2024

Training on a small dataset for many epochs can easily lead to overfitting.
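A hedged illustration of that point: on a tiny dataset such as "identity", fewer epochs and a lower learning rate limit how far the model drifts from the base weights. The values below are examples, not a tuned recipe, and the argument names follow Hugging Face transformers rather than LLaMA-Factory.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-identity-ft",
    num_train_epochs=1,               # a small dataset rarely needs many epochs
    learning_rate=1e-5,               # a lower LR reduces catastrophic forgetting
    per_device_train_batch_size=4,
    warmup_ratio=0.1,
    logging_steps=10,
)
```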

@hzgdeerHo
Author

Thanks!
