
Thanks for the wonderful project! Why do I always see an apparent loss of original ability? #25

Open
hzgdeerHo opened this issue May 17, 2024 · 8 comments

Comments

@hzgdeerHo

After fine-tuning llama-3-8B-instruct with the same configuration as the code from https://github.com/hiyouga/LLaMA-Factory/tree/3df986c6793a51ec2cb5f31fd1808cd3a9883bc4/examples/extras/llama_pro, I always end up with an apparent loss of the model's original ability. I only used the "identity" training dataset. Can you help? Thanks!

@hzgdeerHo
Author

The final training loss is about 0.05–0.1, so I think it might not be caused by overfitting?

@hills-code
Collaborator

Hi! Have you tried directly fine-tuning llama-3-8B-instruct? What happens in that setting?
I did not run experiments with llama-3, so I am not very familiar with its behavior. You could also try changing the position of the added blocks (sketched below). The recent Yi tech report and some llama3-120B models suggest that keeping the first few layers fixed may be important. Hope this helps!
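A minimal sketch (not the LLaMA-Factory implementation) of LLaMA Pro-style block expansion that leaves the first few layers untouched, as suggested above. The model name, the number of skipped leading layers, and the expansion interval are assumptions for illustration.

```python
import copy

import torch
from transformers import AutoModelForCausalLM


def expand_blocks(model, num_skip_front=4, interval=4):
    """Insert a zero-initialised copy after every `interval`-th decoder layer,
    skipping the first `num_skip_front` layers entirely."""
    layers = list(model.model.layers)
    expanded, new_ids = [], []
    for i, layer in enumerate(layers):
        expanded.append(layer)
        if i >= num_skip_front and (i - num_skip_front + 1) % interval == 0:
            new_layer = copy.deepcopy(layer)
            # Zero the projections that write back into the residual stream so
            # the new block starts out as an identity mapping (as in LLaMA Pro).
            new_layer.self_attn.o_proj.weight.data.zero_()
            new_layer.mlp.down_proj.weight.data.zero_()
            new_ids.append(len(expanded))
            expanded.append(new_layer)
    model.model.layers = torch.nn.ModuleList(expanded)
    model.config.num_hidden_layers = len(expanded)
    # Freeze everything that came from the base model; train only the new blocks.
    model.requires_grad_(False)
    for idx in new_ids:
        model.model.layers[idx].requires_grad_(True)
    return model


model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = expand_blocks(model)
```

Moving `num_skip_front` up or down changes where the new blocks land, which is one way to test whether keeping the early layers untouched preserves more of the original ability.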

@hzgdeerHo
Author

OK, thanks! Could you share some links as references to help me figure out the problem?

@hills-code
Collaborator

Certainly! Here is the link to Yi-9B: https://huggingface.co/01-ai/Yi-9B and its tech report: https://arxiv.org/pdf/2403.04652
You can find the depth upscaling in Sec 7.3.
and LLaMa3-120B https://huggingface.co/alpindale/goliath-120b
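A rough sketch of the depth up-scaling described in Sec 7.3 of the Yi tech report: duplicate a contiguous span of middle layers while keeping the first and last layers as they are. The layer range below is an illustrative assumption, not the exact range used for Yi-9B.

```python
import copy

import torch


def upscale_depth(model, dup_start=12, dup_end=28):
    """Duplicate decoder layers [dup_start, dup_end) and splice the copies in
    after the originals, leaving the leading and trailing layers untouched."""
    layers = list(model.model.layers)
    upscaled = (
        layers[:dup_end]                                         # original prefix
        + [copy.deepcopy(l) for l in layers[dup_start:dup_end]]  # duplicated middle span
        + layers[dup_end:]                                       # original suffix
    )
    model.model.layers = torch.nn.ModuleList(upscaled)
    model.config.num_hidden_layers = len(upscaled)
    return model
```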

@hzgdeerHo
Author

Thanks !

@hzgdeerHo
Author

hzgdeerHo commented May 19, 2024

I have posted a new issue: hiyouga/LLaMA-Factory#3811. Could you please help explain? Thanks!

@hiyouga

hiyouga commented May 19, 2024

Training on a small dataset for many epochs can easily lead to overfitting.
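A hedged illustration of that point: on a tiny dataset such as "identity", fewer epochs and a lower learning rate limit how far the model drifts from the base weights. The values below are examples, not a tuned recipe, and the argument names follow Hugging Face transformers rather than LLaMA-Factory.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-identity-ft",
    num_train_epochs=1,               # a small dataset rarely needs many epochs
    learning_rate=1e-5,               # a lower LR reduces catastrophic forgetting
    per_device_train_batch_size=4,
    warmup_ratio=0.1,
    logging_steps=10,
)
```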

@hzgdeerHo
Author

Thanks!
