
Do larger models need more blocks? #18

Open
PoseidomWong opened this issue Mar 13, 2024 · 1 comment

@PoseidomWong

If we want to apply llama-pro to larger models such as 34B or 72B, should the number of blocks be increased proportionally? Have any experiments been done on this?

@hills-code
Collaborator

hills-code commented Mar 13, 2024

We are also exploring larger models, but experiments at that scale require substantial resources. So far we have explored expansion on different architectures, such as Mistral, with promising results (see Mistral-Pro), and we will keep pursuing this idea. We also noticed that Yi recently used depth expansion to train on math and code, producing Yi-9B, which adds 16 layers. I believe the positions of the copied layers and how many layers to copy are still open questions worth studying, and we will investigate them step by step.
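To make the "positions and number of copied layers" concrete: in LLaMA-Pro-style block expansion, the original layers are split into groups and a copy of each group's last layer is appended after it (the copy is zero-initialized so it acts as an identity block at the start of training). The sketch below only computes such an interleaved layout; the function name and even-grouping assumption are illustrative, not from the repository.

```python
def expansion_plan(num_layers: int, num_new_blocks: int):
    """Return an interleaved layer layout for block expansion.

    Splits the original `num_layers` into `num_new_blocks` equal groups and
    appends a copy of each group's last layer after that group. Entries are
    ("orig", i) for original layers and ("copy", i) for the zero-initialized
    copies (identity blocks at initialization).
    """
    assert num_layers % num_new_blocks == 0, "assumes evenly divisible groups"
    group_size = num_layers // num_new_blocks
    plan = []
    for i in range(num_layers):
        plan.append(("orig", i))
        if (i + 1) % group_size == 0:
            plan.append(("copy", i))  # new block copied from layer i
    return plan
```

For example, expanding a 32-layer model with 8 new blocks copies layers 3, 7, 11, ..., 31, giving a 40-layer model; whether the block count should scale with model size (the question above) would change only `num_new_blocks` here.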
