Llama3-8b-Instruct won't stop generating #442
Comments
@micholeodon Updating the transformers library and retraining the model solved the issue for me (I am not 100% sure whether this was the fix or #456, but inference is working fine for me now; I think my issue was solved on my end before #456 was implemented). If I remember correctly, there were some errors with the Llama 3 Instruct chat template in the older version of the Transformers library. I'd recommend updating all the libraries and training the model again.
Thanks very much for your comment. Speaking of training: I use the same version of transformers both to (1) query the model "manually" (via transformers.pipeline) and to (2) query it via LoRAX. Method (1) works like a charm. Since (1) works, I don't expect that updating the library or retraining the model would help.
I have just solved the problem by using the proper chat template for Llama3-8B-Instruct. Essentially, make sure that the string you pass to LoRAX follows that template. See:
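For reference, the Llama 3 Instruct chat template wraps each turn in header tokens and ends the user turn with `<|eot_id|>`. A hand-rolled sketch of a single-turn prompt follows; in practice `tokenizer.apply_chat_template` from transformers is the authoritative way to build it.

```python
def llama3_instruct_prompt(user_message: str) -> str:
    """Build a single-turn Llama3-Instruct prompt by hand.

    This mirrors the special tokens used by the official chat template;
    prefer tokenizer.apply_chat_template for anything non-trivial.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_instruct_prompt("What is LoRAX?")
```

A prompt missing the `<|eot_id|>` / header tokens is one way the server-side model never sees its expected stopping structure and generates until `max_new_tokens`.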
My model was working fine with transformers, but not with LoRAX (same issue as @micholeodon). When I last checked, Llama3 and llama3_instruct used different eos tokens. I updated the transformers library and retrained the model; inference worked fine with LoRAX right away, and I didn't have to make any adjustments.
System Info
lorax-client==0.5.0
Information
Tasks
Reproduction
.
Expected behavior
I use the code below to get the LLM response.
Llama3 keeps generating tokens until max_new_tokens is reached. It looks like the eos_token_id is never registered.
I had a similar issue running locally, and updating transformers to >4.40 solved it. The issue is related to `llama3b` and `llama3b-instruct` having different `eos_tokens`. I tried setting `stop_sequence`, but this returns an empty string response. What is the proper way to set stopping tokens? Am I setting up Predibase correctly?
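On the stopping-token question, a sketch of how the stop strings could be passed client-side, assuming `lorax.Client.generate` accepts a `stop_sequences` keyword (the endpoint URL and parameter name here are assumptions, not confirmed in the thread):

```python
# Hypothetical keyword arguments for lorax.Client.generate
# (lorax-client==0.5.0); stop_sequences is assumed to be the
# client-side name for stopping strings.
generate_kwargs = {
    "max_new_tokens": 256,
    # Cover both variants, since llama3b and llama3b-instruct
    # were reported to use different eos tokens.
    "stop_sequences": ["<|eot_id|>", "<|end_of_text|>"],
}

# Usage (requires a running LoRAX server, so not executed here):
# from lorax import Client
# client = Client("http://127.0.0.1:8080")
# resp = client.generate(prompt, **generate_kwargs)
```

If the server emits a stop string immediately and strips it from the returned text, a very short generation might come back empty, which could explain the empty-string responses mentioned above.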