
Llama3-8b-Instruct won't stop generating #442

Open
ekim322 opened this issue Apr 27, 2024 · 6 comments


ekim322 commented Apr 27, 2024

System Info

lorax-client==0.5.0

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

.

Expected behavior

I use the code below to get the LLM response.

import os

from predibase import Predibase

# `prompt` is defined elsewhere.
pb = Predibase(api_token=os.environ.get("PREDIBASE_API_TOKEN"))
lorax_client = pb.deployments.client("llama-3-8b-instruct")
lorax_client.generate(
    prompt,
    adapter_id="ekim322/cpAdapter",
    adapter_source="hub",
    api_token=os.environ.get("HF_W_TOKEN"),
    max_new_tokens=512,
).generated_text

Llama3 keeps generating tokens until max_new_tokens is reached. It looks like the eos_token_id is never registered.
I had a similar issue running locally, and updating transformers to >4.40 solved it there. That issue was related to Llama3-8b and Llama3-8b-Instruct having different eos tokens.
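
For context, the difference can be checked by comparing what each tokenizer reports as its eos token - a rough sketch, assuming a recent transformers version and access to the Hugging Face checkpoints:

# Sketch: compare the eos tokens declared by the base and instruct checkpoints.
from transformers import AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
inst_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(base_tok.eos_token)  # typically "<|end_of_text|>"
print(inst_tok.eos_token)  # typically "<|eot_id|>" on recent tokenizer configs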

I tried setting stop_sequences:

lorax_client.generate(
    prompt,
    adapter_id="ekim322/cpAdapter",
    adapter_source="hub",
    api_token=os.environ.get("HF_W_TOKEN"),
    max_new_tokens=512,
    stop_sequences=['<|end_of_text|>', '<|eot_id|>']
).generated_text

but this returns an empty string as the response. What is the proper way to set stop tokens?

Am I setting up Predibase correctly?

@tgaddair (Contributor)

Hey @ekim322, we recently made some changes to fix this in #456. Can you try with the latest LoRAX version to see if the error persists?

tgaddair added the bug label May 23, 2024
tgaddair self-assigned this May 23, 2024

micholeodon commented Jun 24, 2024

@ekim322, does the change mentioned by @tgaddair fix your problem?
If yes, please tell how.
If not, but you have found some other solution, please share it :)

I have exactly the same issue with LoRAX and Llama3-8B-Instruct - the model keeps generating until it uses up the max_new_tokens limit.


ekim322 commented Jun 24, 2024

@micholeodon Updating the transformers library and retraining the model solved the issue for me (I am not 100% sure whether that was the fix or #456, but inference is working fine for me now - I think my issue got resolved on my end before #456 was implemented).

If I remember correctly, there were some errors with the Llama 3 Instruct chat template in older versions of the Transformers library.

I'd recommend updating all the libraries and training the model again.

@micholeodon

Thanks very much for your comment.

Speaking of training, I use the same version of transformers to (1) query the model "manually" via transformers.pipeline (roughly as sketched below) and (2) query the model via LoRAX.

(1) works like a charm
(2) is greedy and uses up all the tokens
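
A rough sketch of what (1) looks like, assuming a recent transformers version (model name and message are placeholders):

# Method (1), roughly: transformers.pipeline applies the chat template itself.
from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")
messages = [{"role": "user", "content": "Hello, who are you?"}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"])  # stops at the eos token well before the limit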

If method (1) works, then I don't expect that updating the library or training the model again would help.
What do you think?


micholeodon commented Jun 25, 2024

I have just solved the problem by using the proper chat template for Llama3-8B-Instruct. Essentially, make sure that the string you pass to the LoRAX inputs parameter of the /generate endpoint contains the proper special tokens, such as <|begin_of_text|>, <|eot_id|>, <|start_header_id|>, <|end_header_id|>, and <|end_of_text|>.
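
A minimal sketch of such a prompt string, assuming the standard Llama 3 Instruct chat template (the system and user content are placeholders):

# Llama3-Instruct-formatted prompt to pass as `inputs` to /generate
# (or as `prompt` to lorax_client.generate); contents are placeholders.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Hello, who are you?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)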

See:


ekim322 commented Jun 25, 2024

My model was working fine with transformers, but not with LoRAX (same issue as @micholeodon).

When I last checked, Llama3 and Llama3-Instruct used different tokens (e.g. <|end_of_text|> as the base model's eos token and <|eot_id|> as the instruct model's eos token).
I am not sure whether older transformers versions handled this properly in the adapter/generation configs. Inference was still working fine using transformers, just not with LoRAX.

I updated the transformers library and retrained the model - inference worked fine with LoRAX right away; I didn't have to make any adjustments.
