How to pass additional options? E.g., num_ctx for Ollama #330
Hi, thank you for this excellent package!

I'm using Ollama, and to set the context length Ollama provides the `num_ctx` option.

I'm struggling to use this in gptel. I understand there is a `curl-args` keyword, but no variation like the one below seems to work. I've set `gptel-log-level` to `debug` and checked the logs, but the option does not show up. Please help!
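(The configuration snippet from the original report was lost in extraction. A hypothetical reconstruction of this kind of attempt might look like the following; the backend name, host, and model are placeholders, and the `:curl-args` value is a guess at what was tried.)

```elisp
;; Hypothetical reconstruction -- the original snippet was lost.
;; :curl-args passes extra command-line arguments to curl; it does
;; not merge anything into the JSON body that gptel builds, which is
;; why options passed this way never appear in the request log.
(gptel-make-ollama "Ollama"
  :host "localhost:11434"   ; placeholder host
  :stream t
  :models '("llama3")       ; placeholder model name
  :curl-args '("--data-raw" "{\"options\":{\"num_ctx\":4096}}"))
```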
gptel uses Ollama's HTTP API. In any case, only a subset of options common to most LLM APIs is exposed by gptel right now. I plan to eventually cover backend-specific parameters such as `num_ctx`.
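(For context, Ollama's chat endpoint accepts request-scoped parameters like `num_ctx` in an `options` object next to the messages. A minimal example of the request shape, with a placeholder model name:)

```sh
# Ollama's /api/chat accepts an "options" object per request;
# num_ctx sets the context window size for this call.
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Hello"}],
  "options": {"num_ctx": 4096}
}'
```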
Thanks for your response. I'm looking to increase the default context length. Without passing `num_ctx`, Ollama falls back to its default context size. I would like to increase the context length to 4096 or higher, as there are fine-tuned llama3 models capable of 8x the usual context length. This would be a powerful way to summarize long texts in Org, Markdown, or any text file, if one could set `num_ctx`. I'm not sure how gptel constructs the request, though. Is using `curl-args` the right approach? Thanks!
I understand. The `num_ctx` parameter is not among the options gptel currently exposes.

As stated above, there is currently no way to set this using gptel. Since you're running Ollama somewhere that you control (I'm assuming), you can set this parameter from the Ollama CLI instead. I'm not sure exactly how to do this either, but a custom Modelfile looks promising. It also seems like you should be able to set it interactively from the `ollama run` prompt.
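(For the interactive route, Ollama's REPL has a `/set parameter` command; the model and value below are just examples:)

```sh
$ ollama run llama3
>>> /set parameter num_ctx 4096
```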
Got it! Thanks again. I do control the Ollama server, as it is running locally. I believe I should be able to create my own Modelfile with the required parameters and use it as a custom model.
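(A minimal sketch of such a Modelfile, assuming a llama3 base model; the derived model name below is a placeholder:)

```
# Modelfile: derive a custom model with a larger context window
FROM llama3
PARAMETER num_ctx 8192
```

Build and select it with `ollama create llama3-8k -f Modelfile`, then list `llama3-8k` among the backend's `:models`.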
@krvpal have you had any success? I'm also wondering how to increase the context length.
Until I can add official support for `num_ctx` to gptel, I can address this problem by picking a high (but not too high) default value sent with all requests to Ollama. What token count do you suggest? 2048? 4096? Higher?
I've added support for `num_ctx`.
I've had to revert this change, since it left no way to control the response token limit.
Hi @Frozenlock, I used a Modelfile to configure `num_ctx`, as described above.
Hi @karthink, the current llama3 model supports up to 8192 tokens. I'd recommend setting this maximum as the default for Ollama; if for any reason one needs to reduce the context size, they can always use a custom Modelfile.
* gptel-ollama.el (gptel--request-data): Set Ollama's default context to 8192, which is what Llama 3 supports (#330). This is currently not customizable, but is intended to be in the future.
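(A rough sketch of the kind of change that commit message describes; this is not the actual diff, and the payload shape is assumed from Ollama's API:)

```elisp
;; Sketch only: send a default "options" object with every Ollama
;; request.  gptel--request-data builds the JSON payload (as a plist)
;; for the chat endpoint; num_ctx 8192 matches Llama 3's maximum.
(cl-defmethod gptel--request-data ((_backend gptel-ollama) prompts)
  (list :model gptel-model
        :messages (vconcat prompts)
        :stream (if gptel-stream t :json-false)
        :options '(:num_ctx 8192)))
```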
Done, thank you for the suggestion. I will close this issue now. I have a TODO item to make this customizable in the future, after gptel gets the ability to handle per-backend and per-model capabilities.