
How to pass additional options? E.g., num_ctx for Ollama #330

Closed
krvpal opened this issue Jun 18, 2024 · 10 comments

@krvpal commented Jun 18, 2024

Hi, thank you for this excellent package!

I'm using Ollama. To set the context length, Ollama provides this option:

"options": {
    "num_ctx": 4096
  }
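For reference, a minimal sketch of where that option sits in a full request body to Ollama's /api/chat endpoint (the model name and message below are just placeholders):

  {
    "model": "llama3",
    "messages": [
      { "role": "user", "content": "Summarize the following text ..." }
    ],
    "stream": true,
    "options": { "num_ctx": 4096 }
  }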

I'm struggling to use this in gptel. I understand there is a :curl-args keyword, but no variation like the one below seems to work. I've set gptel-log-level to debug and checked the logs, but the option does not show up.

  (defvar gptel--ollama
    (gptel-make-ollama "ollama"
      :host "localhost:12345"
      :protocol "http"
      :models '("llama3")
      :stream t
      :curl-args '("num_ctx:4096")))

Please help!

@karthink (Owner)

gptel uses Ollama's /api/chat endpoint, not /api/generate. I'm not sure if num_ctx is meaningful for Chat-style requests.

In any case, num_ctx is not explicitly supported by gptel. If you want to limit the context size, you can run the relevant /set parameter ... command in Ollama itself, or use the -n option in gptel's transient menu to limit the request to the last n responses:

[Screenshot: gptel's transient menu, showing the -n option]


Only a subset of options common to most LLM APIs is exposed by gptel right now. I plan to eventually cover backend-specific parameters, such as num_ctx, but this won't be any time soon.
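For example, two of the common parameters gptel already exposes as customization variables (a sketch):

  ;; Backend-agnostic parameters gptel already exposes (sketch):
  (setq gptel-temperature 0.7)  ; sampling temperature included in requests
  (setq gptel-max-tokens 400)   ; upper bound on the response length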

@krvpal (Author) commented Jun 18, 2024

Thanks for your response.

I'm looking to increase the default context length. Without passing num_ctx to Ollama, the context length is always set to 2048. This is a problem when I'm using gptel to summarize longer text: the response I receive only considers the last paragraph.

I would like to increase the context length to 4096 or higher, as there are fine-tuned llama3 models capable of 8x the usual context length.

Being able to set num_ctx would enable a powerful use case: summarizing long texts in Org, Markdown, or any text file. I'm already using the -n parameter as suggested.

I'm not sure how /set parameter ... would work, because I typically start the Ollama server and gptel handles the rest. I do not use ollama run, which is the command that accepts this option. I will explore configuring the Ollama server itself in the meantime.

Is using curl-args out of the question?

Thanks!

@karthink (Owner)

> I'm looking to increase the default context length. Without passing num_ctx to Ollama, the context length is always set to 2048. This is a problem when I'm using gptel to summarize longer text: the response I receive only considers the last paragraph.
>
> I would like to increase the context length to 4096 or higher, as there are fine-tuned llama3 models capable of 8x the usual context length.

I understand.

> Being able to set num_ctx would enable a powerful use case: summarizing long texts in Org, Markdown, or any text file. I'm already using the -n parameter as suggested.

The -n parameter will help limit the context size further, but not increase the default. So you're right that it is not useful for your use case.

> I'm not sure how /set parameter ... would work, because I typically start the Ollama server and gptel handles the rest. I do not use ollama run, which is the command that accepts this option. I will explore configuring the Ollama server itself in the meantime.

As stated above, there is currently no way to set this using gptel. Since you're running Ollama somewhere that you control (I'm assuming), you can set this parameter from the Ollama CLI instead. I'm not sure exactly how to do this either; this looks promising. It also seems like you should be able to type /set parameter ... into the CLI when using ollama run.
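Something along these lines in the Ollama REPL might work (a sketch; it assumes your Ollama build supports /set parameter and /save, and llama3-4k is just a name chosen for illustration):

  $ ollama run llama3
  >>> /set parameter num_ctx 4096
  >>> /save llama3-4k

The saved model could then be listed under :models in the gptel backend definition.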

> Is using curl-args out of the question?
>
> Thanks!

:curl-args is for supplying command-line arguments to Curl, such as additional authentication or a web proxy. It does not affect the contents of the JSON packet sent to Ollama.
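For contrast, here is a sketch of the kind of thing :curl-args is meant for, e.g. routing requests through a local proxy (the proxy URL is hypothetical):

  (gptel-make-ollama "ollama"
    :host "localhost:12345"
    :protocol "http"
    :stream t
    :models '("llama3")
    ;; Extra flags for the curl invocation itself, not for Ollama's JSON payload:
    :curl-args '("--proxy" "http://localhost:8080"))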

@krvpal (Author) commented Jun 18, 2024

Got it! Thanks again.

I do control the Ollama server as it is running locally. I believe I should be able to create my own "modelfile" with the required parameters and use it as a custom model.

ollama run creates a fresh instance every time, so I do not think it will help in this case.

@Frozenlock

@krvpal have you had any success? I'm also wondering how to increase the context length.

@karthink (Owner) commented Jun 22, 2024 via email

@karthink (Owner)

I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.
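For instance, to set the variable directly (a sketch):

  (setq gptel-max-tokens 8192)  ; with this change, also sent to Ollama as num_ctx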

@karthink (Owner)

> I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.

I've had to revert this change since there's no way to control the response token limit without gptel-max-tokens. For now we'll have to settle for adding a high enough default num_ctx parameter to requests.

@krvpal (Author) commented Jun 26, 2024

> @krvpal have you had any success? I'm also wondering how to increase the context length.

Hi @Frozenlock, I used a Modelfile to configure num_ctx and run that model under a different name in Ollama. Here's a nice article you may find useful, and the official reference.
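A sketch of that approach (llama3-8k is just an example name):

  # Modelfile
  FROM llama3
  PARAMETER num_ctx 8192

Then create the derived model and point gptel's :models at it:

  $ ollama create llama3-8k -f Modelfile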

> I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.
>
> I've had to revert this change since there's no way to control the response token limit without gptel-max-tokens. For now we'll have to settle for adding a high enough default num_ctx parameter to requests.

Hi @karthink, the current llama3 model supports up to 8192 tokens. I'd recommend setting this maximum as the default for Ollama; if one needs to reduce the context tokens for any reason, they can always use the -n option in gptel's menu.

karthink added a commit that referenced this issue Jun 26, 2024
* gptel-ollama.el (gptel--request-data): Set Ollama's default
context to 8192, which is what Llama 3 supports (#330).  This is
currently not customizable, but is intended to be in the future.
@karthink (Owner)

> Hi @karthink, the current llama3 model supports up to 8192 tokens. I'd recommend setting this maximum as the default for Ollama; if one needs to reduce the context tokens for any reason, they can always use the -n option in gptel's menu.

Done, thank you for the suggestion.

I will close this issue now. I have a TODO item to make this customizable in the future, after gptel gets the ability to handle per-backend and per-model capabilities.
