
How to pass additional options? E.g., num_ctx for Ollama #330

Closed
krvpal opened this issue Jun 18, 2024 · 10 comments

@krvpal commented Jun 18, 2024

Hi, thank you for this excellent package!

I'm using Ollama. To set the context length, Ollama provides this option:

"options": {
    "num_ctx": 4096
  }
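For reference, a minimal sketch of where that option sits in a full request body to Ollama's /api/chat endpoint (the model name and message below are just placeholders):

  {
    "model": "llama3",
    "messages": [
      { "role": "user", "content": "Summarize the following text ..." }
    ],
    "stream": true,
    "options": { "num_ctx": 4096 }
  }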

I'm struggling to use this in gptel. I understand there is a :curl-args keyword, but no variation like the one below seems to work. I've set gptel-log-level to debug and checked the logs, but the option does not show up.

  (defvar gptel--ollama
    (gptel-make-ollama "ollama"
      :host "localhost:12345"
      :protocol "http"
      :models '("llama3")
      :stream t
      :curl-args '("num_ctx:4096")))

Please help!

@karthink (Owner)

gptel uses Ollama's /api/chat endpoint, not /api/generate. I'm not sure if num_ctx is meaningful for Chat-style requests.

In any case, num_ctx is not explicitly supported by gptel. If you want to limit the context size, you can run the relevant /set parameter ... command in Ollama itself, or use the -n option in gptel's transient menu to limit the request to the last n responses:

[Screenshot: gptel's transient menu, showing the -n option]


Only a subset of options common to most LLM APIs is exposed by gptel right now. I plan to eventually cover backend-specific parameters, such as num_ctx, but this won't be any time soon.
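For example, two of the common parameters gptel already exposes as customization variables (a sketch):

  ;; Backend-agnostic parameters gptel already exposes (sketch):
  (setq gptel-temperature 0.7)  ; sampling temperature included in requests
  (setq gptel-max-tokens 400)   ; upper bound on the response length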

@krvpal (Author) commented Jun 18, 2024

Thanks for your response.

I'm looking to increase the default context length. Without passing num_ctx to Ollama, the context length is always set to 2048. This is a problem when I'm using gptel to summarize longer text: the response I receive only considers the last paragraph.

I would like to increase the context length to 4096 or higher, as there are fine-tuned llama3 models capable of 8x the usual context length.

Being able to set num_ctx would enable a powerful use case: summarizing long texts in Org, Markdown, or any text file. I'm already using the -n parameter as suggested.

I'm not sure how /set parameter ... would work, because I typically start the Ollama server and gptel handles the rest. I do not use ollama run, which is the command that accepts this option. I will explore configuring the Ollama server itself in the meantime.

Is using curl-args out of the question?

Thanks!

@karthink (Owner)

> I'm looking to increase the default context length. Without passing num_ctx to Ollama, the context length is always set to 2048. This is a problem when I'm using gptel to summarize longer text: the response I receive only considers the last paragraph.
>
> I would like to increase the context length to 4096 or higher, as there are fine-tuned llama3 models capable of 8x the usual context length.

I understand.

> Being able to set num_ctx would enable a powerful use case: summarizing long texts in Org, Markdown, or any text file. I'm already using the -n parameter as suggested.

The -n parameter will help limit the context size further, but not increase the default. So you're right that it is not useful for your use case.

> I'm not sure how /set parameter ... would work, because I typically start the Ollama server and gptel handles the rest. I do not use ollama run, which is the command that accepts this option. I will explore configuring the Ollama server itself in the meantime.

As stated above, there is currently no way to set this using gptel. Since you're running Ollama somewhere that you control (I'm assuming), you can set this parameter from the Ollama CLI instead. I'm not sure exactly how to do this either; this looks promising. It also seems like you should be able to type /set parameter ... into the CLI when using ollama run.
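Something along these lines in the Ollama REPL might work (a sketch; it assumes your Ollama build supports /set parameter and /save, and llama3-4k is just a name chosen for illustration):

  $ ollama run llama3
  >>> /set parameter num_ctx 4096
  >>> /save llama3-4k

The saved model could then be listed under :models in the gptel backend definition.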

> Is using curl-args out of the question?
>
> Thanks!

:curl-args is for supplying command-line arguments to Curl, such as additional authentication or a web proxy. It does not affect the contents of the JSON packet sent to Ollama.
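For contrast, here is a sketch of the kind of thing :curl-args is meant for, e.g. routing requests through a local proxy (the proxy URL is hypothetical):

  (gptel-make-ollama "ollama"
    :host "localhost:12345"
    :protocol "http"
    :stream t
    :models '("llama3")
    ;; Extra flags for the curl invocation itself, not for Ollama's JSON payload:
    :curl-args '("--proxy" "http://localhost:8080"))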

@krvpal (Author) commented Jun 18, 2024

Got it! Thanks again.

I do control the Ollama server as it is running locally. I believe I should be able to create my own "modelfile" with the required parameters and use it as a custom model.

ollama run creates a fresh instance every time, so I do not think it will help in this case.

@Frozenlock

@krvpal have you had any success? I'm also wondering how to increase the context length.

@karthink (Owner) commented Jun 22, 2024 via email

@karthink (Owner)

I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.
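For instance, to set the variable directly (a sketch):

  (setq gptel-max-tokens 8192)  ; with this change, also sent to Ollama as num_ctx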

@karthink (Owner)

> I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.

I've had to revert this change since there's no way to control the response token limit without gptel-max-tokens. For now we'll have to settle for adding a high enough default num_ctx parameter to requests.

@krvpal (Author) commented Jun 26, 2024

> @krvpal have you had any success? I'm also wondering how to increase the context length.

Hi @Frozenlock, I used a Modelfile to configure num_ctx and run that model under a different name in Ollama. Here's a nice article you may find useful, and the official reference.
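A sketch of that approach (llama3-8k is just an example name):

  # Modelfile
  FROM llama3
  PARAMETER num_ctx 8192

Then create the derived model and point gptel's :models at it:

  $ ollama create llama3-8k -f Modelfile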

> I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.
>
> I've had to revert this change since there's no way to control the response token limit without gptel-max-tokens. For now we'll have to settle for adding a high enough default num_ctx parameter to requests.

Hi @karthink, the current llama3 model supports up to 8192 tokens. I'd recommend setting this maximum as the default for Ollama; if one needs to reduce the context tokens for any reason, they can always use the -n option in gptel's menu.

karthink added a commit that referenced this issue Jun 26, 2024
* gptel-ollama.el (gptel--request-data): Set Ollama's default
context to 8192, which is what Llama 3 supports (#330).  This is
currently not customizable, but is intended to be in the future.
@karthink (Owner)

> Hi @karthink, the current llama3 model supports up to 8192 tokens. I'd recommend setting this maximum as the default for Ollama; if one needs to reduce the context tokens for any reason, they can always use the -n option in gptel's menu.

Done, thank you for the suggestion.

I will close this issue now. I have a TODO item to make this customizable in the future, after gptel gets the ability to handle per-backend and per-model capabilities.
