
🎅 I WISH LITELLM HAD... #361

Open
krrishdholakia opened this issue Sep 13, 2023 · 152 comments

Comments

@krrishdholakia
Contributor

krrishdholakia commented Sep 13, 2023

This is a ticket to track a wishlist of items you wish LiteLLM had.

COMMENT BELOW 👇

With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs

Respond with ❤️ to any request you would also like to see

P.S.: Come say hi 👋 on the Discord

@krrishdholakia krrishdholakia pinned this issue Sep 13, 2023
@krrishdholakia
Contributor Author

[LiteLLM Client] Add new models via UI

Thinking aloud, it seems intuitive that you'd be able to add new models / remap completion calls to different models via the UI. Unsure of the real problem to solve, though.

@krrishdholakia
Contributor Author

User / API Access Management

Different users have access to different models. It'd be helpful if there was a way to maybe leverage the BudgetManager to gate access. E.g. GPT-4 is expensive; I don't want to expose that to my free users, but I do want my paid users to be able to use it.
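
For illustration, a rough sketch of what tier-based gating could look like in application code today. The ALLOWED_MODELS mapping and gated_completion helper are hypothetical, not an existing LiteLLM API; only litellm.completion() is the real call here.

import litellm

# Hypothetical tier -> allowed-models mapping; not part of LiteLLM itself.
ALLOWED_MODELS = {
    "free": {"gpt-3.5-turbo"},
    "paid": {"gpt-3.5-turbo", "gpt-4"},
}

def gated_completion(user_tier, model, messages):
    # Reject models the user's tier is not allowed to call.
    if model not in ALLOWED_MODELS.get(user_tier, set()):
        raise PermissionError(f"{model} is not available on the '{user_tier}' tier")
    return litellm.completion(model=model, messages=messages)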

@krrishdholakia
Contributor Author

krrishdholakia commented Sep 13, 2023

cc: @yujonglee @WilliamEspegren @zakhar-kogan @ishaan-jaff @PhucTranThanh feel free to add any requests / ideas here.

@ishaan-jaff
Contributor

ishaan-jaff commented Sep 13, 2023

[Spend Dashboard] View analytics for spend per LLM and per user

  • This allows me to see which LLMs are the most expensive and which users are using LiteLLM heavily

@ishaan-jaff
Contributor

Auto select the best LLM for a given task

If it's a simple task, like responding to "hello", LiteLLM should auto-select a cheaper but faster LLM like j2-light

@Pipboyguy

Integration with NLP Cloud

@krrishdholakia
Contributor Author

That's awesome @Pipboyguy - DM'ing you on LinkedIn to learn more!

@krrishdholakia krrishdholakia changed the title LiteLLM Wishlist 🎅 I WISH LITELLM ADDED... Sep 14, 2023
@krrishdholakia krrishdholakia changed the title 🎅 I WISH LITELLM ADDED... 🎅 I WISH LITELLM HAD... Sep 14, 2023
@krrishdholakia
Contributor Author

krrishdholakia commented Sep 14, 2023

@ishaan-jaff check out this truncate param in the Cohere API

This looks super interesting. Similar to your token trimmer: if the prompt exceeds the context window, trim it in a particular manner.

I would maybe only run trimming on user/assistant messages and not touch the system prompt (this works for RAG scenarios as well).
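
A rough sketch of that trimming idea, purely illustrative: the 4-characters-per-token estimate stands in for a real tokenizer, and string-only message content is assumed.

def trim_messages(messages, max_tokens=4000):
    # Crude token estimate: ~4 characters per token, plus per-message overhead.
    def est_tokens(m):
        return len(m.get("content", "")) // 4 + 4

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Drop the oldest user/assistant turns until the conversation fits;
    # the system prompt is never touched.
    while rest and sum(map(est_tokens, system + rest)) > max_tokens:
        rest.pop(0)
    return system + rest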

@haseeb-heaven
Contributor

Option to use Inference API so we can use any model from Hugging Face 🤗

@krrishdholakia
Contributor Author

krrishdholakia commented Sep 17, 2023

@haseeb-heaven you can already do this -

completion_url = f"https://api-inference.huggingface.co/models/{model}"

from litellm import completion 
response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response) 

@haseeb-heaven
Contributor

@haseeb-heaven you can already do this -

completion_url = f"https://api-inference.huggingface.co/models/{model}"

from litellm import completion 
response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response) 

Wow, great, thanks - it's working. Nice feature.

@smig23

smig23 commented Sep 18, 2023

Support for inferencing using models hosted on Petals swarms (https://github.com/bigscience-workshop/petals), both public and private.

@ishaan-jaff
Contributor

@smig23 what are you trying to use Petals for? We found it to be quite unstable, and it would not consistently pass our tests

@shauryr
Contributor

shauryr commented Sep 18, 2023

Fine-tuning wrapper for OpenAI, Hugging Face, etc.

@krrishdholakia
Contributor Author

@shauryr I created an issue to track this - feel free to add any missing details here

@smig23

smig23 commented Sep 18, 2023

@smig23 what are you trying to use petals for ? We found it to be quite unstable and it would not consistently pass our tests

Specifically for my aims, I'm running a private swarm as an experiment, with a view to implementing it within a private organization that has idle but distributed GPU resources. The initial target would be inferencing, and if LiteLLM were able to be the abstraction layer, it would allow the flexibility to go in another direction with hosting in the future.

@ranjancse26

I wish LiteLLM had direct support for fine-tuning models. Based on the links below, I understand that in order to fine-tune, one needs a specific understanding of the LLM provider and then has to follow their instructions or library for fine-tuning the model. Why not have LiteLLM do all the abstraction and handle the fine-tuning aspects as well?

https://docs.litellm.ai/docs/tutorials/finetuned_chat_gpt
https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
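
For context, this is roughly what the provider-specific flow looks like today with the OpenAI SDK (the dataset filename and base model below are placeholders); the wish is for LiteLLM to abstract this step across providers.

from openai import OpenAI

client = OpenAI()

# Upload a prepared JSONL dataset, then start a fine-tuning job.
# "train.jsonl" and "gpt-3.5-turbo" are placeholders.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id, job.status)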

@ranjancse26

I wish LiteLLM had support for open-source embeddings like sentence-transformers, hkunlp/instructor-large, etc.

Sorry, based on the documentation below, it seems there's only support for the OpenAI embeddings.

https://docs.litellm.ai/docs/embedding/supported_embedding
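
For reference, a minimal sketch of what the wished-for behavior looks like with sentence-transformers used directly today (not through LiteLLM); "all-MiniLM-L6-v2" is just a common example model.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hello world", "LiteLLM wishlist"])
print(embeddings.shape)  # (2, 384) for this model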

@ranjancse26

I wish LiteLLM had an integration with the Cerebrium platform. Please check the link below for the prebuilt models.

https://docs.cerebrium.ai/cerebrium/prebuilt-models

@ishaan-jaff
Contributor

@ranjancse26 what models on Cerebrium do you want to use with LiteLLM?

@ranjancse26

@ishaan-jaff Cerebrium has a lot of pre-built models. The focus should be on consuming the open-source models first, e.g. Llama 2, GPT4All, Falcon, FLAN-T5, etc. I am mentioning this as a first step. However, it's also a good idea to have LiteLLM take care of the internal communication with custom-built models, in turn based on the API that Cerebrium exposes.


@ishaan-jaff
Contributor

@smig23 We've added support for petals to LiteLLM https://docs.litellm.ai/docs/providers/petals

@ranjancse26

I wish LiteLLM had built-in support for the majority of each provider's operations rather than targeting text generation alone. Consider the example of Cohere: the endpoint below allows users to have conversations with a Large Language Model (LLM) from Cohere.

https://docs.cohere.com/reference/post_chat

@ranjancse26

I wish LiteLLM had more support and examples for users developing apps with the RAG pattern. Following the standard best practices is pretty much mandatory, and we would all like the same support here.

@ranjancse26

I wish LiteLLM had use-case-driven examples for beginners. Keeping the day-to-day use cases in mind, it would be good to come up with samples that cover the following aspects:

  • Text classification
  • Text summarization
  • Text translation
  • Text generation
  • Code generation

@ranjancse26

I wish LiteLLM supported various well-known and popular vector DBs. Here are a couple of them to begin with:

  • Pinecone
  • Qdrant
  • Weaviate
  • Milvus
  • DuckDB
  • Sqlite

@ranjancse26

ranjancse26 commented Sep 21, 2023

I wish LiteLLM had built-in support for performing web scraping, or for getting real-time data using a known provider like SerpApi. It would be helpful for users building custom AI models or integrating with LLMs for retrieval-augmented generation.

https://serpapi.com/blog/llms-vs-serpapi/#serpapi-google-local-results-parser
https://colab.research.google.com/drive/1Q9VvVzjZJja7_y2Ls8qBkE_NApbLiqly?usp=sharing
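
For context, a minimal sketch of fetching real-time results with SerpApi directly (using the google-search-results package; the query and environment variable are placeholders), which is the step this wish asks LiteLLM to wrap.

import os
from serpapi import GoogleSearch

# Placeholder query; SERPAPI_API_KEY is assumed to be set in the environment.
search = GoogleSearch({"q": "retrieval augmented generation", "api_key": os.environ["SERPAPI_API_KEY"]})
results = search.get_dict()
for item in results.get("organic_results", [])[:3]:
    print(item["title"], item["link"])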

@zhaoninge

zhaoninge commented May 3, 2024

I wish litellm had an API to check available models from providers in real time.

@QwertyJack

I wish LiteLLM had support for Sambaverse.
https://docs.sambanova.ai/sambaverse/latest/index.html

Thanks

@horahoradev

Discord alerting would be nice

@ggallotti

Wildcard for the model_name property in model_list:

model_list:
  - model_name: "vertex_ai/*"
    litellm_params:
      model: "vertex_ai/*"
      vertex_project: os.environ/VERTEXAI_PROJECT
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: os.environ/ANTHROPIC_API_KEY      
  - model_name: "gemini/*"
    litellm_params:
      model: "gemini/*"
      api_key: os.environ/GEMINI_API_KEY
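
If wildcard routing worked this way, a client call through the proxy might look like the sketch below. It assumes the proxy is running locally on port 4000 and that the requested model name matches the "anthropic/*" entry.

import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

# Any model name matching the "anthropic/*" wildcard would be routed
# using the Anthropic credentials from the config.
response = client.chat.completions.create(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)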

@krrishdholakia
Contributor Author

@ggallotti would that be similar to how we do it for openai today -

https://docs.litellm.ai/docs/providers/openai#2-start-the-proxy

@ggallotti

@ggallotti would that be similar to how we do it for openai today -

https://docs.litellm.ai/docs/providers/openai#2-start-the-proxy

Thanks for the response.
But that configuration does not work, as it will force the OpenAI API key onto the other models.

@ducnvu

ducnvu commented May 17, 2024

A streamlined way to call vision and non-vision models would be great. Being LLM-agnostic is a big reason why I use the package, but currently I still have to handle different request formats depending on which model a call goes to.

For example: when calling GPT-4 Vision, messages.content is an array. Using the same code to call Azure's Command R+ would result in

litellm.exceptions.APIError: OpenAIException - Error code: 400 - {'message': 'invalid type: parameter messages.content is of type array but should be of type string.'}

I'm aware this is on the model provider's side, but GPT's non-vision models, for example, support both formats.
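
One possible client-side workaround while the formats differ, sketched as a small helper that flattens list-style content into a plain string before sending it to providers that reject the array form. Illustrative only, not a LiteLLM feature.

def flatten_content(messages):
    # Convert [{"type": "text", "text": ...}, ...] content into a single string.
    out = []
    for m in messages:
        content = m["content"]
        if isinstance(content, list):
            content = "\n".join(
                part["text"] for part in content if part.get("type") == "text"
            )
        out.append({**m, "content": content})
    return out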

@krrishdholakia
Contributor Author

@ducnvu seems like something we need to fix - can you share the Command R call?

@ducnvu

ducnvu commented May 17, 2024

@krrishdholakia Thanks for the prompt response. The call is something like this. I don't have access to all the models supported by litellm to test, but so far OpenAI models work with both a string messages.content and the format below; Command R is where I first encountered this error. All my calls are through Azure.

params = {'temperature': 0.7, 'n': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'messages': [{'role': 'system', 'content': [{'type': 'text', 'text': "You are Command R Plus, answer as concisely as possible (e.g. don't be verbose). When writing code, specify the language as per the markdown format."}]}, {'role': 'user', 'content': [{'type': 'text', 'text': 'hi'}]}], 'timeout': 600, 'stream': True, 'model': 'azure/command-r-plus', 'api_base': BASE, 'api_key': KEY}

await litellm.acompletion(**params)

@guiramos

guiramos commented Jun 4, 2024

Hi guys, I am trying to use Open Interpreter with Gemini 1.5 Flash and getting this error:

raise APIConnectionError(
litellm.exceptions.APIConnectionError: gemini does not support parameters: {'functions': [{'name': 'execute', 'description': "Executes code on the user's machine in the users local environment and returns the output", 'parameters': {'type': 'object', 'properties': {'language': {'type': 'string', 'description': 'The programming language (required parameter to the execute function)', 'enum': ['ruby', 'python', 'shell', 'javascript', 'html', 'applescript', 'r', 'powershell', 'react']}, 'code': {'type': 'string', 'description': 'The code to execute (required)'}}, 'required': ['language', 'code']}}]}, for model=gemini-1.5-flash-latest. To drop these, set litellm.drop_params=True or for proxy:

By default, Open Interpreter uses functions, and it seems to fail.

Does Google Gemini 1.5 via LiteLLM support functions? Which version?

If it does not, I wish litellm had this implemented...

@guiramos

guiramos commented Jun 5, 2024

Ok, functions or tools are definitely not working.

I am following this tutorial, and it works great when calling the Gemini API directly:
https://ai.google.dev/gemini-api/docs/function-calling/tutorial?lang=python

However, passing the same set of commands to LiteLLM gives this error:

litellm.exceptions.APIConnectionError: gemini does not support parameters: {'tools': [<function multiply at 0x14c684680>]}, for model=gemini-1.5-flash-latest. To drop these, set `litellm.drop_params=True` or for proxy:
`litellm_settings:
 drop_params: true`

I think part of the problem is in the utils.py:6570 check where the supported_params are being returned:

elif custom_llm_provider == "palm" or custom_llm_provider == "gemini":
    return ["temperature", "top_p", "stream", "n", "stop", "max_tokens"]

Gemini supports way more than that. I am making a call like this:

return litellm.completion(
       messages=messages,
       temperature=0.0,
       model=target_model,
       tools=tools,
       safety_settings=[
           {
               "category": "HARM_CATEGORY_HARASSMENT",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_HATE_SPEECH",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
               "threshold": "BLOCK_NONE",
           },
       ]
   )
But the `tools` argument is triggering the exception.

Can we get this addressed one of these days, please?

@krrishdholakia

@krrishdholakia
Contributor Author

krrishdholakia commented Jun 5, 2024

@guiramos got it - found the issue: we have it implemented for Vertex AI, not Google AI Studio (which I think is what you're calling).

Can you try running this with

return litellm.completion(
       messages=messages,
       temperature=0.0,
       model="vertex_ai/gemini-1.5-pro",
       tools=tools,
       safety_settings=[
           {
               "category": "HARM_CATEGORY_HARASSMENT",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_HATE_SPEECH",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
               "threshold": "BLOCK_NONE",
           },
       ]
   )

and let me know if that works? - https://docs.litellm.ai/docs/providers/vertex

Also tracking the issue for gemini google ai studio - #3086

@guiramos

guiramos commented Jun 5, 2024

@krrishdholakia I could not test with Vertex as I don't have an API key for that.

Also, I tried with Google AI Studio and it didn't work, using the new version 1.40.2.

Do you have an estimated day for this? Please help.

@danielflaherty

@krrishdholakia I could not test with Vertex as I don't have an API key for that.

Also, I tried with Google AI Studio and it didn't work, using the new version 1.40.2.

Do you have an estimated day for this? Please help.

+1. Would be great to have an estimate for when 1.5 pro w/ tools is supported using AI Studio.

@krrishdholakia
Contributor Author

hey @danielflaherty @guiramos this should be fixed by end of week

@guiramos

@krrishdholakia really appreciate this! Thank you!

@ishaan-jaff
Contributor

Discord alerting would be nice

@horahoradev This is live now https://docs.litellm.ai/docs/proxy/alerting#advanced---using-discord-webhooks

@horahoradev any chance we can hop on a call sometime this week? I'd love to learn how we can improve litellm for you

My linkedin if you prefer DMs: https://www.linkedin.com/in/reffajnaahsi/
Sharing a link to my cal for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

@ishaan-jaff
Contributor

Hi @nbaav1 We support this using the SERVER_ROOT_PATH env variable
doc: https://docs.litellm.ai/docs/proxy/deploy#customization-of-the-server-root-path

@nbaav1 any chance we can hop on a call? I'd love to learn how we can improve litellm for you.


I wish the LiteLLM Proxy server had a config setting for proxy_base_url. For example, hosting the server at http://0.0.0.0:4000/<proxy_base_url> or http://0.0.0.0:4000/abc/xyz. Meaning that I could do something like `litellm --model gpt-3.5-turbo --proxy_base_url abc/xyz` and then:

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000/abc/xyz"
)

response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

This would simplify our infrastructure in AWS and still comply with company policies. Thanks!

@danielchalef

Support for Redis Clusters. LiteLLM currently only supports Redis Standalone nodes.

@barakplasma

support vision on local images
basically by adding support for local file urls to

def convert_url_to_base64(url):
    import requests
    import base64

    for _ in range(3):
        try:
            response = requests.get(url)
            break
        except:
            pass
    if response.status_code == 200:
        image_bytes = response.content
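
A sketch of the wished-for local-file path: read the image from disk and build a base64 data URL instead of fetching over HTTP. The function name and behavior here are assumptions for illustration, not LiteLLM code.

import base64
import mimetypes

def local_image_to_data_url(path):
    # Guess the MIME type from the file extension, defaulting to PNG.
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime or 'image/png'};base64,{b64}"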

@andresd95

Support for custom models imported in Bedrock.

Use case: we have a fine-tuned model deployed in Bedrock. The tuned model is based on OpenOrca, so the start and end tokens are different from the instruct version's.

If the provider is mistral, the template uses the instruct template rather than OpenOrca's.

  response = client.invoke_model(
      body={"prompt": "<s>[INST] hello, tell me a joke [/INST]\n", "max_tokens": 1024, "temperature": 0},
      modelId=<model_id>,
      accept=accept,
      contentType=contentType
  )

Tokens <|im_start|> and <|im_end|> should be used instead.

Tried using a custom provider as a workaround. However, the body is empty and the request fails:

  response = client.invoke_model(
      body={},
      modelId=<model_id>,
      accept=accept,
      contentType=contentType
  )

The only thing we need is for the prompt template configuration to be respected, as is done with the amazon or anthropic providers.

      model_id: "model_arn"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      bos_token: "<s>"
      eos_token: "<|im_end|>"

elif provider == "mistral":
    prompt = prompt_factory(
        model=model, messages=messages, custom_llm_provider="bedrock"
    )

@krrishdholakia
Contributor Author

krrishdholakia commented Jun 17, 2024

I wish it was possible to specify which callbacks LiteLLM would use on a per request basis (e.g. without modifying global state)

Hey @motin this is possible already

Proxy: https://docs.litellm.ai/docs/proxy/reliability#test---client-side-fallbacks

SDK: https://docs.litellm.ai/docs/completion/reliable_completions#fallbacks---switch-modelsapi-keysapi-bases

@Taytay

Taytay commented Jun 19, 2024

First: We ❤️ LiteLLM
I wish it supported the new Gemini context caching:
https://ai.google.dev/gemini-api/docs/caching?lang=python

I admit I haven't thought the API through well, since this is a feature that only one provider offers at this point (but it likely won't be the last).

@krrishdholakia
Contributor Author

Hey @andresd95, Bedrock custom prompts are fixed in the latest release - can you confirm this works for you?

Hey @Taytay tracking #4284. DM'ed on LinkedIn to setup a support channel on this as well

@andresd95

Awesome @krrishdholakia! I tested and it works wonderfully.

Here is the configuration I used for OpenOrca, in case anyone else has the same use case:

  - model_name: bedrock/Mistral-7B-OpenOrca
    litellm_params:
      model: "bedrock/mistral.OpenOrca"
      model_id: "<your_model_arn>"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      bos_token: "<s>"
      eos_token: "<|im_end|>"

Thank you!
