
[Feature]: Length-based pricing for Google models #4229

Closed
areibman opened this issue Jun 16, 2024 · 2 comments · Fixed by #4243
Labels: enhancement (New feature or request)

Comments

@areibman

The Feature

Some models price tokens differently based on the length of the prompt. It would be helpful to restructure the model price dictionary, or add fields to it, to account for this.

This could look something like:

```json
{
    "gemini-1.5-flash-latest": {
        "max_tokens": 8192,
        "max_input_tokens": 1000000,
        "max_input_tokens_short": 128000,
        "max_output_tokens": 8192,
        "input_cost_per_token_short": 3.5e-07,
        "input_cost_per_token_long": 7e-07,
        "output_cost_per_token_short": 1.05e-06,
        "output_cost_per_token_long": 2.1e-06,
        "litellm_provider": "vertex_ai-language-models",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "source": "https://ai.google.dev/pricing"
    }
}
```
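
For context, a minimal sketch of how a cost function could consume these fields. This is an illustration, not LiteLLM's actual cost-tracking code; the `completion_cost` name and the trimmed dictionary are assumptions for the example, and it assumes (per Google's pricing page) that the whole request is billed at the long rates once the prompt crosses the short-context limit:

```python
# Sketch: pick the input/output rates based on whether the prompt exceeds
# the short-context limit, then bill the whole request at that tier.
MODEL_PRICES = {
    "gemini-1.5-flash-latest": {
        "max_input_tokens_short": 128_000,
        "input_cost_per_token_short": 3.5e-07,
        "input_cost_per_token_long": 7e-07,
        "output_cost_per_token_short": 1.05e-06,
        "output_cost_per_token_long": 2.1e-06,
    }
}

def completion_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return USD cost for one request, switching tiers above the short limit."""
    p = MODEL_PRICES[model]
    tier = "long" if input_tokens > p["max_input_tokens_short"] else "short"
    return (input_tokens * p[f"input_cost_per_token_{tier}"]
            + output_tokens * p[f"output_cost_per_token_{tier}"])

# A 200k-token prompt is billed entirely at the long rates:
print(completion_cost("gemini-1.5-flash-latest", 200_000, 1_000))  # 0.1421
```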

Motivation, pitch

Google models price tokens differently for prompts >128k tokens. According to https://ai.google.dev/pricing: [screenshot of the Gemini pricing tiers]

This was brought up in AgentOps-AI/tokencost#53, which relies on the LiteLLM cost tracker.

Twitter / LinkedIn details

https://www.twitter.com/alexreibman

@krrishdholakia (Contributor)

We can be more specific here @areibman - since it's not yet standard what 'long' and 'short' mean.

This seems similar to how tgai pricing works based on token params - what if we do `input_cost_per_token_up_to_128k` and `input_cost_per_token_above_128k`?
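
For illustration, that naming could look like this in the price map. This is a hypothetical sketch reusing the values from the example entry above, not a merged schema; spelling the 128k threshold out in the key removes any ambiguity about where 'long' starts:

```python
# Hypothetical key naming per the suggestion above; values copied from the
# issue's example entry, not from LiteLLM's shipped price map.
gemini_entry = {
    "input_cost_per_token_up_to_128k": 3.5e-07,
    "input_cost_per_token_above_128k": 7e-07,
    "output_cost_per_token_up_to_128k": 1.05e-06,
    "output_cost_per_token_above_128k": 2.1e-06,
}
```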

@areibman (Author)

> We can be more specific here @areibman - since it's not yet standard what 'long' and 'short' mean.
>
> This seems similar to how tgai pricing works based on token params - what if we do `input_cost_per_token_up_to_128k` and `input_cost_per_token_above_128k`?

That would probably work! The only precaution I can think of is if some providers start offering multi-tier pricing per model, e.g. <128k, 128k-256k, 256k-512k, etc.

This should be as easy as updating the proxy's JSON, no?
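
If providers do introduce more than two tiers, flat per-tier keys stop scaling. A hedged sketch of a more general representation follows; the `TIERS` list and `pick_rate` helper are hypothetical, not part of LiteLLM's schema, and the rates beyond the first tier are made up for illustration:

```python
# Each tier: (largest prompt size billed at this rate, USD per input token).
# float("inf") marks the open-ended top tier.
TIERS = [
    (128_000, 3.5e-07),
    (256_000, 7e-07),          # hypothetical middle tier
    (float("inf"), 1.4e-06),   # hypothetical top tier
]

def pick_rate(input_tokens: int, tiers: list[tuple[float, float]]) -> float:
    """Return the per-token rate for the tier the prompt length falls into."""
    for limit, rate in tiers:
        if input_tokens <= limit:
            return rate
    raise ValueError("tiers must end with an open-ended entry")

assert pick_rate(100_000, TIERS) == 3.5e-07
assert pick_rate(300_000, TIERS) == 1.4e-06
```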
