[Feature]: Dynamic tpm quota (multiple projects) #4124

Closed
krrishdholakia opened this issue Jun 11, 2024 · 7 comments · Fixed by #4349
Labels
enhancement New feature or request

Comments

@krrishdholakia
Contributor

krrishdholakia commented Jun 11, 2024

The Feature

Allow setting a dynamic_quote: true flag to enable a dynamic tpm budget that depends on how many apps are active simultaneously

Motivation, pitch

Prevent any single app from exhausting the shared tpm quota

Twitter / LinkedIn details

cc: @jeromeroussin

@krrishdholakia krrishdholakia added the enhancement New feature or request label Jun 11, 2024
@krrishdholakia
Contributor Author

for example, you'd have to define how long an app is considered "active" after its last call, so you can accurately compute the number of "simultaneous" apps

@krrishdholakia
Contributor Author

can we do this similar to latency ttl?

check the number of active apps -> divide the available tpm capacity by number of active apps -> return that as the available tpm for that call
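A minimal sketch of the three steps above, assuming the set of active apps is already known. The function name and the even integer split are illustrative assumptions, not the project's actual implementation.

```python
from __future__ import annotations


def available_tpm_for_call(total_tpm: int, active_apps: set[str]) -> int:
    """Split a model's TPM capacity evenly across the currently active apps."""
    # Treat an empty set as a single caller so the first request gets everything.
    n_active = max(len(active_apps), 1)
    # Integer division: any remainder is simply left unallocated in this sketch.
    return total_tpm // n_active


if __name__ == "__main__":
    print(available_tpm_for_call(100_000, {"app-a"}))
    print(available_tpm_for_call(100_000, {"app-a", "app-b"}))
```

With one active app the call returns the full capacity; with two it returns half each.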

@krrishdholakia
Contributor Author

this probably wouldn't be in router.py

as it doesn't really have anything to do with the load balancing between models

@krrishdholakia
Contributor Author

would probably be a call hook on the proxy

similar to parallel_request_limiter.py -

from litellm.proxy._types import UserAPIKeyAuth
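To illustrate what such a call hook might look like, here is a standalone sketch that does not import litellm. The class name, the simplified `async_pre_call_hook(app_id, requested_tokens)` signature, and the use of `RuntimeError` are all hypothetical; the proxy's real hook signature differs, and this sketch never resets usage at the end of a minute.

```python
import asyncio
from typing import Dict


class DynamicTPMLimiter:
    """Hypothetical pre-call hook: rejects a call that would exceed the app's share."""

    def __init__(self, total_tpm: int) -> None:
        self.total_tpm = total_tpm
        # Tokens used per app in the current window (no reset logic in this sketch).
        self.used_tokens: Dict[str, int] = {}

    async def async_pre_call_hook(self, app_id: str, requested_tokens: int) -> None:
        # Register the caller first so it counts toward the number of active apps.
        self.used_tokens.setdefault(app_id, 0)
        share = self.total_tpm // len(self.used_tokens)
        if self.used_tokens[app_id] + requested_tokens > share:
            raise RuntimeError(f"{app_id} exceeded its dynamic tpm share of {share}")
        self.used_tokens[app_id] += requested_tokens


async def demo() -> None:
    limiter = DynamicTPMLimiter(total_tpm=1000)
    await limiter.async_pre_call_hook("app-a", 600)  # alone, share is 1000
    await limiter.async_pre_call_hook("app-b", 400)  # two apps, share is 500 each
    try:
        await limiter.async_pre_call_hook("app-a", 100)  # 600 + 100 > 500
    except RuntimeError as exc:
        print(exc)


if __name__ == "__main__":
    asyncio.run(demo())
```

Note that app-a's earlier 600 tokens were allowed while it was alone; once app-b becomes active, app-a is already over its new share, which is one design question a real implementation would need to settle.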

@krrishdholakia
Contributor Author

Basic test cases:

  • If 1 'active' project => give all tpm
  • If 2 'active' projects => divide tpm in 2

@krrishdholakia
Contributor Author

For v0, considering tracking 'active apps' per model with ttl = 60s
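A rough in-memory sketch of per-model tracking with a 60s TTL. The class, method names, and the injectable clock are assumptions made for illustration; a real proxy running several instances would likely need a shared cache instead of a local dict.

```python
import time
from collections import defaultdict
from typing import Callable, Dict

ACTIVE_TTL_SECONDS = 60.0  # v0 value proposed in the comment above


class ActiveAppTracker:
    """Tracks, per model, when each app last made a call."""

    def __init__(
        self,
        ttl: float = ACTIVE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._last_seen: Dict[str, Dict[str, float]] = defaultdict(dict)

    def record_call(self, model: str, app_id: str) -> None:
        """Mark the app as active for this model as of now."""
        self._last_seen[model][app_id] = self.clock()

    def active_count(self, model: str) -> int:
        """Count apps whose last call to this model is within the TTL."""
        now = self.clock()
        seen = self._last_seen[model]
        # Drop apps whose last call is older than the TTL.
        for app_id in [a for a, t in seen.items() if now - t > self.ttl]:
            del seen[app_id]
        return len(seen)


if __name__ == "__main__":
    fake_now = [0.0]
    tracker = ActiveAppTracker(clock=lambda: fake_now[0])
    tracker.record_call("model-x", "app-a")
    tracker.record_call("model-x", "app-b")
    print(tracker.active_count("model-x"))
    fake_now[0] = 61.0  # both apps are now past the 60s TTL
    print(tracker.active_count("model-x"))
```

Tracking by model means an app that is busy on one model does not shrink the share other apps get on a different model.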

@krrishdholakia
Contributor Author

E2E case - "I am more concerned about apps gobbling up all the quota"

base case:
1 project -> gets all quota

advanced case:
2 projects -> each gets equal quota
