[Feature]: Dynamic tpm quota (multiple projects) #4124

krrishdholakia · 2024-06-11T16:04:26Z

The Feature

Allow setting a dynamic_quota: true flag to enable a dynamic tpm budget that depends on how many apps are active simultaneously

Motivation, pitch

prevent apps from exhausting tpm quota

Twitter / LinkedIn details

cc: @jeromeroussin


krrishdholakia · 2024-06-11T16:06:02Z

for example you'd have to define how long an app is considered "active" after their last call, so you can accurately compute a number of "simultaneous" apps

krrishdholakia · 2024-06-21T23:34:38Z

can we do this similar to latency ttl?

check the number of active apps -> divide the available tpm capacity by number of active apps -> return that as the available tpm for that call
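A minimal sketch of the division step described above, assuming an even split. The function name `available_tpm_for_call` and its parameters are hypothetical, not part of litellm:

```python
def available_tpm_for_call(total_tpm: int, active_projects: int) -> int:
    """Split a model's tpm capacity evenly across the currently active projects.

    Hypothetical helper for illustration only; not litellm's implementation.
    """
    # If nothing is tracked as active yet, treat the caller as the only
    # project so it receives the full capacity instead of dividing by zero.
    return total_tpm // max(active_projects, 1)
```

With one active project the caller gets the full capacity; with two, each call is offered half.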

krrishdholakia · 2024-06-21T23:35:06Z

this probably wouldn't be in router.py

as it doesn't really have anything to do with the load balancing between models

krrishdholakia · 2024-06-21T23:36:21Z

would probably be a call hook on the proxy

similar to parallel_request_limiter.py -

litellm/litellm/proxy/hooks/parallel_request_limiter.py

Line 4 in c0540e7

from litellm.proxy._types import UserAPIKeyAuth

krrishdholakia · 2024-06-22T00:15:03Z

Basic test cases:

If 1 'active' project => give all tpm
If 2 'active' projects => divide tpm in 2

krrishdholakia · 2024-06-22T03:30:47Z

For v0 considering tracking 'active apps' by model with ttl = 60s
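One possible way to track this, sketched with an in-memory dict. The class `ActiveProjectTracker`, its methods, and the injectable `clock` are assumptions for illustration; a real proxy would more likely use a shared cache so that multiple instances agree on the count:

```python
import time
from typing import Callable, Dict


class ActiveProjectTracker:
    """Counts projects seen per model within the last `ttl_seconds` (hypothetical)."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # model name -> {project id -> timestamp of that project's last call}
        self._last_seen: Dict[str, Dict[str, float]] = {}

    def record_call(self, model: str, project_id: str) -> None:
        """Mark a project as active for this model as of now."""
        self._last_seen.setdefault(model, {})[project_id] = self.clock()

    def active_count(self, model: str) -> int:
        """Return how many projects called this model within the ttl window."""
        now = self.clock()
        seen = self._last_seen.get(model, {})
        # Drop projects whose last call is older than the ttl.
        live = {p: t for p, t in seen.items() if now - t < self.ttl_seconds}
        self._last_seen[model] = live
        return len(live)
```

Keying by model means a project that only calls one model does not shrink the share available to projects on a different model.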

krrishdholakia · 2024-06-22T03:32:40Z

E2E case - "I am more concerned about apps gobbling up all the quota"

base case:
1 project -> gets all quota

advanced case:
2 projects -> each gets equal quota

krrishdholakia added the enhancement (New feature or request) label Jun 11, 2024

krrishdholakia mentioned this issue Jun 22, 2024

feat(dynamic_rate_limiter.py): Dynamic tpm quota (multiple projects) #4349

Merged

krrishdholakia closed this as completed in #4349 Jun 23, 2024

krrishdholakia closed this as completed in 2545da7 Jun 23, 2024
