Token based dynamic batching behaviour #151

Closed Answered by OlivierDehaene
joaopcm1996 asked this question in Q&A

Token-based dynamic batching means that we use the number of tokens per request to control how many requests are added to a batch.
Classic implementations of dynamic batching only consider requests as a whole and batch requests together until a maximum number of requests per batch is reached. This can lead to underutilization of the hardware.

Let's say you have a queue with 16 requests of different sizes in tokens:

  • Classic dynamic batching with a maximum batch size of 4 will always take the first 4 requests, even though they could be very small (1 token each, for example)

  • Token based dynamic batching will continue adding requests to the batch until a maximum number of toke…
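
To make the contrast between the two bullets concrete, here is a minimal Python sketch. It is not the project's actual implementation: the function names, the `max_batch_tokens` parameter, the FIFO policy, and the handling of a request larger than the budget are all assumptions made for illustration. Each request is represented only by its size in tokens.

```python
from collections import deque
from typing import Deque, List


def classic_batch(queue: Deque[int], max_batch_size: int) -> List[int]:
    """Pop up to max_batch_size requests, ignoring their token counts."""
    batch: List[int] = []
    while queue and len(batch) < max_batch_size:
        batch.append(queue.popleft())
    return batch


def token_based_batch(queue: Deque[int], max_batch_tokens: int) -> List[int]:
    """Pop requests in FIFO order while the token total stays within the budget."""
    batch: List[int] = []
    total = 0
    while queue and total + queue[0] <= max_batch_tokens:
        total += queue[0]
        batch.append(queue.popleft())
    # Assumption (hypothetical policy): a request larger than the whole budget
    # is served alone so that it cannot block the queue forever.
    if not batch and queue:
        batch.append(queue.popleft())
    return batch


# 16 queued requests of 1 token each, as in the example above.
sizes = [1] * 16
print(len(classic_batch(deque(sizes), max_batch_size=4)))         # 4
print(len(token_based_batch(deque(sizes), max_batch_tokens=32)))  # 16
```

With very small requests, the classic batcher stops at 4 requests and leaves most of the hardware idle, while the token-based batcher keeps filling the batch until its token budget would be exceeded.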

Answer selected by OlivierDehaene