
feature: Make Adaptive Batching algorithm customizable #4753

Open
Hubert-Bonisseur opened this issue May 23, 2024 · 2 comments

Comments

@Hubert-Bonisseur

Feature request

It would be very nice to be able to implement one's own batching logic.

Motivation

AFAIK, the adaptive batching algorithm functions as a black box. The parameters (max batch size and max latency) offer only limited control over it.
For instance, I have a use case where the inputs vary greatly in length, and batching them together does not make sense; in some cases it is even slower than processing them sequentially.

I would love to be able to batch together only inputs that are close in length by writing my own logic.

Other

No response

@frostming
Contributor

Hi, if you are still interested in this: we have made some changes to batching recently.

A batch may now be split into smaller pieces to fit the max batch size.

For example, if max_batch_size is 10 and we send 3 requests with sizes [7, 7, 6] in receiving order, the batch engine will execute batches of 7+3 and 4+6 sequentially; the second request is split into two parts of sizes (3, 4).
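
Conceptually, the packing works like this rough sketch (illustrative only, not the actual batch engine code):

```python
# Rough sketch of the splitting behavior described above; illustrative
# only, not BentoML's actual batch engine code.

def pack_requests(sizes: list[int], max_batch_size: int) -> list[list[int]]:
    """Greedily pack request sizes into batches of at most max_batch_size,
    splitting a request across two batches when it does not fit whole."""
    batches: list[list[int]] = []
    current: list[int] = []
    room = max_batch_size
    for size in sizes:
        while size > 0:
            take = min(size, room)   # take as much of the request as fits
            current.append(take)
            size -= take
            room -= take
            if room == 0:            # batch is full, start a new one
                batches.append(current)
                current, room = [], max_batch_size
    if current:
        batches.append(current)
    return batches

print(pack_requests([7, 7, 6], max_batch_size=10))  # [[7, 3], [4, 6]]
```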

@Hubert-Bonisseur
Author

@frostming
Thanks for keeping me updated :)
If I understand correctly, your comment applies to the case where the client sends data in batches; those batches are now repacked to better fit the max batch size argument.

This is not the feature I wish existed. I would really like more options to customize how batches are formed, for instance batching together only inputs that share the same metadata field. This is key to getting the best performance in some cases, like the example in my original post.
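
For illustration, this is roughly the kind of hook I have in mind (a hypothetical sketch; `batch_key` and the surrounding names are made up, not an existing BentoML API):

```python
# Hypothetical sketch of the customization I am asking for; none of these
# names exist in BentoML today, they only illustrate the idea.
from collections import defaultdict
from typing import Any, Callable, Hashable

def group_into_batches(
    requests: list[dict[str, Any]],
    batch_key: Callable[[dict[str, Any]], Hashable],
    max_batch_size: int,
) -> list[list[dict[str, Any]]]:
    """Group pending requests by a user-supplied key, then cut each
    group into batches of at most max_batch_size."""
    groups: dict[Hashable, list[dict[str, Any]]] = defaultdict(list)
    for req in requests:
        groups[batch_key(req)].append(req)
    return [
        group[i : i + max_batch_size]
        for group in groups.values()
        for i in range(0, len(group), max_batch_size)
    ]

# Example: only batch inputs of similar length together (my original use
# case), using 128-character length buckets as the grouping key.
pending = [{"input": "short text"}, {"input": "a" * 300}, {"input": "b" * 290}]
batches = group_into_batches(
    pending,
    batch_key=lambda req: len(req["input"]) // 128,
    max_batch_size=16,
)
print([[len(r["input"]) for r in batch] for batch in batches])  # [[10], [300, 290]]
```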

Maybe it is already possible to do something by subclassing the class that handles batching? It is not mentioned in the documentation.
