
otlp: Mimir's OTLP endpoint write request size limit should be configured in the same unit as the metrics producers #8260

Open
Tracked by #8223
replay opened this issue Jun 3, 2024 · 3 comments


@replay
Contributor

replay commented Jun 3, 2024

The OTel Collector and Alloy let the user configure the batch size limit via the setting send_batch_max_size. Because this limit is defined in samples per batch rather than bytes per batch, it is difficult to find a value that guarantees write requests stay below Mimir's limit, which is defined in bytes per batch. This is especially hard because OTel samples vary in size.
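To illustrate the mismatch, here is a minimal Go sketch built on a deliberately simplified size model. The dataPoint type and the per-field byte costs in approxSize are hypothetical assumptions, not OTLP's actual protobuf encoding, and 8192 is just an illustrative sample count. It compares two batches that hold the same number of samples (the kind of bound send_batch_max_size provides) but different kinds of data.

```go
package main

import "fmt"

// dataPoint is a hypothetical, simplified stand-in for an OTLP data point.
// It is NOT the OTLP protobuf type; it only models what drives encoded size.
type dataPoint struct {
	labelBytes int // total bytes of attribute names and values
	buckets    int // number of histogram buckets (0 for a plain float sample)
}

// approxSize returns an assumed byte estimate: labels, plus 8 bytes for the
// value and 8 for the timestamp, plus 8 bytes per histogram bucket count.
func approxSize(dp dataPoint) int {
	return dp.labelBytes + 16 + 8*dp.buckets
}

// batchBytes sums the estimated size of every data point in a batch.
func batchBytes(batch []dataPoint) int {
	total := 0
	for _, dp := range batch {
		total += approxSize(dp)
	}
	return total
}

// makeBatch builds a batch of n identical data points.
func makeBatch(n int, dp dataPoint) []dataPoint {
	batch := make([]dataPoint, n)
	for i := range batch {
		batch[i] = dp
	}
	return batch
}

func main() {
	const sendBatchMaxSize = 8192 // same sample count for both batches
	floats := makeBatch(sendBatchMaxSize, dataPoint{labelBytes: 100})
	histograms := makeBatch(sendBatchMaxSize, dataPoint{labelBytes: 100, buckets: 100})

	fmt.Println("float batch:", len(floats), "samples,", batchBytes(floats), "bytes")
	fmt.Println("histogram batch:", len(histograms), "samples,", batchBytes(histograms), "bytes")
}
```

Under these assumed sizes both batches hold 8192 samples, yet the float batch is about 0.95 MB and the histogram batch about 7.5 MB, so no single sample count maps cleanly onto a byte limit.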

@ying-jeanne
Contributor

ying-jeanne commented Jun 12, 2024

I talked with a Collector maintainer: the change has to be made in the OTel Collector, by implementing a request size limit in the batch sender. https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/batch_sender.go
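A size-aware batch sender would group items by their encoded size instead of by count. The Go sketch below is a hypothetical illustration of that idea, not code from batch_sender.go or any Collector API. It assumes items are already serialized, so len(item) is their wire size, and splits them greedily in arrival order under a hypothetical maxBytes limit.

```go
package main

import "fmt"

// splitBySize greedily groups pre-encoded items, in arrival order, into
// batches whose total size does not exceed maxBytes. An item that is larger
// than maxBytes on its own ends up alone in its batch; a real exporter would
// have to reject or further split such an item.
func splitBySize(items [][]byte, maxBytes int) [][][]byte {
	var batches [][][]byte
	var current [][]byte
	currentBytes := 0
	for _, item := range items {
		// Flush the current batch if adding this item would exceed the limit.
		if len(current) > 0 && currentBytes+len(item) > maxBytes {
			batches = append(batches, current)
			current = nil
			currentBytes = 0
		}
		current = append(current, item)
		currentBytes += len(item)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	items := [][]byte{make([]byte, 40), make([]byte, 30), make([]byte, 50), make([]byte, 120), make([]byte, 10)}
	for i, batch := range splitBySize(items, 100) {
		total := 0
		for _, item := range batch {
			total += len(item)
		}
		fmt.Printf("batch %d: %d items, %d bytes\n", i, len(batch), total)
	}
}
```

The greedy approach keeps ordering and needs only a running total, but it does nothing for a single item that is already over the limit, which is one reason this option would need more discussion and benchmarking.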

@replay
Contributor Author

replay commented Jun 12, 2024

> Talked with collector maintainer, the change has to be done on otlp collector, implement request size limit in batch sender.

Oh, thanks. Actually, my idea was that we'd implement a limit in Mimir that uses the same unit as the OTel Collector's limit, rather than the other way around. After all, I think we're trying to be compatible with OTLP, not to make OTLP compatible with us.

@ying-jeanne
Contributor

Let me clarify the discussion from the weekly meeting:
The goal is to align the configuration between the OTel Collector and Mimir, which will help us give better recommendations on user settings. The task could involve introducing a new limit based on the number of datapoints (option b). Since datapoints vary in size, we also need to keep a byte-based limit to protect against abusive requests. That byte limit should be set much higher, so that it is only triggered when users send an excessive amount of data, such as histogram data.
a. We can implement a request size limit in bytes in the OTel Collector. This would be a lengthy process and would require more discussion and benchmarking.
b. Alternatively, we can implement the same datapoint-count limit as the OTel Collector, but set it to a relatively lower threshold.

I can see that unifying the limits would help during incidents. If we implement a byte size limit in the OTel Collector, it would certainly prevent such cases, since we could recommend that users set the byte limit. On the other hand, if we implement a datapoint limit in Mimir, we would know the actual number of datapoints the client sent and could log it or include it in the error message (today we have no visibility and can only ask the client to check their settings). That would also help us give better recommendations.
