
inotify support for kubernetes_logs #20541

Open
mrzor opened this issue May 21, 2024 · 1 comment
Labels
source: kubernetes_logs (Anything `kubernetes_logs` source related) · type: feature (A value-adding code addition that introduces new functionality.)

Comments


mrzor commented May 21, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

Some pods may succeed (enter the Succeeded state) very quickly. Because the kubernetes_logs source discovers new log files by periodically globbing the pod log directory (paced by glob_minimum_cooldown_ms), logs from pods that finish before the next scan can disappear before Vector ever sees them. The lifecycle of the log folder is presently tied to the pod lifecycle itself, and how long the folder sticks around is not straightforward to control (it is actually unclear to what extent it can be controlled at all).

Attempted Solutions

  • Adding artificial delays to workloads allows Vector to pick up the logs even with longer glob cooldowns. We found that a delay of 1.1-2.1x the glob cooldown restores perfect delivery. That is the most reliable workaround, but it requires workload-author involvement.
  • Lowering glob_minimum_cooldown_ms from 60s to 5s reduces the number of impacted workloads at the cost of vastly increased CPU usage (see the config sketch after this list). There are presumably side effects on the node itself as well, since it has to service the extra filesystem syscalls.
  • A mix of the two above (extra latency to be safe, plus a lowered cooldown) lets one walk the tradeoff space.
  • We tried to find a way to have Kubernetes keep logs around longer, but did not find a sure way (Source). Failed containers do seem to stay around longer.
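For concreteness, here is a minimal sketch of the lowered-cooldown workaround. The source name (k8s_logs) is arbitrary and all other options are omitted; glob_minimum_cooldown_ms is a real option of the kubernetes_logs source:

```toml
[sources.k8s_logs]
type = "kubernetes_logs"
# Default is 60000 (60 s). A lower value narrows the window in which a
# short-lived pod's logs can be deleted unseen, but every extra scan
# walks the pod log directory again, so CPU and syscall load grow.
glob_minimum_cooldown_ms = 5000
```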

Proposal

I can't do much beyond expressing my strong support for adding inotify support to the kubernetes_logs source. It may or may not be implemented as part of the file source; it could also be an entirely new source on which to base the kubernetes one. A rough sketch of the idea follows below.
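To make this concrete, here is a minimal, hypothetical sketch of inotify-driven discovery using the notify crate (which wraps inotify on Linux). This is not Vector's actual code; the watch path and event handling are purely illustrative:

```rust
use notify::{Event, RecursiveMode, Result, Watcher};
use std::path::Path;

fn main() -> Result<()> {
    // Hypothetical sketch: receive filesystem events as they happen
    // instead of polling on a glob cooldown.
    let mut watcher = notify::recommended_watcher(|res: Result<Event>| match res {
        // A real implementation would filter for file-creation events
        // and hand the new path to the file server for tailing.
        Ok(event) => println!("fs event: {event:?}"),
        Err(e) => eprintln!("watch error: {e:?}"),
    })?;

    // Kubelet writes container logs under /var/log/pods on each node.
    // Watching it recursively surfaces new pod log files immediately,
    // with no scan interval in the delivery path.
    watcher.watch(Path::new("/var/log/pods"), RecursiveMode::Recursive)?;

    std::thread::park(); // keep the process alive to keep receiving events
    Ok(())
}
```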

In some of the referenced issues, an argument for platform independence was made. While the file source certainly has to be platform independent, why should the kubernetes one be? I claim no expertise on Kubernetes on Windows, but a quick look at Kubernetes | Windows User Guide | Capturing logs from workloads indicates that the preferred approach there is not to use files at all. Because of that, it seems to me that the kubernetes source does not support Windows adequately today, and as such the platform-independence argument is moot. Outside of Linux, Kubernetes only runs on Windows as far as I'm aware (k8s/BSD is hardly a thing).

One should not have to choose between efficiency and delivery, which is the tradeoff we are forced to make here. Let's have both with inotify!

References

Version

vector 0.36.0 (x86_64-unknown-linux-gnu)

@mrzor mrzor added the type: feature A value-adding code addition that introduces new functionality. label May 21, 2024
@jszwedko jszwedko added the source: kubernetes_logs Anything `kubernetes_logs` source related label May 21, 2024
@jszwedko (Member)

Thanks for these thoughts, @mrzor! We had discussed this a bit before, and I think we'd be open to seeing inotify used in the kubernetes_logs source (and also the file source) so long as there is also a fallback mechanism (scanning).
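For what such a fallback could look like, here is a hypothetical sketch using the same notify crate: its PollWatcher implements the same Watcher trait as the native inotify backend, so a source could swap between them without changing its event loop. Again, this is not Vector code, and the 5 s poll interval merely mirrors the lowered cooldown discussed above:

```rust
use notify::{Config, Event, PollWatcher, RecursiveMode, Result, Watcher};
use std::{path::Path, time::Duration};

fn main() -> Result<()> {
    // Scanning fallback for platforms (or filesystems) without inotify:
    // same trait, same event handler, just a polling backend.
    let config = Config::default().with_poll_interval(Duration::from_secs(5));
    let mut watcher = PollWatcher::new(
        |res: Result<Event>| {
            if let Ok(event) = res {
                println!("fs event: {event:?}");
            }
        },
        config,
    )?;
    watcher.watch(Path::new("/var/log/pods"), RecursiveMode::Recursive)?;
    std::thread::park(); // keep the process alive to keep receiving events
    Ok(())
}
```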
