Feature: Limit IOps for VMs #576
Related ticket: https://github.com/neondatabase/cloud/issues/5647

We by the way seem to have … from one of NeonVMs

Disks in VMs are VirtIO-blk devices, so we can try …

Blocked on design work. Moving back from "in progress" to "selected".
Problem description / Motivation
This hasn't happened yet for VMs, but in theory a noisy tenant can saturate disk IO by itself, leading to significant degradation on the underlying k8s node (affecting other pods and kubelet itself).
Recent inspiration: https://neondb.slack.com/archives/C061XEGSCE7/p1697733194985739?thread_ts=1697732054.624899&cid=C061XEGSCE7
This is also potentially affected by recently moving the file cache to disk.
Feature idea(s) / DoD
IO rate limiting for VMs should not be an accidental side-effect of the speed of QEMU; i.e. we should have intentional safeguards to cap the amount of IO a single VM can do.
This could be implemented as a compiled-in global constant, or made part of the VM spec (with a sensible default value); it could perhaps be combined with the settings from #547.
Implementation ideas
We're already running QEMU in a cgroup, so we can additionally set limits there, e.g. via `io.max` (possibly scaled based on the VM's CPU allocation).

We should also consider how this looks from within the VM: if QEMU is blocked on disk, does the VM kernel observe the underlying device as slow, or does the VM get invisibly paused? Does time spent waiting on disk count towards the QEMU cgroup's `cpu.max`? (If so, do we need to change that?)
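As a rough sketch of the `io.max` approach: cgroup v2 accepts a per-device line of the form `MAJ:MIN riops=N wiops=N`. The helper name, device numbers, and IOPS value below are illustrative, not part of this issue; applying the limit is a privileged write into the QEMU pod's cgroup directory.

```shell
#!/bin/sh
# Hypothetical helper: compose a cgroup-v2 io.max line capping a block
# device to a fixed read/write IOPS budget.
iomax_line() {
  # $1 = MAJ:MIN of the backing block device, $2 = IOPS cap
  echo "$1 riops=$2 wiops=$2"
}

# Applying it would look something like (path is illustrative):
#   iomax_line 259:0 1000 > /sys/fs/cgroup/<qemu-cgroup>/io.max
iomax_line 259:0 1000
# -> 259:0 riops=1000 wiops=1000
```

Note that `io.max` throttles the host-side device QEMU writes to; whether the guest kernel perceives this as a slow virtio-blk device (back-pressure) or as invisible pauses is exactly the open question above.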