Set GOMAXPROCS and GOMEMLIMIT environment variables #6977
base: master
Conversation
…r resources Signed-off-by: Jesper Noordsij <[email protected]>
@wallrj FYI
Hey, appreciate you getting involved with the project!
I don't think we can merge this as is; to merge something like this we'd need to discuss further how to do it safely and in a way that won't lead to confusing outcomes. I'd encourage you to attend one of our meetings if you want to discuss this!
{{- if (.Values.resources.limits).cpu }}
- name: GOMAXPROCS
  valueFrom:
    resourceFieldRef:
      resource: limits.cpu
{{- end }}
{{- if (.Values.resources.limits).memory }}
- name: GOMEMLIMIT
  valueFrom:
    resourceFieldRef:
      resource: limits.memory
{{- end }}
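For orientation, here is a minimal sketch of chart values that would cause both of the conditionals above to render; the specific limit values are illustrative only:

```yaml
# values.yaml (illustrative) — each env var is only added when the
# corresponding limit is set, because of the surrounding `if` guards.
resources:
  limits:
    cpu: 2
    memory: 200Mi
```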
suggestion: A valid value for limits.cpu would be 100m. That value wouldn't be valid for GOMAXPROCS as far as I can tell, and I think it'd be ignored.
Similarly, a valid amount of memory in limits.memory would be 1e6 or 120M, and both of those would be invalid entries for GOMEMLIMIT (this says that supported suffixes are "B, KiB, MiB, GiB, and TiB").
I don't think we can generally apply the values of resource limits like this. This won't have the expected effect for a lot of totally valid resource limits.
It's maybe possible to construct some Helm function to convert resource limits to GOMAXPROCS / GOMEMLIMIT, but I think that'd be hard to do and a pain to debug. My instinct is that it's probably not worth the effort to add this - what do you think?
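To make the format mismatch concrete, here is an illustrative snippet (not part of this PR) showing the kind of values the Go runtime accepts when these variables are set directly as plain strings:

```yaml
# Illustrative only — hard-coded env values the Go runtime accepts.
env:
  - name: GOMAXPROCS
    value: "2"       # must be a whole number of CPUs; a quantity like "100m" is not valid here
  - name: GOMEMLIMIT
    value: "200MiB"  # bytes with an optional B/KiB/MiB/GiB/TiB suffix; "120M" or "1e6" are not accepted
```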
It might actually work: https://billglover.me/2022/09/14/use-the-kubernetes-downwards-api-to-set-gomemlimit/
That article seems to suggest that the downward API actually returns the computed integer value.
Then this PR needs confirmation, tests and documentation of that 😁 That article might be worth adding to the PR description but an article isn't really enough to get this change over the line.
Specifically, I'd like to see examples of what GOMAXPROCS is set to if my CPU limits are 0.01, 1m or 2.5 as initial test cases. I'd also like to see what GOMEMLIMIT is set to if I specify 1e6 or 1KB as memory limits.
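One way to answer those questions without touching the chart would be a throwaway pod along these lines; this is only a sketch (the pod name and image are arbitrary) that prints whatever the downward API injects for a given set of limits:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-api-check    # arbitrary name
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: busybox           # any image with a shell works
      command: ["sh", "-c", "echo GOMAXPROCS=$GOMAXPROCS GOMEMLIMIT=$GOMEMLIMIT"]
      resources:
        limits:
          cpu: "0.01"          # swap in 1m or 2.5 for the other test cases
          memory: 1Mi          # swap in the other quantities from the review (e.g. 1e6)
      env:
        - name: GOMAXPROCS
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        - name: GOMEMLIMIT
          valueFrom:
            resourceFieldRef:
              resource: limits.memory
```

The injected values can then be read back from the pod's logs.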
The resourceFieldRef is a Kubernetes directive created specifically for passing resource-related values: it rounds the CPU value up to the nearest whole number (e.g. 250m to 1) and passes the memory as a plain numeric byte count, so 64Mi results in the environment variable being set to 67108864. This makes it compatible by design with what the Go runtime expects.
An example is documented within Kubernetes documentation itself: https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#use-container-fields-as-values-for-environment-variables.
Would referencing the Kubernetes documentation suffice here, given that this basically ensures the correct behavior by design? And if so, what would be a suitable place to add this reference?
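For reference, a sketch of the conversion described above, using the numbers from this comment (the rendered values come from the kubelet's downward API, not from Helm):

```yaml
resources:
  limits:
    cpu: 250m      # rounded up by the downward API, so GOMAXPROCS is injected as "1"
    memory: 64Mi   # exposed as a plain byte count, so GOMEMLIMIT is injected as "67108864"
env:
  - name: GOMAXPROCS
    valueFrom:
      resourceFieldRef:
        resource: limits.cpu      # default divisor "1"
  - name: GOMEMLIMIT
    valueFrom:
      resourceFieldRef:
        resource: limits.memory   # default divisor "1"
```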
Thank you for the link!
Honestly, having the docs linked in this PR is probably enough. I don't think we need to complicate the Helm chart with a link, and anyone that cares will be able to git blame and find this PR easily enough.
I'll enable testing for this PR 👌
/ok-to-test
@jnoordsij next step is to verify that this PR actually improves performance of cert-manager. @wallrj might be able to help here; he did some benchmarking of cert-manager in the past.
@jnoordsij thanks for bringing this to our attention. I've been testing the effect of GOMEMLIMIT by setting the environment variables directly. My conclusion from the experiments below is that setting GOMEMLIMIT equal to the memory limit has the following advantages:
Memory Limit 200MiB / GOMEMLIMIT 200MiB

In this experiment I set both a memory limit of 200Mi and GOMEMLIMIT of 200MiB.
# values.yaml
resources:
  requests:
    cpu: 1
    memory: 200Mi
  limits:
    memory: 200Mi
extraEnv:
  - name: GOMEMLIMIT
    value: 200MiB
prometheus:
  enabled: true
  servicemonitor:
    enabled: true
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  kubernetesAPIQPS: 10000
  kubernetesAPIBurst: 10000
  maxConcurrentChallenges: 400
  numberOfConcurrentWorkers: 8
  featureGates:
    ServerSideApply: true

Memory Limit 200MiB

In this experiment I only set a memory limit, not GOMEMLIMIT.
richard@LAPTOP-HJEQ9V9G:~$ kubectl get pods -n venafi --watch
NAME READY STATUS RESTARTS AGE
cert-manager-69f6f7585f-vnplg 1/1 Running 0 4m29s
cert-manager-approver-policy-75664b78fc-jsnp8 1/1 Running 0 3m49s
cert-manager-cainjector-56846796ff-j8rs7 1/1 Running 0 4m29s
cert-manager-webhook-6979b54d5f-brgmg 1/1 Running 0 4m29s
tlspk-monitoring-kube-state-metrics-68945cb7fd-l5pwb 1/1 Running 0 34s
trust-manager-9445c9f58-d6p2x 1/1 Running 0 3m26s
venafi-enhanced-issuer-6687f4dcd5-26cxn 1/1 Running 0 3m49s
cert-manager-69f6f7585f-vnplg 0/1 OOMKilled 0 9m51s
cert-manager-69f6f7585f-vnplg 1/1 Running 1 (2s ago) 9m52s
cert-manager-69f6f7585f-vnplg 0/1 OOMKilled 1 (4s ago) 9m54s
cert-manager-69f6f7585f-vnplg 0/1 CrashLoopBackOff 1 (7s ago) 10m
cert-manager-69f6f7585f-vnplg 1/1 Running 2 (19s ago) 10m
cert-manager-69f6f7585f-vnplg 0/1 OOMKilled 2 (21s ago) 10m
cert-manager-69f6f7585f-vnplg 0/1 CrashLoopBackOff 2 (7s ago) 10m
cert-manager-69f6f7585f-vnplg 1/1 Running 3 (33s ago) 10m
cert-manager-69f6f7585f-vnplg 0/1 OOMKilled 3 (35s ago) 10m
cert-manager-69f6f7585f-vnplg 0/1 CrashLoopBackOff 3 (3s ago) 10m
cert-manager-69f6f7585f-vnplg 1/1 Running 4 (45s ago) 11m
cert-manager-69f6f7585f-vnplg 0/1 OOMKilled 4 (46s ago) 11m
cert-manager-69f6f7585f-vnplg 0/1 CrashLoopBackOff 4 (7s ago) 11m
cert-manager-69f6f7585f-vnplg 1/1 Running 5 (92s ago) 13m
cert-manager-69f6f7585f-vnplg 0/1 OOMKilled 5 (93s ago) 13m
cert-manager-69f6f7585f-vnplg 0/1 CrashLoopBackOff 5 (5s ago) 13m
cert-manager-69f6f7585f-vnplg 1/1 Running 6 (2m42s ago) 15m
cert-manager-69f6f7585f-vnplg 0/1 OOMKilled 6 (2m44s ago) 15m
cert-manager-69f6f7585f-vnplg 0/1 CrashLoopBackOff 6 (1s ago) 15m
cert-manager-69f6f7585f-vnplg 1/1 Running 7 (5m10s ago) 20m
cert-manager-69f6f7585f-vnplg 0/1 OOMKilled 7 (5m12s ago) 21m
cert-manager-69f6f7585f-vnplg 0/1 CrashLoopBackOff 7 (10s ago) 21m

# values.yaml
resources:
  requests:
    cpu: 1
    memory: 200Mi
  limits:
    memory: 200Mi
prometheus:
  enabled: true
  servicemonitor:
    enabled: true
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  kubernetesAPIQPS: 10000
  kubernetesAPIBurst: 10000
  maxConcurrentChallenges: 400
  numberOfConcurrentWorkers: 8
  featureGates:
    ServerSideApply: true

Without memory limit

In this baseline experiment, neither a memory limit nor GOMEMLIMIT was set.
# values.yaml
resources:
  requests:
    cpu: 1
    memory: 200Mi
prometheus:
  enabled: true
  servicemonitor:
    enabled: true
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  kubernetesAPIQPS: 10000
  kubernetesAPIBurst: 10000
  maxConcurrentChallenges: 400
  numberOfConcurrentWorkers: 8
  featureGates:
    ServerSideApply: true
@wallrj thanks a lot for your efforts to benchmark the changes; looking forward to your further findings! Regarding the memory limit being exceeded: I've observed similar measurements on my own applications in the past, but have not yet found a thorough explanation. My best guess so far is some kind of mismeasurement and/or misreporting caused by a restart of the container, where the reported value shows the sum of both the just-killed container and the new one, although this is mostly speculative on my part.
/hold
@jnoordsij wrote:
I've updated the comment with the remaining findings. I have some doubts about setting GOMEMLIMIT equal to the
I might start by trying to document the advantages of setting GOMEMLIMIT in https://cert-manager.io/docs/devops-tips/scaling-cert-manager/#set-appropriate-memory-requests-and-limits
I found a couple of possibly related issues:
Pull Request Motivation

Set GOMAXPROCS and GOMEMLIMIT environment variables based on container resources. Inspired by traefik/traefik-helm-chart#1029, this should reduce potential CPU throttling and OOMKills on containers.
Kind
/kind feature
Release Note