k3s is unable to start after being stopped for a while #10253
Comments
You didn't mention that you're running longhorn on these nodes, but that appears to be the case? The apiserver isn't coming ready, probably because it can't talk to longhorn to complete aggregated api discovery. Pods appear to be crashlooping because the LH CSI isn't registered:
Maybe check the pod logs on the server node to see what's wrong with the LH CSI deployment? You need to get LH fixed on the server node so that it can finish api discovery and become ready for agents to join.
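(A minimal sketch of those checks, assuming Longhorn is deployed in its default longhorn-system namespace; the pod name below is a placeholder:)

```sh
# See which Longhorn pods are failing on the server node
kubectl -n longhorn-system get pods -o wide

# Inspect logs of a crashlooping CSI pod (pod name is illustrative)
kubectl -n longhorn-system logs <longhorn-csi-plugin-pod> --all-containers

# Check whether the Longhorn CSI driver is registered with the node
kubectl get csidrivers
kubectl get csinode
```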
@brandond Yes, I have longhorn installed, but the […] Also, it seems pods in […]
Have you tried force-deleting the pods on the agents so that they get rescheduled to the server node? I think that the LH pods will need to come up before other things will work properly. You might create an issue in the LH repo to ask if there are any specific cold-start procedures that you need to observe.
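(A sketch of that force-delete step; namespace and pod name are placeholders:)

```sh
# Force-delete a stuck pod so the scheduler recreates it
kubectl -n longhorn-system delete pod <stuck-pod-name> --grace-period=0 --force

# Or force-delete all pods stuck in a namespace at once
kubectl -n longhorn-system delete pods --all --grace-period=0 --force
```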
Environmental Info:
K3s Version: v1.28.10+k3s1
Node(s) CPU architecture, OS, and Version: Linux rpi-3 5.15.0-1055-raspi #58-Ubuntu SMP PREEMPT Sat May 4 03:52:40 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
Cluster Configuration: 1 server, 2 agents
Describe the bug:
I stopped the server and all agents for a few days, and now the cluster cannot start anymore.
Steps To Reproduce:
Expected behavior:
Cluster starts successfully
Actual behavior:
An error like
Error syncing pod, skipping
keeps recurring, agents cannot join the server, and the server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error
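(For context, the readiness failure can be observed roughly as follows, assuming a standard systemd install where the server unit is named k3s:)

```sh
# Follow the k3s server logs (systemd unit name "k3s" assumed)
journalctl -u k3s -f

# Query apiserver readiness directly from the server node
kubectl get --raw '/readyz?verbose'
```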
Additional context / logs:
Gist: https://gist.github.com/harryzcy/34f7bb0a54defffda64377f17b863609