
k3s fails to start because of empty token file #10128

Open
djcloudy opened this issue May 21, 2024 · 1 comment

Comments

@djcloudy

Environmental Info:
K3s Version:

k3s version v1.28.7+k3s1 (051b14b2)
go version go1.21.7

Node(s) CPU architecture, OS, and Version:

Linux stks01 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy

Cluster Configuration:

3-node cluster (3 masters)
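For context, a 3-server cluster of this shape is typically formed with the embedded etcd bootstrap flow, roughly as below (hostnames and the token value are placeholders, not taken from our environment); the joining token is roughly what k3s later persists to the on-disk token file that ends up empty:

# first server: initialise the embedded etcd cluster
curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# remaining servers: join via the first server, supplying its token
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://server-1:6443 \
  --token <token-from-first-server>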

Describe the bug:

We have seen this several times across different versions of k3s (1.24, 1.25, 1.26, 1.27, and now 1.28): the server token file becomes 0 bytes.

-rw------- 1 root root 0 May 21 00:00 token

This typically happens after all nodes in a cluster have been upgraded successfully, but it does not manifest until days or sometimes weeks later, possibly correlated with rebooting the nodes.
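A quick way to catch this before the next k3s restart fails is to check the token file size on each server node. This is only a sketch and assumes the default data directory under /var/lib/rancher/k3s:

# size and mtime of the server token; 0 bytes means it has been truncated
sudo stat -c '%s bytes  %y  %n' /var/lib/rancher/k3s/server/token

# or list any zero-byte token file under the server data dir
sudo find /var/lib/rancher/k3s/server -maxdepth 1 -name token -size 0 -ls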

Steps To Reproduce:

  • Install K3s
  • Upgrade the cluster using the system-upgrade-controller (plan below)

Upgrade plan

apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  labels:
    k3s-upgrade: server
  name: k3s-server
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/master
        operator: In
        values:
          - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: docker.artifactory.our-company.com/rancher/k3s-upgrade
  version: v1.28.7+k3s1
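For completeness, the plan is applied the usual way once the system-upgrade-controller is running in the system-upgrade namespace (the file name is illustrative):

kubectl apply -f k3s-server-plan.yaml
# watch the upgrade jobs roll through the servers one at a time (concurrency: 1)
kubectl -n system-upgrade get plans,jobs -o wide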

Expected behavior:

The token file should never become 0 bytes.

Actual behavior:

Something is overwriting or truncating the token file, leaving it 0 bytes, and k3s then fails to start.

Additional context / logs:

This appears to be the same issue reported here: #5345
We can easily fix this by copying the contents from another surviving node in the cluster (roughly as sketched below); however, we would like to understand what mechanism is causing this so it can be prevented.
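For anyone else who lands here, the workaround is roughly the following. It assumes the default data directory, a systemd-managed k3s service, and placeholder hostnames:

# on the broken node: confirm the token is empty
sudo stat -c '%s %n' /var/lib/rancher/k3s/server/token

# pull the token from a surviving server node (healthy-node is a placeholder)
sudo scp root@healthy-node:/var/lib/rancher/k3s/server/token /var/lib/rancher/k3s/server/token
sudo chmod 600 /var/lib/rancher/k3s/server/token

# restart k3s on the broken node
sudo systemctl restart k3s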

@brandond
Contributor

brandond commented May 21, 2024

I'm not aware of any defects in K3s that would cause it to write a zero-byte token file. Is the timestamp always midnight (00:00)? That seems significant. It kinda sounds like something is causing unsynced changes to be lost from the filesystem. Do you have a cronjob or other process in place that upgrades and then reboots on a recurring schedule? Is it possible that this process is restarting K3s and then rebooting the node while K3s is starting up?
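A few standard commands that might help confirm or rule that out on an affected node (dates are based on the timestamp above and should be adjusted to the actual incident):

# did the node reboot around the time the token's mtime was reset?
last reboot | head
sudo stat -c '%y %n' /var/lib/rancher/k3s/server/token

# anything scheduled at or around midnight?
systemctl list-timers --all
sudo crontab -l; ls /etc/cron.d /etc/cron.daily

# what was k3s doing around that time?
sudo journalctl -u k3s --since "2024-05-20 23:50" --until "2024-05-21 00:10"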
