Failure to upgrade a flatcar cluster due to patch "Refactor bootstrap-os (#10983)" #11268

Open
oboudry-mvp opened this issue Jun 5, 2024 · 2 comments · May be fixed by #11270
Labels
kind/bug Categorizes issue or PR as related to a bug.

What happened?

When upgrading a 5-node Flatcar cluster using Kubespray v2.25.0, the following task fails on all nodes (only the first error message is copied here).

TASK [bootstrap-os : Make interpreter discovery works on Flatcar] **********************************************
fatal: [flatcar01]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible_interpreter_python_fallback' is undefined. 'ansible_interpreter_python_fallback' is undefined\n\nThe error appears to be in '/tmp/kubespray/roles/bootstrap-os/tasks/flatcar.yml': line 24, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Make interpreter discovery works on Flatcar\n  ^ here\n"}

The cluster currently runs Kubernetes v1.28.6, deployed by Kubespray v2.24.1.

I suspect the patch causing my issue is Refactor bootstrap-os (#10983), especially its changes to the roles/bootstrap-os/tasks/flatcar.yml file, but I'm not sure how to work around it. It looks like it comes from a change meant to make Ansible work out of the box with Flatcar, but in my case it's not working.

What did you expect to happen?

Pre-tests to pass and smooth upgrade. I've followed this same repetitive process for versions from 2.27.x to 2.28.x

How can we reproduce it (as minimally and precisely as possible)?

cd /tmp
git clone --single-branch --depth=1 --branch v2.25.0 git@github.com:kubernetes-sigs/kubespray.git
cd kubespray/inventory/
git clone --depth 1 git@github.com:marvinpac-it/mvp-cluster.git
cd ..

# sudo apt install python3.10-venv

python3 -m venv venv
source venv/bin/activate
pip install -U -r requirements.txt

ansible-playbook -i inventory/mvp-cluster/hosts.yaml -u core --key-file ~/.ssh/flatcar_ssh.pem  -b -e kube_version=v1.29.5 upgrade-cluster.yml

The marvinpac-it/mvp-cluster repository contains my inventory. Beyond activating nginx, the only change I made for deploying the cluster on Flatcar was to set the bin_dir variable to /opt/bin. As I understand it, this is a prerequisite on Flatcar because the /usr folder is read-only.

OS

Ansible server:
Linux 5.15.0-107-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Kubernetes nodes:
Linux 6.1.90-flatcar x86_64
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3815.2.3
VERSION_ID=3815.2.3
BUILD_ID=2024-05-21-1124
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3815.2.3 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3815.2.3:::::::*"

Version of Ansible

ansible [core 2.16.7]
config file = /tmp/kkk/kubespray/ansible.cfg
configured module search path = ['/tmp/kkk/kubespray/library']
ansible python module location = /tmp/kkk/kubespray/venv/lib/python3.10/site-packages/ansible
ansible collection location = /home/olivier/.ansible/collections:/usr/share/ansible/collections
executable location = /tmp/kkk/kubespray/venv/bin/ansible
python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/tmp/kkk/kubespray/venv/bin/python3)
jinja version = 3.1.4
libyaml = True

Version of Python

Python 3.10.12

Version of Kubespray (commit)

7e0a407

Network plugin used

calico

Full inventory with variables

Can't make this command work, but inventory is fully described here: https://github.com/marvinpac-it/mvp-cluster

Command used to invoke ansible

ansible-playbook -i inventory/mvp-cluster/hosts.yaml -u core --key-file ~/.ssh/flatcar_ssh.pem -b -e kube_version=v1.29.5 upgrade-cluster.yml

Output of ansible run

https://gist.github.com/oboudry-mvp/1b1271d0880f1cd80f07610e22463f4e

Anything else we need to know

I tried adding the following extra variable (passed with -e), which I found in a post:

-e '{"ansible_interpreter_python_fallback":["/opt/bin/pypy3/bin/python"]}'

It goes a bit further, but at some point it fails too. Output below:

https://gist.github.com/oboudry-mvp/8f345c6fcba1c178883ec6f98fb192ce

oboudry-mvp added the kind/bug label Jun 5, 2024
oboudry-mvp added a commit to marvinpac-it/kubespray that referenced this issue Jun 5, 2024

oboudry-mvp commented Jun 5, 2024

I think I found the reason for the error. In the file roles/bootstrap-os/tasks/flatcar.yml, a new value of [ '/opt/bin/python' ] is appended to the ansible_interpreter_python_fallback list, but there is no default value in case this list is not set, so the task fails with an undefined-variable error. I created a PR that sets a default value of [] for this parameter.
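
The missing default described above can be sketched as follows. This is a minimal illustration, not the actual diff from the PR: the task name is taken from the error message, while the `set_fact` form of the task and the exact expression are assumptions. Jinja2's `default` filter supplies an empty list when the variable is undefined, so the append no longer fails.

```yaml
# Sketch only: the exact contents of roles/bootstrap-os/tasks/flatcar.yml are assumed here.
- name: Make interpreter discovery works on Flatcar
  ansible.builtin.set_fact:
    # default([]) yields an empty list when the variable was never set, so
    # appending '/opt/bin/python' works whether or not a fallback list exists.
    ansible_interpreter_python_fallback: "{{ (ansible_interpreter_python_fallback | default([])) + ['/opt/bin/python'] }}"
```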

There is still a second bug after this one, but I don't think they're linked so I'll open a separate issue.

oboudry-mvp commented

Until the PR is merged, a workaround for this issue is to add the following parameter to the ansible-playbook command: -e '{"ansible_interpreter_python_fallback":[]}'
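
The same empty list could presumably also be defined in the inventory instead of on the command line. A minimal sketch, assuming a hypothetical `group_vars/all/flatcar.yml` file next to the inventory (the file name is not from this report, and this variant was not tested in this thread):

```yaml
# inventory/mvp-cluster/group_vars/all/flatcar.yml (hypothetical file name)
# Defines the list so the bootstrap-os task can append to it
# instead of failing on an undefined variable.
ansible_interpreter_python_fallback: []
```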

The second bug I faced, mentioned in the comment above, has already been fixed by PR #11224.

If I add the extra variable and manually apply the changes from PR #11224 to release v2.25.0, the upgrade proceeds without problems on my Flatcar cluster.

Once PR #11270 is merged, Flatcar setups should upgrade without problems.
