Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use super-admin.conf for kube-vip on first CP #11242

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sathieu
Copy link
Contributor

@sathieu sathieu commented May 28, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #11229

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. labels May 28, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 28, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @sathieu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 28, 2024

kube_vip_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: 10.20.30.40 # FIXME
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be changed. What is the subnet of the VMs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm IIRC it's based on the sample inventory if you could have a template that pick a reasonable address index within kube_pods_subnet or kube_service_addresses that would be nice I think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH wait nvm the subnet of the vms hmmm in that case I am not sure, afaik kubevirt decides not sure if you can get an IP at that layer like that :/

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 28, 2024
@andrewfung729
Copy link

Hi @sathieu , I tried your PR and I get the following error. I think the reason is that "Kube-vip | Write static pod" is run before "Set fact first_kube_control_plane".

ansible.errors.AnsibleUndefinedVariable: 'first_kube_control_plane' is undefined. 'first_kube_control_plane' is undefined                                                            
fatal: [cp-1]: FAILED! => {                                                                                                                                                      
    "changed": false,                                                                                                                                                                
    "msg": "AnsibleUndefinedVariable: 'first_kube_control_plane' is undefined. 'first_kube_control_plane' is undefined"                                                              
}  

@yankay
Copy link
Member

yankay commented May 29, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 29, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sathieu
Once this PR has been reviewed and has the lgtm label, please assign floryut for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sathieu sathieu force-pushed the kube-vip-super-admin branch 2 times, most recently from 1eae15a to 0d83237 Compare May 30, 2024 19:49
@pmichali
Copy link

pmichali commented Jun 1, 2024

FYI, I tried the patch and hit the same issue of 'first_kube_control_plane' not defined. I did this "hack" and was able to get this to work in a bare-metal, off-prem configuration:

diff --git a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
index f7b04a624..b5acdac8c 100644
--- a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
+++ b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
@@ -6,6 +6,10 @@
     - kube_proxy_mode == 'ipvs' and not kube_proxy_strict_arp
     - kube_vip_arp_enabled

+- name: Kube-vip | Check if first control plane
+  set_fact:
+    is_first_control_plane: "{{ inventory_hostname == groups['kube_control_plane'] | first }}"
+
 - name: Kube-vip | Write static pod
   template:
     src: manifests/kube-vip.manifest.j2
diff --git a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
index 11a971e93..7b59bca4c 100644
--- a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
+++ b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
@@ -119,6 +119,6 @@ spec:
   hostNetwork: true
   volumes:
   - hostPath:
-      path: /etc/kubernetes/admin.conf
+      path: /etc/kubernetes/{% if is_first_control_plane %}super-{% endif %}admin.conf
     name: kubeconfig
 status: {}

@sathieu
Copy link
Contributor Author

sathieu commented Jun 3, 2024

FYI, I tried the patch and hit the same issue of 'first_kube_control_plane' not defined. I did this "hack" and was able to get this to work in a bare-metal, off-prem configuration:

You are not testing the last version, see 0d83237.

@sathieu sathieu force-pushed the kube-vip-super-admin branch 2 times, most recently from e8dfb4b to 8662f58 Compare June 3, 2024 12:33
@pmichali
Copy link

pmichali commented Jun 3, 2024

FYI, I tried the patch and hit the same issue of 'first_kube_control_plane' not defined. I did this "hack" and was able to get this to work in a bare-metal, off-prem configuration:

You are not testing the last version, see 0d83237.

0d83237 is based off of 4b82e90. My diff is based off of the newer 351393e commit on master branch. As mentioned by @andrewfung729, first_kube_control_plane is not defined, when kube-vip.manifest.j2 is processed. My change defines the fact, right before kube-vip.manifest.j2 is processed. I just used another name - is_first_control_plane.

@sathieu
Copy link
Contributor Author

sathieu commented Jun 3, 2024

CI is failing for probably unrelated reason, can someone take a look?

I think fixing this issue is easy, but adding proper test is harder (because we don't have an external IP available).

@sathieu
Copy link
Contributor Author

sathieu commented Jun 6, 2024

Now failing with:

No such file or directory - qemu-img

(unrelated to this PR)

@opethema
Copy link
Contributor

opethema commented Jun 20, 2024

I tried the fix on 2 cp cluster with kubespray v2.25.0 and it's failing for the second master:

TASK [kubernetes/control-plane : Wait for k8s apiserver] *****************************************************************************************
ok: [first_cp]
fatal: [second_cp]: FAILED! => {"changed": false, "elapsed": 181, "msg": "Timeout when waiting for lb-apiserver.kubernetes.local:6443"}

Am I missing anything?

EDIT: sorry, was using the wrong interface..

It worked fine and the playbook finished successfully..

@pmichali
Copy link

@sathieu FYI, I tried the latest patch on a 3 control plane/4 worker node on-prem k8s 1.29.5 cluster using Kubespray (2 week old pull of master branch, post 2.25.0) and it worked fine! Thanks for working on this.

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 27, 2024
@sathieu
Copy link
Contributor Author

sathieu commented Jun 27, 2024

I've removed the CI part as it doesn't work (we need a valid VIP).

Please review and merge 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Initial setup of a k8s cluster with kubespray breaks if kube-vip is enabled
7 participants