
packet_ubuntu20-calico-aio is failing on the task "packet-ci : Wait for vms to have ipaddress assigned" #8786

Closed
oomichi opened this issue May 4, 2022 · 8 comments · May be fixed by #11324
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test.

Comments


oomichi commented May 4, 2022

Which jobs are failing:

packet_ubuntu20-calico-aio

Which test(s) are failing:

the task packet-ci : Wait for vms to have ipaddress assigned

Since when has it been failing:

5/4/2022

Testgrid link:

https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/2413816664

```
TASK [packet-ci : Wait for vms to have ipaddress assigned] *********************
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/tests/cloud_playbooks/roles/packet-ci/tasks/create-vms.yml:29
[WARNING]: The value '' is not a valid IP address or network, passing this
value to ipaddr filter might result in breaking change in future.
FAILED - RETRYING: Wait for vms to have ipaddress assigned (20 retries left).
[..]
FAILED - RETRYING: Wait for vms to have ipaddress assigned (1 retries left).
failed: [localhost] (item=1) => {"ansible_index_var": "vm_id", "ansible_loop_var": "item", "attempts": 20, "changed": false, "cmd": "set -o pipefail && kubectl get vmis -n 531160325-2413816664 instance-0 -o json | jq '.status.interfaces[].ipAddress' | tr -d '\"'", "delta": "0:00:00.083057", "end": "2022-05-04 20:58:37.271999", "item": 1, "msg": "non-zero return code", "rc": 5, "start": "2022-05-04 20:58:37.188942", "stderr": "jq: error (at <stdin>:149): Cannot iterate over null (null)", "stderr_lines": ["jq: error (at <stdin>:149): Cannot iterate over null (null)"], "stdout": "", "stdout_lines": [], "vm_id": 0}
```

Reason for failure:

Anything else we need to know:
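The jq error in the log ("Cannot iterate over null") indicates that `.status.interfaces` is not yet populated while the VMI has no address assigned, so the command exits non-zero instead of returning an empty string for the retry loop. Below is a minimal, null-tolerant sketch of such a wait task; it is illustrative only, and the namespace/variable names are placeholders rather than the actual contents of create-vms.yml:

```yaml
# Illustrative sketch only, not the actual create-vms.yml task.
# The jq alternative operator (// empty) turns a missing .status.interfaces
# into empty output, so the retry keeps waiting instead of jq failing with rc 5.
# "ci_namespace" and "vm_id" are placeholder variables for this sketch.
- name: Wait for vms to have ipaddress assigned
  ansible.builtin.shell: >-
    set -o pipefail &&
    kubectl get vmis -n {{ ci_namespace }} instance-{{ vm_id }} -o json
    | jq -r '(.status.interfaces // [])[0].ipAddress // empty'
  args:
    executable: /bin/bash
  register: vm_ip
  until: vm_ip.stdout | trim | length > 0
  retries: 20
  delay: 15
```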

@oomichi oomichi added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label May 4, 2022
@oomichi oomichi changed the title from packet_ubuntu20-calico-aio is failing on the task packet-ci : Wait for vms to have ipaddress assigned to packet_ubuntu20-calico-aio is failing on the task "packet-ci : Wait for vms to have ipaddress assigned" May 4, 2022

oomichi commented May 4, 2022

This seems solved already.

@oomichi oomichi closed this as completed May 4, 2022

ErikJiang commented Nov 30, 2023

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Nov 30, 2023
k8s-ci-robot commented Nov 30, 2023

@ErikJiang: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


VannTen commented Dec 2, 2023

I have a theory for that failure: under resource contention, the jobs which provision KubeVirt VMs before running Kubespray on them (packet*) fight each other for resource allocation.
What I mean is that some of the VMs for job A get placed on a node while others stay Pending, the same happens for job B (or more jobs), and neither job A nor job B can progress (while still holding resources) since not all of their corresponding VMs are Ready.

Thus we end up in some sort of livelock.

@floryut Does that seem plausible to you? Could you look at the cluster during a CI run to confirm that theory?

If that's the case, the Coscheduling plugin could help us, but that's rather involved, so let's confirm first...
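For reference, gang scheduling with the Coscheduling plugin from kubernetes-sigs/scheduler-plugins would roughly look like the sketch below. This is only an assumption about how it could be wired into the CI, and the PodGroup API group and pod-group label name vary between plugin versions:

```yaml
# Rough sketch of gang scheduling via the scheduler-plugins Coscheduling plugin.
# Names below are hypothetical; the API group and label key differ by version.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: kubespray-ci-job-a        # one PodGroup per CI job's set of VMs
  namespace: ci-job-a             # the per-job CI namespace
spec:
  minMember: 2                    # all VMs of the job: none start unless all fit
  scheduleTimeoutSeconds: 600
---
# Each virt-launcher pod (or VMI pod template) would then carry the pod-group
# label so the scheduler treats the whole set as a single unit, e.g.:
#   labels:
#     scheduling.x-k8s.io/pod-group: kubespray-ci-job-a
```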

EDIT:
So the root problem was the CI namespaces not getting properly labelled and hence not deleted: https://kubernetes.slack.com/archives/CDQSC941H/p1701740100644879


VannTen commented Jan 22, 2024

/close
(until next time....)

k8s-ci-robot commented Jan 22, 2024

@VannTen: Closing this issue.

In response to this:

/close
(until next time....)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


ant31 commented May 31, 2024

Looks like the VMs can't be scheduled.
I've created a PR to reduce the resource requests: #11255
I'll also look at adding an additional CI server later.
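For context, the resource requests in question live in the KubeVirt VirtualMachineInstance spec, which determines whether the corresponding virt-launcher pod can be scheduled on the CI cluster. A trimmed illustration follows; the values are placeholders, not whatever #11255 actually changes:

```yaml
# Illustration only; values are placeholders, not the actual kubespray CI settings.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: instance-0
spec:
  domain:
    devices: {}
    cpu:
      cores: 1
    resources:
      requests:
        memory: 2Gi    # lower requests make the VM easier to place under contention
        cpu: 500m
```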
