-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] [test-only] Test for controller ha #15321
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: skonto The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #15321 +/- ##
==========================================
+ Coverage 84.76% 84.80% +0.03%
==========================================
Files 218 218
Lines 13504 13504
==========================================
+ Hits 11447 11452 +5
+ Misses 1690 1686 -4
+ Partials 367 366 -1 ☔ View full report in Codecov by Sentry. |
/test istio-latest-no-mesh |
The lease seems not being updated after pod restart, although it has been released. logs here and here downloaded-logs-20240611-000442.json I will try to print the leases after the restart to verify it.
|
/test istio-latest-no-mesh |
5 similar comments
/test istio-latest-no-mesh |
/test istio-latest-no-mesh |
/test istio-latest-no-mesh |
/test istio-latest-no-mesh |
/test istio-latest-no-mesh |
@@ -113,7 +113,8 @@ do | |||
if (( ! HTTPS )); then | |||
restart_pod_n "${SYSTEM_NAMESPACE}" "app=controller" | |||
fi | |||
kubectl get leases -n "${SYSTEM_NAMESPACE}" -ojson | grep holder | |||
sleep 120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After adding this delay here I cant reproduce it although does not seem related.
/test istio-latest-no-mesh |
Reproduced it again but missed lease printing due to wrong ns, re-running. 🤞 |
/test istio-latest-no-mesh |
4 similar comments
/test istio-latest-no-mesh |
/test istio-latest-no-mesh |
/test istio-latest-no-mesh |
/test istio-latest-no-mesh |
Obviously not all leases are updated after pod restarting: Looking at the downloaded-logs-20240615-204712.json it seems that the lock is never released
cc @dprotaso |
/test istio-latest-no-mesh |
2 similar comments
/test istio-latest-no-mesh |
/test istio-latest-no-mesh |
@skonto: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Last run here shows that: After restarting we have,
Controller pods have 9 reconcilers as expected: chaosduck kills one of the controllers However before that and before we update the cm chaos is also deleting pods leaving leases not cleaned up:
The controller pod created before the ones above will run leader election on the certificate reconciler as it probably runs before cm update: However soon enough it stops leading: As chaos kicks in again: However not all leases are released since the last update fails with: |
It seems go client does not clean up everything. |
A lot of controller pods are being created due to chaos: Full list downloaded-logs-20240618-194319.csv. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Proposed Changes
TestControllerHA
is flakey #15238Release Note