squid: qa/tasks/mgr/test_progress.py: deal with pre-existing pool #58263

Open
wants to merge 1 commit into squid
Conversation

amathuria
Contributor

backport tracker: https://tracker.ceph.com/issues/66689


backport of #57401
parent tracker: https://tracker.ceph.com/issues/65826

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

Problem:
Currently, the test fails when it tries to remove a pool
that was not created by test_progress.py using the
`remove_pool` function from the CephManager class,
because CephManager tracks pools in a dictionary and
asserts that the pool being removed exists in that
dictionary.

Solution:
Add a case so that, if a pool was not created by the
test, it is deleted with a raw cluster command instead
of `remove_pool` from CephManager.

Fixes: https://tracker.ceph.com/issues/65826

Signed-off-by: Kamoltat <[email protected]>
(cherry picked from commit b083552)
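
For reference, a minimal sketch of the pool-cleanup logic the commit describes, assuming the surrounding setup context in test_progress.py; it mirrors the diff quoted later in this thread rather than the verbatim backported hunk:

# Remove all other pools, including any that pre-date the test.
for pool in self.mgr_cluster.mon_manager.get_osd_dump_json()['pools']:
    pool_name = pool['pool_name']
    if pool_name in self.mgr_cluster.mon_manager.pools:
        # The pool was created through CephManager, so it is tracked in
        # the `pools` dict and remove_pool()'s assert is satisfied.
        self.mgr_cluster.mon_manager.remove_pool(pool_name)
    else:
        # Pre-existing pool not tracked by CephManager: delete it with a
        # raw cluster command so the assert is never triggered.
        self.mgr_cluster.mon_manager.raw_cluster_cmd(
            'osd', 'pool', 'rm', pool_name, pool_name,
            "--yes-i-really-really-mean-it")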
amathuria added this to the squid milestone Jun 25, 2024
@kamoltat
Member

kamoltat commented Jun 25, 2024

FYI @amathuria
I ran a test with the following command as a safety measure:

teuthology-suite -v --ceph squid --ceph-repo https://github.com/ceph/ceph-ci.git --suite-repo https://github.com/amathuria/ceph.git --suite-branch wip-66689-squid --suite rados:mgr --filter "tasks/progress" --subset 90000/1200000 --machine-type smithi --priority 50 --limit 1 -S 12db92636bffe3e11ee07f5cc48e383f0a3a34e2

https://pulpito.ceph.com/ksirivad-2024-06-25_17:34:30-rados:mgr-squid-distro-default-smithi/7771763/
If this passes, then I think we are good to go.

@kamoltat
Member

kamoltat commented Jun 25, 2024

Unfortunately, the test run failed:
https://pulpito.ceph.com/ksirivad-2024-06-25_17:34:30-rados:mgr-squid-distro-default-smithi/7771763/

2024-06-25T17:56:44.950 INFO:tasks.cephfs_test_runner:When a recovery is underway, but then the out OSD
2024-06-25T17:56:44.950 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/mgr/test_progress.py", line 347, in test_osd_came_back
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:    ev2 = self._simulate_back_in([0], ev1)
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/mgr/test_progress.py", line 241, in _simulate_back_in
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:    new_event = self._get_osd_in_out_events('in')[0]
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:IndexError: list index out of range
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:Ran 2 tests in 203.238s
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:======================================================================
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:ERROR: test_osd_came_back (tasks.mgr.test_progress.TestProgress)
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:When a recovery is underway, but then the out OSD
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/mgr/test_progress.py", line 347, in test_osd_came_back
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:    ev2 = self._simulate_back_in([0], ev1)
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/mgr/test_progress.py", line 241, in _simulate_back_in
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:    new_event = self._get_osd_in_out_events('in')[0]
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:IndexError: list index out of range
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:
2024-06-25T17:56:44.952 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_bc919dbd59aafb86996e91254fe9a142fa2c8ccb/teuthology/run_tasks.py", line 109, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/cephfs_test_runner.py", line 211, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_osd_came_back (tasks.mgr.test_progress.TestProgress)
2024-06-25T17:56:45.117 ERROR:teuthology.util.sentry: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=6b073eff447f490eb1861c3a5f120000
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_bc919dbd59aafb86996e91254fe9a142fa2c8ccb/teuthology/run_tasks.py", line 109, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/cephfs_test_runner.py", line 211, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_osd_came_back (tasks.mgr.test_progress.TestProgress)
2024-06-25T17:56:45.119 DEBUG:teuthology.run_tasks:Unwinding manager cephfs_test_runner
2024-06-25T17:56:45.128 DEBUG:teuthology.run_tasks:Unwinding manager ceph

I'm taking a look ...

@kamoltat
Member

kamoltat commented Jun 25, 2024

I'm rerunning one on main just as a sanity check, to see whether the problem is squid-exclusive:

https://pulpito.ceph.com/ksirivad-2024-06-25_20:03:22-rados:mgr-main-distro-default-smithi/7771793/

Edit: hmm, this seems to pass, which is very weird since the only difference between the main and squid
test code is the backport commit itself.

[ksirivad@vossi03 ceph]$ git diff main..squid -- qa/tasks/mgr/test_progress.py
diff --git a/qa/tasks/mgr/test_progress.py b/qa/tasks/mgr/test_progress.py
index 948bb2da063..a80600c6a80 100644
--- a/qa/tasks/mgr/test_progress.py
+++ b/qa/tasks/mgr/test_progress.py
@@ -174,15 +174,7 @@ class TestProgress(MgrTestCase):
 
         # Remove all other pools
         for pool in self.mgr_cluster.mon_manager.get_osd_dump_json()['pools']:
-            # There might be some pools that wasn't created with this test.
-            # So we would use a raw cluster command to remove them.
-            pool_name = pool['pool_name']
-            if pool_name in self.mgr_cluster.mon_manager.pools:
-                self.mgr_cluster.mon_manager.remove_pool(pool_name)
-            else:
-                self.mgr_cluster.mon_manager.raw_cluster_cmd(
-                    'osd', 'pool', 'rm', pool_name, pool_name,
-                    "--yes-i-really-really-mean-it")
+            self.mgr_cluster.mon_manager.remove_pool(pool['pool_name'])
 
         self._load_module("progress")
         self.mgr_cluster.mon_manager.raw_cluster_cmd('progress', 'clear')

@kamoltat
Member

Running 5 more against main, to make sure we are not getting green runs on main with the fix just by chance.
https://pulpito.ceph.com/ksirivad-2024-06-25_21:06:10-rados:mgr-main-distro-default-smithi/
