squid: qa/tasks/mgr/test_progress.py: deal with pre-existing pool #58263

Open
wants to merge 1 commit into squid
Conversation

amathuria
Contributor

backport tracker: https://tracker.ceph.com/issues/66689


backport of #57401
parent tracker: https://tracker.ceph.com/issues/65826

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

Problem:
Currently, the test fails when it tries to remove a pool
that was not created by test_progress.py using the
`remove_pool` function from the CephManager class,
because CephManager tracks pools in a dictionary and
asserts that the pool being removed exists in that
dictionary.

Solution:
Add a case so that, if a pool was not created by the
test, it is deleted with a raw cluster command instead
of `remove_pool` from CephManager.

Fixes: https://tracker.ceph.com/issues/65826

Signed-off-by: Kamoltat <[email protected]>
(cherry picked from commit b083552)
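
For reference, a minimal sketch of the pool-cleanup logic the commit describes, assuming the surrounding setup context in test_progress.py; it mirrors the diff quoted later in this thread rather than the verbatim backported hunk:

# Remove all other pools, including any that pre-date the test.
for pool in self.mgr_cluster.mon_manager.get_osd_dump_json()['pools']:
    pool_name = pool['pool_name']
    if pool_name in self.mgr_cluster.mon_manager.pools:
        # The pool was created through CephManager, so it is tracked in
        # the `pools` dict and remove_pool()'s assert is satisfied.
        self.mgr_cluster.mon_manager.remove_pool(pool_name)
    else:
        # Pre-existing pool not tracked by CephManager: delete it with a
        # raw cluster command so the assert is never triggered.
        self.mgr_cluster.mon_manager.raw_cluster_cmd(
            'osd', 'pool', 'rm', pool_name, pool_name,
            "--yes-i-really-really-mean-it")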
amathuria added this to the squid milestone Jun 25, 2024
@kamoltat
Member

kamoltat commented Jun 25, 2024

FYI @amathuria
I ran a test with the following command as a safety measure:

teuthology-suite -v --ceph squid --ceph-repo https://github.com/ceph/ceph-ci.git --suite-repo https://github.com/amathuria/ceph.git --suite-branch wip-66689-squid --suite rados:mgr --filter "tasks/progress" --subset 90000/1200000 --machine-type smithi --priority 50 --limit 1 -S 12db92636bffe3e11ee07f5cc48e383f0a3a34e2

https://pulpito.ceph.com/ksirivad-2024-06-25_17:34:30-rados:mgr-squid-distro-default-smithi/7771763/
If this passes, then I think we are good to go.

@kamoltat
Member

kamoltat commented Jun 25, 2024

Unfortunately, the test run failed:
https://pulpito.ceph.com/ksirivad-2024-06-25_17:34:30-rados:mgr-squid-distro-default-smithi/7771763/

2024-06-25T17:56:44.950 INFO:tasks.cephfs_test_runner:When a recovery is underway, but then the out OSD
2024-06-25T17:56:44.950 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/mgr/test_progress.py", line 347, in test_osd_came_back
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:    ev2 = self._simulate_back_in([0], ev1)
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/mgr/test_progress.py", line 241, in _simulate_back_in
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:    new_event = self._get_osd_in_out_events('in')[0]
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:IndexError: list index out of range
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:Ran 2 tests in 203.238s
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:======================================================================
2024-06-25T17:56:44.951 INFO:tasks.cephfs_test_runner:ERROR: test_osd_came_back (tasks.mgr.test_progress.TestProgress)
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:When a recovery is underway, but then the out OSD
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/mgr/test_progress.py", line 347, in test_osd_came_back
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:    ev2 = self._simulate_back_in([0], ev1)
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/mgr/test_progress.py", line 241, in _simulate_back_in
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:    new_event = self._get_osd_in_out_events('in')[0]
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:IndexError: list index out of range
2024-06-25T17:56:44.952 INFO:tasks.cephfs_test_runner:
2024-06-25T17:56:44.952 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_bc919dbd59aafb86996e91254fe9a142fa2c8ccb/teuthology/run_tasks.py", line 109, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/cephfs_test_runner.py", line 211, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_osd_came_back (tasks.mgr.test_progress.TestProgress)
2024-06-25T17:56:45.117 ERROR:teuthology.util.sentry: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=6b073eff447f490eb1861c3a5f120000
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_bc919dbd59aafb86996e91254fe9a142fa2c8ccb/teuthology/run_tasks.py", line 109, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_amathuria_ceph_7324086c3e30db6662e1ec7ff68acbf8d7d4e657/qa/tasks/cephfs_test_runner.py", line 211, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_osd_came_back (tasks.mgr.test_progress.TestProgress)
2024-06-25T17:56:45.119 DEBUG:teuthology.run_tasks:Unwinding manager cephfs_test_runner
2024-06-25T17:56:45.128 DEBUG:teuthology.run_tasks:Unwinding manager ceph

I'm taking a look ...

@kamoltat
Member

kamoltat commented Jun 25, 2024

I'm rerunning one on main just as a sanity check, to see whether the problem is squid-exclusive:

https://pulpito.ceph.com/ksirivad-2024-06-25_20:03:22-rados:mgr-main-distro-default-smithi/7771793/

Edit: hmm, this seems to pass, which is very weird since the only difference between the main and squid
test code is the backport commit itself.

[ksirivad@vossi03 ceph]$ git diff main..squid -- qa/tasks/mgr/test_progress.py
diff --git a/qa/tasks/mgr/test_progress.py b/qa/tasks/mgr/test_progress.py
index 948bb2da063..a80600c6a80 100644
--- a/qa/tasks/mgr/test_progress.py
+++ b/qa/tasks/mgr/test_progress.py
@@ -174,15 +174,7 @@ class TestProgress(MgrTestCase):
 
         # Remove all other pools
         for pool in self.mgr_cluster.mon_manager.get_osd_dump_json()['pools']:
-            # There might be some pools that wasn't created with this test.
-            # So we would use a raw cluster command to remove them.
-            pool_name = pool['pool_name']
-            if pool_name in self.mgr_cluster.mon_manager.pools:
-                self.mgr_cluster.mon_manager.remove_pool(pool_name)
-            else:
-                self.mgr_cluster.mon_manager.raw_cluster_cmd(
-                    'osd', 'pool', 'rm', pool_name, pool_name,
-                    "--yes-i-really-really-mean-it")
+            self.mgr_cluster.mon_manager.remove_pool(pool['pool_name'])
 
         self._load_module("progress")
         self.mgr_cluster.mon_manager.raw_cluster_cmd('progress', 'clear')

@kamoltat
Member

Running 5 more against main, to make sure we are not getting green runs on main with the fix just by chance.
https://pulpito.ceph.com/ksirivad-2024-06-25_21:06:10-rados:mgr-main-distro-default-smithi/
