inter test cleanup #340
base: main
Conversation
for sg in sg_list:
    await wait_for_admin_ready(sg)

    db_names = await sg.get_all_database_names()
    for db_name in db_names:
        db_status = await sg.get_database_status(db_name)
        if db_status is not None:
            await sg.delete_database(db_name)

    retries = 10
    for db_name in db_names:
        for _ in range(retries):
            db_status = await sg.get_database_status(db_name)
            if db_status is None:
                break
            await asyncio.sleep(6)
        else:
            raise TimeoutError(f"Database {db_name} did not delete after 1min")
We don't need to delete databases on all Sync Gateway nodes. Calling DELETE /db/ on a node will synchronously drop the database on that node.
This does the following, in this order:
- Delete the config from the bucket.
- Remove the database from this node.
Deleting the configuration from the bucket will cause the other nodes to drop the database. Each SG has a concept of "config polling", which looks at all buckets and decides whether it needs to add, update, or remove a database.
If you wanted to verify a database is removed from every node (which I don't think you do), you would:
- DELETE /db/ from node1 synchronously
- run a retry loop on the other nodes to get the database(s) and make sure they are deleted, as in the sketch below.
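A minimal sketch of that flow, reusing the helper names from this PR's diff (delete_database, get_database_status); the polling interval and attempt count here are arbitrary:
import asyncio

async def drop_db_everywhere(sg_list, db_name: str) -> None:
    # DELETE /db/ on one node drops it there synchronously and removes the config from the bucket.
    await sg_list[0].delete_database(db_name)
    # Config polling removes the database from the remaining nodes; poll until it is gone.
    for sg in sg_list[1:]:
        for _ in range(60):
            if await sg.get_database_status(db_name) is None:
                break
            await asyncio.sleep(1)
        else:
            raise TimeoutError(f"Database {db_name} still present after config polling")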
async def wait_for_admin_ready(
    sg: "SyncGateway", retries: int = 10, delay: float = 1.0
) -> None:
    for _ in range(retries):
        try:
            await sg.get_version()
            return
        except Exception:
            await asyncio.sleep(delay)
    raise TimeoutError("Sync Gateway admin API did not become ready")
I don't like this function because it swallows the exception, so if it fails we never learn what the underlying error was (see the sketch below).
But I don't see how this function is relevant unless you are restarting an SG node, and none of these tests seem to restart Sync Gateway. Sync Gateway should always be running when this occurs.
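If the helper is kept at all, here is a sketch of how it could preserve the last error via exception chaining (a hypothetical rewrite, not what the PR does):
import asyncio

async def wait_for_admin_ready(
    sg: "SyncGateway", retries: int = 10, delay: float = 1.0
) -> None:
    last_exc: Exception | None = None
    for _ in range(retries):
        try:
            await sg.get_version()
            return
        except Exception as exc:
            last_exc = exc  # remember why this attempt failed
            await asyncio.sleep(delay)
    # Chain the last failure so the TimeoutError explains what actually went wrong.
    raise TimeoutError("Sync Gateway admin API did not become ready") from last_exc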
    return Path(script_path, "..", "..", "dataset", "sg")


async def cleanup_test_resources(
I think we do want to drop all the buckets between tests. For these tests, we don't want to do this only at the end of a test, because if the test fails in the middle, the cleanup may never run.
IIRC, Couchbase Server can support about 10 buckets, but we don't want to use memory on CBS/SG at all.
We probably want to write this as a test fixture like this (end of test; see below):
@pytest.fixture
def cleanup_test_resources(cblpytest: CBLPyTest):
    yield
    # ... cleanup code runs here, after the test
We have two options for running a fixture like this: either at the end of a test to clean up (this works OK for tests) or a different approach at the start of a test. It might even make sense to do both.
Start of test:
Pro: It will clean up even if pytest crashes, you exit pytest, or you have manually created a bucket with the same name.
Con: It might make it seem like some tests are taking longer than expected.
End of test:
Pro: You'll get failure results faster, and the overall test time for multiple tests remains the same.
Con: If pytest exits early, you can be left in a stale state. It also gets in the way if you want to inspect the state after a failed run when running with --exitfirst.
Given this, I actually think start of test is a better approach, although you can do both to run before and after, since I think that with some modifications to the code below it will be very quick to run if cleanup has already occurred; a sketch follows.
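A sketch of such a before-and-after fixture, assuming CBLPyTest exposes the topology (the sync_gateways / couchbase_servers attribute names and the _cleanup helper are hypothetical):
import pytest

@pytest.fixture(autouse=True)
async def cleanup_test_resources(cblpytest: "CBLPyTest"):
    # Hypothetical accessors: read the topology from cblpytest rather than taking parameters.
    sgs = cblpytest.sync_gateways
    servers = cblpytest.couchbase_servers
    await _cleanup(sgs, servers)  # start of test: recovers even after a crashed/aborted run
    yield
    await _cleanup(sgs, servers)  # end of test: cheap no-op if everything is already clean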
    sgs: "list[SyncGateway] | SyncGateway",
    cbs_servers: "list[CouchbaseServer] | CouchbaseServer",
    bucket_names: list[str] | None = None,
Unless there's a test that needs specific SGs, I think the fixture should just read the Sync Gateways and Couchbase Servers out of cblpytest (as in the fixture sketch above).
    if bucket_names:
        cbs_list = cbs_servers if isinstance(cbs_servers, list) else [cbs_servers]
        for cbs in cbs_list:
There are two cases here that I don't know how we distinguish, based on earlier PRs: are these going to be two separate clusters, or different nodes within the same CBS cluster?
If they represent different clusters, you need to drop the buckets on each cluster; if they are nodes within one cluster, you do not need to drop them per node. The test TestMultipleServers seems to be one cluster, but TestReplicationXdcr seems like two clusters.
        for cbs in cbs_list:
            for bucket_name in bucket_names:
                cbs.drop_bucket(bucket_name)
After dropping the bucket, you want a retry loop. I highly recommend using the tenacity library so you aren't rolling your own retries.
Example:
import tenacity

@tenacity.retry(
    wait=tenacity.wait_fixed(1),  # poll every second; this is a cheap call
    stop=tenacity.stop_after_attempt(60),  # maximum of 60 attempts; I can't see any way this would take that long, but also no reason not to wait a long time
)
async def wait_for_no_sg_databases(sg):
    # Log that we are waiting for all Sync Gateway databases to be taken offline via config polling.
    assert len(await sg.get_all_database_names()) == 0
Calling this function will retry it every second, for up to 60 attempts. The numbers I picked are arbitrary. If it still fails on the last attempt, that exception is raised, and pytest will fail the test.
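Usage would then look roughly like this (bucket_names, cbs, and sg as in the code under review):
for bucket_name in bucket_names:
    cbs.drop_bucket(bucket_name)
# Block until config polling has taken every Sync Gateway database offline.
await wait_for_no_sg_databases(sg)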
torcolvin
left a comment
Thinking about this more, I think we need to clean up the buckets both before and after.
If you have separate bucket names, which we already do for tests that use configure_dataset, cleaning only at the start of each test means that those buckets will remain while the remaining tests run. I think we should use the fixture and the general approach described above, but as a stopgap you can add a finalizer directly: https://docs.pytest.org/en/stable/how-to/fixtures.html#adding-finalizers-directly. Note that this document recommends using a fixture:
await cleanup_test_resources(sg, cbs, [bucket_name])
# A lambda cannot contain await, so run the coroutine explicitly in the finalizer:
request.addfinalizer(lambda: asyncio.run(cleanup_test_resources(sg, cbs, [bucket_name])))
Doing this stopgap will make sure that the tests pass over a whole test run and unblock anyone who is running the whole test suite.
I think creating a separate ticket/PR for general bucket cleanup as part of this test suite is appropriate.
Added a proper cleanup function (an enhanced version of what was in test_multiple_servers.py), and added it in conftest.py to be globally accessible.
It is called at the start of every test so that each test begins with a fresh set of resources.
There were some race conditions to handle here, since SGW is sometimes in a transitional state where, after the cleanup call, the databases are still being deleted while the test moves forward. Preventive measures were added for that as well.