inter test cleanup #340
base: main
Conversation
for sg in sg_list:
    await wait_for_admin_ready(sg)

    db_names = await sg.get_all_database_names()
    for db_name in db_names:
        db_status = await sg.get_database_status(db_name)
        if db_status is not None:
            await sg.delete_database(db_name)

    retries = 10
    for db_name in db_names:
        for _ in range(retries):
            db_status = await sg.get_database_status(db_name)
            if db_status is None:
                break
            await asyncio.sleep(6)
        else:
            raise TimeoutError(f"Database {db_name} did not delete after 1min")
We don't need to delete databases on all Sync Gateway nodes. Calling DELETE /db/ on a node will synchronously drop the database on that node.
This does the following, in this order:
- Delete the config from the bucket.
- Remove the database from this node.
Deleting the configuration from the bucket will cause the other nodes to drop the database. Each SG has a concept of "config polling", which looks at all buckets and decides whether it needs to add, update, or remove a database.
If you wanted to verify a database is removed from every node (which I don't think you do), you would:
- DELETE /db/ from node1 synchronously
- run a retry loop on the other nodes to get the database(s) and make sure they are deleted, as in the sketch below.
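A minimal sketch of that flow, reusing the helper names from this PR's diff (delete_database, get_database_status); the polling interval and attempt count here are arbitrary:
import asyncio

async def drop_db_everywhere(sg_list, db_name: str) -> None:
    # DELETE /db/ on one node drops it there synchronously and removes the config from the bucket.
    await sg_list[0].delete_database(db_name)
    # Config polling removes the database from the remaining nodes; poll until it is gone.
    for sg in sg_list[1:]:
        for _ in range(60):
            if await sg.get_database_status(db_name) is None:
                break
            await asyncio.sleep(1)
        else:
            raise TimeoutError(f"Database {db_name} still present after config polling")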
async def wait_for_admin_ready(
    sg: "SyncGateway", retries: int = 10, delay: float = 1.0
) -> None:
    for _ in range(retries):
        try:
            await sg.get_version()
            return
        except Exception:
            await asyncio.sleep(delay)
    raise TimeoutError("Sync Gateway admin API did not become ready")
I don't like this function because it swallows the exception, so if it fails we never learn what the underlying error was (see the sketch below).
But I don't see how this function is relevant unless you are restarting an SG node, and none of these tests seem to restart Sync Gateway. Sync Gateway should always be running when this occurs.
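If the helper is kept at all, here is a sketch of how it could preserve the last error via exception chaining (a hypothetical rewrite, not what the PR does):
import asyncio

async def wait_for_admin_ready(
    sg: "SyncGateway", retries: int = 10, delay: float = 1.0
) -> None:
    last_exc: Exception | None = None
    for _ in range(retries):
        try:
            await sg.get_version()
            return
        except Exception as exc:
            last_exc = exc  # remember why this attempt failed
            await asyncio.sleep(delay)
    # Chain the last failure so the TimeoutError explains what actually went wrong.
    raise TimeoutError("Sync Gateway admin API did not become ready") from last_exc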
    return Path(script_path, "..", "..", "dataset", "sg")


async def cleanup_test_resources(
I think we do want to drop all the buckets between tests. For these tests, we don't want to do this only at the end of a test, because if the test fails in the middle, the cleanup may never run.
IIRC, Couchbase Server can support about 10 buckets, but we don't want to use memory on CBS/SG at all.
We probably want to write this as a test fixture like this (end of test; see below):
@pytest.fixture
def cleanup_test_resources(cblpytest: CBLPyTest):
    yield
    # ... cleanup code runs here, after the test
We have two options for running a fixture like this: either at the end of a test to clean up (this works OK for tests) or a different approach at the start of a test. It might even make sense to do both.
Start of test:
Pro: It will clean up even if pytest crashes, you exit pytest, or you have manually created a bucket with the same name.
Con: It might make it seem like some tests are taking longer than expected.
End of test:
Pro: You'll get failure results faster, and the overall test time for multiple tests remains the same.
Con: If pytest exits early, you can be left in a stale state. It also gets in the way if you want to inspect the state after a failed run when running with --exitfirst.
Given this, I actually think start of test is a better approach, although you can do both to run before and after, since I think that with some modifications to the code below it will be very quick to run if cleanup has already occurred; a sketch follows.
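A sketch of such a before-and-after fixture, assuming CBLPyTest exposes the topology (the sync_gateways / couchbase_servers attribute names and the _cleanup helper are hypothetical):
import pytest

@pytest.fixture(autouse=True)
async def cleanup_test_resources(cblpytest: "CBLPyTest"):
    # Hypothetical accessors: read the topology from cblpytest rather than taking parameters.
    sgs = cblpytest.sync_gateways
    servers = cblpytest.couchbase_servers
    await _cleanup(sgs, servers)  # start of test: recovers even after a crashed/aborted run
    yield
    await _cleanup(sgs, servers)  # end of test: cheap no-op if everything is already clean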
    sgs: "list[SyncGateway] | SyncGateway",
    cbs_servers: "list[CouchbaseServer] | CouchbaseServer",
    bucket_names: list[str] | None = None,
Unless there's a test that needs specific SGs, I think the fixture should just read the Sync Gateways and Couchbase Servers out of cblpytest (as in the fixture sketch above).
    if bucket_names:
        cbs_list = cbs_servers if isinstance(cbs_servers, list) else [cbs_servers]
        for cbs in cbs_list:
There are two cases here that I don't know how we distinguish, based on earlier PRs: are these going to be two separate clusters, or different nodes within the same CBS cluster?
If they represent different clusters, you need to drop the buckets on each cluster; if they are nodes within one cluster, you do not need to drop them per node. The test TestMultipleServers seems to be one cluster, but TestReplicationXdcr seems like two clusters.
        for cbs in cbs_list:
            for bucket_name in bucket_names:
                cbs.drop_bucket(bucket_name)
After dropping the bucket, you want a retry loop. I highly recommend using the tenacity library so you aren't rolling your own retries.
Example:
import tenacity

@tenacity.retry(
    wait=tenacity.wait_fixed(1),  # poll every second; this is a cheap call
    stop=tenacity.stop_after_attempt(60),  # maximum of 60 attempts; I can't see any way this would take that long, but also no reason not to wait a long time
)
async def wait_for_no_sg_databases(sg):
    # Log that we are waiting for all Sync Gateway databases to be taken offline via config polling.
    assert len(await sg.get_all_database_names()) == 0
Calling this function will retry it every second, for up to 60 attempts. The numbers I picked are arbitrary. If it still fails on the last attempt, that exception is raised, and pytest will fail the test.
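Usage would then look roughly like this (bucket_names, cbs, and sg as in the code under review):
for bucket_name in bucket_names:
    cbs.drop_bucket(bucket_name)
# Block until config polling has taken every Sync Gateway database offline.
await wait_for_no_sg_databases(sg)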
torcolvin
left a comment
Thinking about this more, I think we need to clean up the buckets both before and after.
If you have separate bucket names, which we already do for tests that use configure_dataset, cleaning only at the start of each test means that those buckets will remain while the remaining tests run. I think we should use the fixture and the general approach described above, but as a stopgap you can add a finalizer directly: https://docs.pytest.org/en/stable/how-to/fixtures.html#adding-finalizers-directly. Note that this document recommends using a fixture:
await cleanup_test_resources(sg, cbs, [bucket_name])
# A lambda cannot contain await, so run the coroutine explicitly in the finalizer:
request.addfinalizer(lambda: asyncio.run(cleanup_test_resources(sg, cbs, [bucket_name])))
Doing this stopgap will make sure that the tests pass over a whole test run and unblock anyone who is running the whole test suite.
I think creating a separate ticket/PR for general bucket cleanup as part of this test suite is appropriate.
Added a proper cleanup function (an enhanced version of what was in test_multiple_servers.py), and added it in conftest.py to be globally accessible.
It is called at the start of every test so that each test begins with a fresh set of resources.
There were some race conditions to handle here, since SGW is sometimes in a transitional state where, after the cleanup call, the databases are still being deleted while the test moves forward. Preventive measures were added for that as well.