Casey Bodley <cbodley@xxxxxxxxxx> writes:

> Summarizing some discussion in the rgw standup related to Abhishek's
> work in https://github.com/ceph/ceph/pull/24332, where we didn't quite
> reach a consensus.

Thanks for the summary, this is really helpful.

> When resharding starts:
>
> -create a new_bucket_instance with reshard_status=NONE
>
> -set current_bucket_instance's reshard_status=IN_PROGRESS and
> new_bucket_instance_id
>
> On reshard failure:
>
> -set current_bucket_instance's reshard_status=NONE and clear
> new_bucket_instance_id
>
> On reshard success:
>
> -link bucket entrypoint to new_bucket_instance
>
> -set current_bucket_instance's reshard_status=DONE
>
> Given these states, how can we reliably detect whether a given bucket
> instance is safe to clean up? That means it either a) successfully
> resharded and is no longer the current_bucket_instance, or b) it was the
> new_bucket_instance of a failed resharding operation.
>
> a) has reshard_status=DONE

This case is guaranteed to be okay for cleanup.

> b) has reshard_status=NONE, an instance id != current_bucket_instance's
> id (ie not linked to the bucket entrypoint), and an instance id !=
> current_bucket_instance's new_bucket_instance_id (ie not the target of a
> reshard operation)
>
> If radosgw crashes while a reshard is in progress, the
> current_bucket_instance will still have a new_bucket_instance_id ==
> new_bucket_instance's id, so the criteria for b) won't apply and we'd
> have to wait for another reshard attempt before we're able to clean it up.
>
> There was also concern about whether this cleanup decision could race
> with ongoing reshard operations, but I don't think that's the case: a)
> is safe because DONE is a terminal state. For b), we know that it can't
> be the source of a new reshard operation because it's not the
> current_bucket_instance, nor can it be the target of a new reshard.
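
To check that the two criteria are being read the same way, here is a rough
sketch of a) and b) as a single predicate. It is written in Python for
brevity, not in rgw's C++, and the names BucketInstance, ReshardStatus and
is_safe_to_clean_up are made up for illustration; only the status values and
the two rules come from the summary quoted above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReshardStatus(Enum):
    NONE = 0
    IN_PROGRESS = 1
    DONE = 2


@dataclass
class BucketInstance:
    # Hypothetical stand-in for a bucket instance record, not the rgw type.
    instance_id: str
    reshard_status: ReshardStatus = ReshardStatus.NONE
    new_bucket_instance_id: Optional[str] = None


def is_safe_to_clean_up(candidate: BucketInstance,
                        current: BucketInstance) -> bool:
    """`current` is the instance the bucket entrypoint links to."""
    # a) resharded successfully; DONE is a terminal state.
    if candidate.reshard_status is ReshardStatus.DONE:
        return True
    # b) leftover target of a failed reshard: status NONE, not linked to
    #    the entrypoint, and not the target of a reshard operation.
    return (
        candidate.reshard_status is ReshardStatus.NONE
        and candidate.instance_id != current.instance_id
        and candidate.instance_id != current.new_bucket_instance_id
    )


# The crash case from the summary: the source still points at the target,
# so b) does not apply until a later reshard attempt clears or replaces it.
current = BucketInstance("a", ReshardStatus.IN_PROGRESS, "b")
target = BucketInstance("b")
crashed_result = is_safe_to_clean_up(target, current)
```

With these inputs crashed_result is False, which matches the "we'd have to
wait for another reshard attempt" remark.
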

If we re-read the current bucket entry point and the bucket instance
before marking condition b) as OK to clean up, I think we'll cover all
the bases of the race condition. What I'm thinking of is a scenario like
this:

- we read the current entry point; the bucket hasn't started resharding,
  so no new_bucket_instance_id is set yet
- the bucket starts resharding, so new_bucket_instance_id is set (the
  value we read earlier is therefore stale)
- scanning yields the new bucket instance, but it matches condition b):
  i.e. reshard_status = NONE (as the reshard is still in progress) and
  the entry point (which we read previously) doesn't refer to this
  instance id

So for case b) we re-read the bucket entry point and instance and recheck
the status before we mark this as OK to clean up. For case a) this is not
a problem, as state=DONE means that this bucket instance id will never be
reused.

Am I thinking about this right?

--
Abhishek
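
P.S. A rough sketch of the recheck for case b), again in Python rather than
rgw's C++. Everything here is hypothetical: Instance, matches_b,
ok_to_clean_up and the two read_* callbacks stand in for whatever actually
reads the entry point and instance metadata. It only shows the ordering of
reads, not any locking.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Instance:
    # Hypothetical snapshot of a bucket instance, not the rgw type.
    instance_id: str
    reshard_status: str = "NONE"  # "NONE", "IN_PROGRESS" or "DONE"
    new_bucket_instance_id: Optional[str] = None


def matches_b(candidate: Instance, current: Instance) -> bool:
    # Condition b): status NONE, not the current instance, not its target.
    return (
        candidate.reshard_status == "NONE"
        and candidate.instance_id != current.instance_id
        and candidate.instance_id != current.new_bucket_instance_id
    )


def ok_to_clean_up(candidate: Instance,
                   stale_current: Instance,
                   read_current: Callable[[], Instance],
                   read_instance: Callable[[str], Instance]) -> bool:
    # Case a): DONE is terminal, so no recheck is needed.
    if candidate.reshard_status == "DONE":
        return True
    # Cheap first pass against the value read before the scan.
    if not matches_b(candidate, stale_current):
        return False
    # Case b): re-read both records and recheck before marking.
    fresh_current = read_current()
    fresh_candidate = read_instance(candidate.instance_id)
    return matches_b(fresh_candidate, fresh_current)


# The scenario above: the entry point was read before the reshard started.
stale = Instance("a")
store = {
    "a": Instance("a", "IN_PROGRESS", "b"),  # reshard has since started
    "b": Instance("b"),                      # new target, status NONE
}
stale_only = matches_b(store["b"], stale)
with_recheck = ok_to_clean_up(store["b"], stale,
                              lambda: store["a"], store.__getitem__)
```

Here stale_only is True (the stale read would wrongly pass condition b) while
with_recheck is False, because the fresh read of the current instance shows
"b" as its reshard target.
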