rgw: identifying resharded bucket instances that are safe to clean up

Casey Bodley <cbodley@xxxxxxxxxx> · Fri, 12 Oct 2018 15:05:04 -0400

Summarizing some discussion in the rgw standup related to Abhishek's 
work in https://github.com/ceph/ceph/pull/24332, where we didn't quite 
reach a consensus.

When resharding starts:

-create a new_bucket_instance with reshard_status=NONE

-set current_bucket_instance's reshard_status=IN_PROGRESS and 
new_bucket_instance_id

On reshard failure:

-set current_bucket_instance's reshard_status=NONE and clear 
new_bucket_instance_id

On reshard success:

-link bucket entrypoint to new_bucket_instance

-set current_bucket_instance's reshard_status=DONE
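
To make the transitions above concrete, here is a minimal Python sketch. It is only a model of the three steps as described in this mail, not the rgw code: the class and function names (BucketInstance, Entrypoint, start_reshard, fail_reshard, finish_reshard) are hypothetical, and only the reshard_status and new_bucket_instance_id fields come from the discussion.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReshardStatus(Enum):
    NONE = "NONE"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"


@dataclass
class BucketInstance:
    # Hypothetical stand-in for the bucket instance metadata.
    instance_id: str
    reshard_status: ReshardStatus = ReshardStatus.NONE
    new_bucket_instance_id: Optional[str] = None


@dataclass
class Entrypoint:
    # Hypothetical stand-in for the bucket entrypoint; it links to
    # exactly one bucket instance, the current_bucket_instance.
    linked_instance_id: str


def start_reshard(current: BucketInstance, new_id: str) -> BucketInstance:
    """Create the target instance, then mark the current one IN_PROGRESS."""
    new = BucketInstance(instance_id=new_id)  # reshard_status=NONE
    current.reshard_status = ReshardStatus.IN_PROGRESS
    current.new_bucket_instance_id = new_id
    return new


def fail_reshard(current: BucketInstance) -> None:
    """Reset the current instance; the target instance is left behind."""
    current.reshard_status = ReshardStatus.NONE
    current.new_bucket_instance_id = None


def finish_reshard(entrypoint: Entrypoint, current: BucketInstance,
                   new: BucketInstance) -> None:
    """Relink the entrypoint to the target, then mark the old instance DONE."""
    entrypoint.linked_instance_id = new.instance_id
    current.reshard_status = ReshardStatus.DONE
```

Note that fail_reshard() never touches the new instance, which is why a failed reshard leaves behind an instance with reshard_status=NONE.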

Given these states, how can we reliably detect whether a given bucket
instance is safe to clean up? That means either a) it was successfully
resharded and is no longer the current_bucket_instance, or b) it was the
new_bucket_instance of a failed resharding operation.

a) has reshard_status=DONE

b) has reshard_status=NONE, an instance id != current_bucket_instance's
id (i.e. not linked to the bucket entrypoint), and an instance id !=
current_bucket_instance's new_bucket_instance_id (i.e. not the target of
a reshard operation)

If radosgw crashes while a reshard is in progress, the
current_bucket_instance will still have a new_bucket_instance_id ==
new_bucket_instance's id, so the criteria for b) won't apply to that
new_bucket_instance, and we'd have to wait for another reshard attempt
before we're able to clean it up.
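
The criteria for a) and b), along with the crash case, can be sketched as a single predicate. This is a rough Python illustration under my own assumptions, not the code from the PR: BucketInstance and is_safe_to_clean_up are hypothetical names, and "current" is assumed to be whichever instance the bucket entrypoint links to.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReshardStatus(Enum):
    NONE = "NONE"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"


@dataclass
class BucketInstance:
    # Hypothetical stand-in for the bucket instance metadata.
    instance_id: str
    reshard_status: ReshardStatus = ReshardStatus.NONE
    new_bucket_instance_id: Optional[str] = None


def is_safe_to_clean_up(candidate: BucketInstance,
                        current: BucketInstance) -> bool:
    """Apply criteria a) and b); current is the entrypoint-linked instance."""
    # a) successfully resharded, so no longer the current instance
    if candidate.reshard_status is ReshardStatus.DONE:
        return True
    # b) leftover target of a failed reshard
    return (
        candidate.reshard_status is ReshardStatus.NONE
        and candidate.instance_id != current.instance_id
        and candidate.instance_id != current.new_bucket_instance_id
    )


# Failed reshard: current was reset, so the leftover target matches b).
current = BucketInstance("A")
leftover = BucketInstance("B")
print(is_safe_to_clean_up(leftover, current))  # True

# Crash mid-reshard: current still points at B, so b) does not apply.
crashed = BucketInstance("A", ReshardStatus.IN_PROGRESS, "B")
print(is_safe_to_clean_up(leftover, crashed))  # False
```

The second call is the crash case: B stays protected for as long as the current instance's new_bucket_instance_id still names it.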

There was also concern about whether this cleanup decision could race
with ongoing reshard operations, but I don't think that's the case. a)
is safe because DONE is a terminal state. For b), we know that it can't
be the source of a new reshard operation because it's not the
current_bucket_instance, nor can it be the target of a new reshard,
because each reshard creates its own new_bucket_instance.

I hope this helps. Am I missing anything?
Casey