Re: rgw: identifying resharded bucket instances that are safe to clean up

Abhishek Lekshmanan <abhishek@xxxxxxxx> · Mon, 15 Oct 2018 10:59:28 +0200

Casey Bodley <cbodley@xxxxxxxxxx> writes:

> Summarizing some discussion in the rgw standup related to Abhishek's 
> work in https://github.com/ceph/ceph/pull/24332, where we didn't quite 
> reach a consensus.

Thanks for the summary; this is really helpful.
>
>
> When resharding starts:
>
> -create a new_bucket_instance with reshard_status=NONE
>
> -set current_bucket_instance's reshard_status=IN_PROGRESS and 
> new_bucket_instance_id
>
>
> On reshard failure:
>
> -set current_bucket_instance's reshard_status=NONE and clear 
> new_bucket_instance_id
>
>
> On reshard success:
>
> -link bucket entrypoint to new_bucket_instance
>
> -set current_bucket_instance's reshard_status=DONE
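A minimal Python model of these transitions may help when reasoning about the cleanup criteria. It is a sketch, not rgw code. `ReshardStatus`, `BucketInstance`, `Bucket` and the `*_reshard` helpers are hypothetical names. Each field assignment stands in for what would be a separate metadata write in radosgw, so a crash can land between any two of them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ReshardStatus(Enum):
    NONE = "NONE"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"


@dataclass
class BucketInstance:
    # Hypothetical stand-in for a bucket instance metadata record.
    instance_id: str
    reshard_status: ReshardStatus = ReshardStatus.NONE
    new_bucket_instance_id: Optional[str] = None


@dataclass
class Bucket:
    # entrypoint_id models which instance the bucket entrypoint links to.
    entrypoint_id: str
    instances: Dict[str, BucketInstance] = field(default_factory=dict)

    def current(self) -> BucketInstance:
        return self.instances[self.entrypoint_id]


def start_reshard(bucket: Bucket, new_id: str) -> BucketInstance:
    # Create the target instance with reshard_status=NONE.
    new = BucketInstance(new_id)
    bucket.instances[new_id] = new
    # Mark the current instance IN_PROGRESS and record the target's id.
    cur = bucket.current()
    cur.reshard_status = ReshardStatus.IN_PROGRESS
    cur.new_bucket_instance_id = new_id
    return new


def fail_reshard(bucket: Bucket) -> None:
    # Roll the current instance back; the target instance is left orphaned.
    cur = bucket.current()
    cur.reshard_status = ReshardStatus.NONE
    cur.new_bucket_instance_id = None


def finish_reshard(bucket: Bucket) -> None:
    old = bucket.current()
    assert old.new_bucket_instance_id is not None, "no reshard in progress"
    # Link the entrypoint to the new instance first...
    bucket.entrypoint_id = old.new_bucket_instance_id
    # ...then mark the old instance DONE, a terminal state.
    old.reshard_status = ReshardStatus.DONE
```

Calling `start_reshard` and then `fail_reshard` leaves the target in `instances` with reshard_status=NONE. Neither the entrypoint nor the current instance refers to it any more. That orphan is what case b) below is meant to catch.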
>
>
> Given these states, how can we reliably detect whether a given bucket 
> instance is safe to clean up? That means it either a) successfully 
> resharded and is no longer the current_bucket_instance, or b) it was the 
> new_bucket_instance of a failed resharding operation.
>
> a) has reshard_status=DONE

This case is always safe to clean up.
>
> b) has reshard_status=NONE, an instance id != current_bucket_instance's 
> id (ie not linked to the bucket entrypoint), and an instance id != 
> current_bucket_instance's new_bucket_instance_id (ie not the target of a 
> reshard operation)
>
> If radosgw crashes while a reshard is in progress, the 
> current_bucket_instance will still have a new_bucket_instance_id == 
> new_bucket_instance's id, so the criteria for b) won't apply and we'd 
> have to wait for another reshard attempt before we're able to clean it up.
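The two criteria can be written as a single predicate. This is again a hypothetical Python sketch: `safe_to_clean_up` and the record types are illustrative names, not rgw APIs. `current` is the instance that the bucket entrypoint links to.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReshardStatus(Enum):
    NONE = "NONE"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"


@dataclass
class BucketInstance:
    # Hypothetical stand-in for a bucket instance metadata record.
    instance_id: str
    reshard_status: ReshardStatus = ReshardStatus.NONE
    new_bucket_instance_id: Optional[str] = None


def safe_to_clean_up(candidate: BucketInstance, current: BucketInstance) -> bool:
    """Apply criteria a) and b) to one candidate instance."""
    # a) DONE is terminal: resharded away and never reused.
    if candidate.reshard_status is ReshardStatus.DONE:
        return True
    # b) NONE, not linked to the entrypoint, and not the target of a reshard.
    return (
        candidate.reshard_status is ReshardStatus.NONE
        and candidate.instance_id != current.instance_id
        and candidate.instance_id != current.new_bucket_instance_id
    )
```

Consider `current = BucketInstance("a", ReshardStatus.IN_PROGRESS, "b")`, which is the state a crash mid-reshard leaves behind. In that case `safe_to_clean_up(BucketInstance("b"), current)` returns `False`. It only becomes `True` once a later reshard attempt changes or clears `current.new_bucket_instance_id`, or moves the entrypoint to another instance.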
>
>
> There was also concern about whether this cleanup decision could race 
> with ongoing reshard operations, but I don't think that's the case: a) 
> is safe because DONE is a terminal state. For b), we know that it can't 
> be the source of a new reshard operation because it's not the 
> current_bucket_instance, nor can it be the target of a new reshard.

If we read the current bucket entrypoint and the bucket instance again
before marking an instance that meets condition b) as OK to clean up, I
think we'll cover all the bases of the race condition.

What I'm thinking of is a scenario like this:
- we read the current entrypoint; the bucket hasn't started resharding, so
new_bucket_instance_id is unset;
- the bucket starts resharding, so new_bucket_instance_id is set (the
value we've already read is therefore stale);
- the scan finds the new bucket instance, and it matches condition b),
i.e. reshard_status = NONE (the new instance stays NONE while the reshard
is in progress), and neither the entrypoint nor the stale
new_bucket_instance_id (both read earlier) refers to this instance id.

So for case b) we re-read the bucket entrypoint and instance and recheck
the status before marking the instance as OK to clean up. Case a) doesn't
have this problem, since reshard_status=DONE means that this bucket
instance id will never be reused.
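Here is a minimal sketch of the re-read, using the same kind of hypothetical model. `MetadataStore`, `matches_b` and `safe_to_clean_up_rechecked` are illustrative names, not rgw APIs. The store is an in-memory dict, not RADOS. The demo at the bottom replays the scenario above. The check against the snapshot taken before the reshard gives a false positive, and the re-read rejects the candidate.

```python
import copy
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class ReshardStatus(Enum):
    NONE = "NONE"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"


@dataclass
class BucketInstance:
    instance_id: str
    reshard_status: ReshardStatus = ReshardStatus.NONE
    new_bucket_instance_id: Optional[str] = None


class MetadataStore:
    """Hypothetical in-memory stand-in for the entrypoint and instance records."""

    def __init__(self, entrypoint_id: str, instances: Dict[str, BucketInstance]):
        self.entrypoint_id = entrypoint_id
        self.instances = instances

    def read_instance(self, instance_id: str) -> BucketInstance:
        # Return a copy, so the caller holds a snapshot that can go stale.
        return copy.deepcopy(self.instances[instance_id])

    def read_current(self) -> BucketInstance:
        # Follow the entrypoint to the instance it currently links to.
        return self.read_instance(self.entrypoint_id)


def matches_b(candidate: BucketInstance, current: BucketInstance) -> bool:
    # Criterion b): NONE, not linked, and not the target of a reshard.
    return (
        candidate.reshard_status is ReshardStatus.NONE
        and candidate.instance_id != current.instance_id
        and candidate.instance_id != current.new_bucket_instance_id
    )


def safe_to_clean_up_rechecked(store: MetadataStore, candidate: BucketInstance,
                               stale_current: BucketInstance) -> bool:
    # a) DONE is terminal, so the value read during the scan is enough.
    if candidate.reshard_status is ReshardStatus.DONE:
        return True
    # Cheap pre-filter against the snapshot taken when the scan started.
    if not matches_b(candidate, stale_current):
        return False
    # b) re-read both records and only trust the fresh answer.
    fresh_candidate = store.read_instance(candidate.instance_id)
    return matches_b(fresh_candidate, store.read_current())


if __name__ == "__main__":
    store = MetadataStore("a", {"a": BucketInstance("a")})
    stale = store.read_current()                # read before resharding starts
    store.instances["b"] = BucketInstance("b")  # reshard starts: target created...
    store.instances["a"].reshard_status = ReshardStatus.IN_PROGRESS
    store.instances["a"].new_bucket_instance_id = "b"  # ...and recorded
    candidate = store.read_instance("b")        # the scan finds the target
    print(matches_b(candidate, stale))          # True: stale view misses the reshard
    print(safe_to_clean_up_rechecked(store, candidate, stale))  # False after re-read
```

The sketch only models the interleaving described in the scenario. It does not try to show that a single re-read closes every possible window.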

Am I thinking about this correctly?

-- 
Abhishek