Re: rgw: identifying resharded bucket instances that are safe to clean up

"J. Eric Ivancich" <ivancich@xxxxxxxxxx> · Mon, 15 Oct 2018 13:56:57 -0400

On 10/15/18 4:59 AM, Abhishek Lekshmanan wrote:
> If we read the current bucket entry point and the bucket instance again
> before marking the condition b as ok to clean up I think we'll cover all
> the bases of the race condition.
> 
> What I'm thinking is a scenario like this:
> - we read the current entry point, bucket hasn't started resharding - so
> new_bucket_instance = NONE; 
> - the bucket starts resharding; so new_bucket_instance is set (the
> current value we've read is therefore stale)
> - scanning yields a new bucket instance but it matches condition b:
> ie. reshard_status = NONE (as the reshard is in progress) and entrypoint
> (which we've read previously) doesn't refer to this instance id.
> 
> So for case b we re-read the bucket entry point & instance and recheck
> the status before we mark this as ok to clean up. For the first case
> this is not a problem, as state=DONE means that this bucket instance
> id will never be reused again.

Take a look at my reply to Casey in this thread. We can prevent another
resharding from taking place with a lock. Furthermore, with my PR we can
ensure the lock is held continuously and never lost. I think you should
hold the lock during the scan to generate a list of what to delete. Then
you can release the lock and remove the items on the list.
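
To make the ordering concrete, here is a rough sketch in Python rather
than the actual rgw C++. Every name in it (Bucket, BucketInstance,
find_stale_instances, cleanup) is made up for illustration, and
threading.Lock is only a stand-in for the real reshard lock. It assumes
the two conditions discussed above: reshard_status == DONE, or
reshard_status == NONE with an entry point that does not refer to the
instance.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class BucketInstance:
    # Hypothetical model of a bucket instance, not the rgw type.
    instance_id: str
    reshard_status: str  # "NONE", "IN_PROGRESS" or "DONE"


@dataclass
class Bucket:
    # Hypothetical model: the entry point names the current instance id.
    entry_point_instance_id: str
    instances: dict
    # Stand-in for the reshard lock; the real lock is not a threading.Lock.
    reshard_lock: threading.Lock = field(default_factory=threading.Lock)


def find_stale_instances(entry_point_instance_id, instances):
    """Return ids of instances matching either cleanup condition."""
    stale = []
    for inst in instances:
        if inst.reshard_status == "DONE":
            # First case: a DONE instance id is never reused.
            stale.append(inst.instance_id)
        elif (inst.reshard_status == "NONE"
              and inst.instance_id != entry_point_instance_id):
            # Condition b: only trustworthy if no reshard can start
            # while we scan, which is what the lock is for.
            stale.append(inst.instance_id)
    return stale


def cleanup(bucket):
    # Hold the lock only while reading the entry point and scanning,
    # so a new reshard cannot begin and make the entry point stale.
    with bucket.reshard_lock:
        to_delete = find_stale_instances(
            bucket.entry_point_instance_id,
            list(bucket.instances.values()),
        )
    # Lock released: remove what the scan listed.
    for instance_id in to_delete:
        bucket.instances.pop(instance_id, None)
    return to_delete
```

Because the entry point is read under the same lock as the scan, the
stale-read scenario you describe cannot occur in this model, so
condition b needs no second read.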

> Am I thinking this right?
> 

Eric