Re: rgw: identifying resharded bucket instances that are safe to clean up

"J. Eric Ivancich" <ivancich@xxxxxxxxxx> · Mon, 15 Oct 2018 13:49:39 -0400

On 10/12/18 3:05 PM, Casey Bodley wrote:
> Summarizing some discussion in the rgw standup related to Abhishek's
> work in https://github.com/ceph/ceph/pull/24332, where we didn't quite
> reach a consensus.
> 
> 
> When resharding starts:
> 
> -create a new_bucket_instance with reshard_status=NONE

I believe the new bucket instance gets IN_PROGRESS as well.

> -set current_bucket_instance's reshard_status=IN_PROGRESS and
> new_bucket_instance_id
> 
> 
> On reshard failure:
> 
> -set current_bucket_instance's reshard_status=NONE and clear
> new_bucket_instance_id

That should happen, but it isn't happening on master. My PR should
make it do this.

> On reshard success:
> 
> -link bucket entrypoint to new_bucket_instance

On success, the code immediately sets the reshard_status to DONE in
RGWBucketReshard::execute and later to NONE (I'm not sure where that
last change happens).

> -set current_bucket_instance's reshard_status=DONE
> 
> 
> Given these states, how can we reliably detect whether a given bucket
> instance is safe to clean up? That means it either a) successfully
> resharded and is no longer the current_bucket_instance, or b) it was the
> new_bucket_instance of a failed resharding operation.
> 
> a) has reshard_status=DONE
> 
> b) has reshard_status=NONE, an instance id != current_bucket_instance's
> id (ie not linked to the bucket entrypoint), and an instance id !=
> current_bucket_instance's new_bucket_instance_id (ie not the target of a
> reshard operation)
> 
> 
> If radosgw crashes while a reshard is in progress, the
> current_bucket_instance will still have a new_bucket_instance_id ==
> new_bucket_instance's id, so the criteria for b) won't apply and we'd
> have to wait for another reshard attempt before we're able to clean it up.
> 
> 
> There was also concern about whether this cleanup decision could race
> with ongoing reshard operations, but I don't think that's the case: a)
> is safe because DONE is a terminal state. For b), we know that it can't
> be the source of a new reshard operation because it's not the
> current_bucket_instance, nor can it be the target of a new reshard.
> 
> 
> I hope this helps. Am I missing anything?
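
To make criteria a) and b) concrete, here is a minimal Python sketch of
the check. BucketInstance, ReshardStatus and safe_to_clean_up are names
made up for illustration, not RGW types, and the states follow the
description above rather than what master does today. The first test
(never touch the linked instance) is an extra guard, not part of a) as
written.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReshardStatus(Enum):
    NONE = 0
    IN_PROGRESS = 1
    DONE = 2


@dataclass
class BucketInstance:
    # Hypothetical stand-in for a bucket instance's metadata.
    instance_id: str
    reshard_status: ReshardStatus = ReshardStatus.NONE
    new_bucket_instance_id: Optional[str] = None


def safe_to_clean_up(inst: BucketInstance, current: BucketInstance) -> bool:
    """Apply criteria a) and b) to inst, given the linked instance."""
    if inst.instance_id == current.instance_id:
        return False  # still linked to the bucket entrypoint
    if inst.reshard_status is ReshardStatus.DONE:
        return True  # a) old source of a successful reshard
    # b) leftover target of a failed reshard: status NONE and not the
    # target that the current instance is pointing at
    return (
        inst.reshard_status is ReshardStatus.NONE
        and inst.instance_id != current.new_bucket_instance_id
    )
```

After a crash mid-reshard, current still points at the target through
new_bucket_instance_id, so the target is not reported as safe until
that field is cleared.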

First, I think resharding in general (i.e., beyond this clean-up)
should be a process that:

    a. takes the reshard lock
    b. updates the bucket info status
    c. possibly sets the per-shard status.
    d. DOES WORK
       i. refreshes the reshard lock every so often; if it's ever
          lost, errors out
    e. fixes the per-shard status (i.e., undoes c)
    f. cleans up the bucket info status (i.e., undoes b)
    g. releases the lock (i.e., undoes a)
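
The pairing of a/g, b/f and c/e can be sketched in Python as below.
This is only an illustration of the ordering: RecordingLock, LockLost
and run_locked are hypothetical names, the "status" steps just append
to a log, and nothing here is an RGW API.

```python
from contextlib import ExitStack
from typing import Callable, Iterable, List


class LockLost(Exception):
    """Raised when the reshard lock could not be renewed."""


class RecordingLock:
    """Hypothetical in-memory lock that records calls."""

    def __init__(self, log: List[str], renewals_before_loss: int = 1_000_000):
        self.log = log
        self.renewals_left = renewals_before_loss

    def acquire(self) -> None:
        self.log.append("a: take lock")

    def renew(self) -> None:
        if self.renewals_left <= 0:
            raise LockLost("reshard lock expired")
        self.renewals_left -= 1

    def release(self) -> None:
        self.log.append("g: release lock")


def run_locked(lock: RecordingLock,
               work: Iterable[Callable[[], None]],
               log: List[str]) -> None:
    """Run steps a-g; undo callbacks fire in reverse order on exit."""
    with ExitStack() as undo:
        lock.acquire()                                   # a
        undo.callback(lock.release)                      # g undoes a
        log.append("b: set bucket info status")
        undo.callback(log.append, "f: clean up bucket info status")
        log.append("c: set per-shard status")
        undo.callback(log.append, "e: fix per-shard status")
        for step in work:                                # d
            lock.renew()  # d.i: raises LockLost if the lock is gone
            step()
```

Note that this sketch still runs e, f and g after the lock is lost;
whether that is the right behavior is a separate question.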

I think the clean-up code should do something similar. I think it should:

    a. take the reshard lock as an exclusive ephemeral lock; this
       will prevent a reshard process from interfering
       i. this needs my PR, which adds that functionality
    b. verify that the current bucket info has a status of NONE
       i. if not, stop and let resharding clean it up
    c. build a list of every other bucket info and index shard; anything
       that's not the current one is available for clean-up
       i. be sure to refresh the lock every so often if necessary
    d. release the reshard lock
       i. if this returns an error code, the lock may have been
          lost; it may be safest to abort the effort this time
          around
    e. delete all items in the list generated at c.
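
As a rough Python sketch of steps a-e (clean_up and its callable
parameters are invented for illustration; statuses maps a hypothetical
instance id to its reshard status string, and release_lock is assumed
to return 0 on success):

```python
from typing import Callable, Dict, List


def clean_up(statuses: Dict[str, str],
             current_id: str,
             take_lock: Callable[[], None],
             release_lock: Callable[[], int],
             delete: Callable[[str], None]) -> List[str]:
    """Return the instance ids that were deleted ([] if we backed off)."""
    take_lock()                                  # a
    if statuses[current_id] != "NONE":           # b
        release_lock()
        return []                                # b.i: leave it to resharding
    # c: everything that is not the current instance is a candidate
    stale = sorted(i for i in statuses if i != current_id)
    if release_lock() != 0:                      # d
        return []                                # d.i: lock may have been lost
    for instance_id in stale:                    # e
        delete(instance_id)
    return stale
```

The deletes in e run after the lock is released, so the list from c
has to stay valid without the lock.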

Can this fail?

> Casey
> 

Eric