Re: rgw: identifying resharded bucket instances that are safe to clean up

On 10/15/18 1:49 PM, J. Eric Ivancich wrote:
> On 10/12/18 3:05 PM, Casey Bodley wrote:
>> Summarizing some discussion in the rgw standup related to Abhishek's
>> work in https://github.com/ceph/ceph/pull/24332, where we didn't quite
>> reach a consensus.


>> When resharding starts:
>>
>> -create a new_bucket_instance with reshard_status=NONE
> I believe the new bucket instance gets IN_PROGRESS as well.

I do see a call to set_resharding_status(IN_PROGRESS) -> RGWRados::bucket_set_reshard(), but that operates on the bucket index shards. It's the calls to put_bucket_instance_info() that modify flags on the bucket instance. In create_new_bucket_instance(), I see it writing new_bucket_info.reshard_status = 0; (where 0=NONE).


>> -set current_bucket_instance's reshard_status=IN_PROGRESS and
>> new_bucket_instance_id
>>
>> On reshard failure:
>>
>> -set current_bucket_instance's reshard_status=NONE and clear
>> new_bucket_instance_id
> That should happen, but isn't happening on master. With my PR it should
> do this.

It looks like BucketInfoReshardUpdate's destructor is doing this on master. BucketInfoReshardUpdate is on the stack in RGWBucketReshard::do_reshard().

>> On reshard success:
>>
>> -link bucket entrypoint to new_bucket_instance
> It immediately sets the reshard_status to DONE in
> RGWBucketReshard::execute and then to NONE (not sure where that last
> change happens).

I'm only seeing this first part in BucketInfoReshardUpdate::complete() which calls set_status(DONE).
>> -set current_bucket_instance's reshard_status=DONE
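Pulling the three transitions together, here is a toy Python model of how reshard_status and new_bucket_instance_id move on the source and target instances. The field and function names follow the discussion; this is a sketch, not RGW code:

```python
# Toy model of the reshard_status transitions described in the thread.
# Names mirror the discussion; this is NOT actual RGW code.

class BucketInstance:
    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.reshard_status = "NONE"          # NONE / IN_PROGRESS / DONE
        self.new_bucket_instance_id = None

def start_reshard(current, new_id):
    """Create the target instance (status NONE) and mark the source."""
    new = BucketInstance(new_id)
    current.reshard_status = "IN_PROGRESS"
    current.new_bucket_instance_id = new.instance_id
    return new

def fail_reshard(current):
    """On failure, clear the source's status and its target link."""
    current.reshard_status = "NONE"
    current.new_bucket_instance_id = None

def complete_reshard(current):
    """On success, the source ends in the terminal DONE state."""
    current.reshard_status = "DONE"
```

The key point the model captures is that only the *source* instance carries IN_PROGRESS and the link to the target; the target itself stays at NONE throughout.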


>> Given these states, how can we reliably detect whether a given bucket
>> instance is safe to clean up? That means it either a) successfully
>> resharded and is no longer the current_bucket_instance, or b) it was the
>> new_bucket_instance of a failed resharding operation.
>>
>> a) has reshard_status=DONE
>>
>> b) has reshard_status=NONE, an instance id != current_bucket_instance's
>> id (i.e., not linked to the bucket entrypoint), and an instance id !=
>> current_bucket_instance's new_bucket_instance_id (i.e., not the target
>> of a reshard operation)


>> If radosgw crashes while a reshard is in progress, the
>> current_bucket_instance will still have a new_bucket_instance_id ==
>> new_bucket_instance's id, so the criteria for b) won't apply and we'd
>> have to wait for another reshard attempt before we're able to clean it up.
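As a sanity check, criteria a) and b) can be written as a predicate. This is a hypothetical sketch with attribute names taken from the discussion, not the actual RGW structures:

```python
from types import SimpleNamespace

def bucket_instance(instance_id, reshard_status="NONE",
                    new_bucket_instance_id=None):
    """Helper to build a stand-in bucket instance record."""
    return SimpleNamespace(instance_id=instance_id,
                           reshard_status=reshard_status,
                           new_bucket_instance_id=new_bucket_instance_id)

def safe_to_clean_up(candidate, current):
    # a) reshard completed; DONE is a terminal state, so no race
    if candidate.reshard_status == "DONE":
        return True
    # b) leftover target of a failed reshard: status NONE, not linked
    # to the bucket entrypoint, and not the target of a reshard
    return (candidate.reshard_status == "NONE"
            and candidate.instance_id != current.instance_id
            and candidate.instance_id != current.new_bucket_instance_id)
```

This also reproduces the crash case above: while current still names the candidate via new_bucket_instance_id, the predicate returns False, so the leftover target survives until another reshard attempt clears or replaces that link.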


>> There was also concern about whether this cleanup decision could race
>> with ongoing reshard operations, but I don't think that's the case: a)
>> is safe because DONE is a terminal state. For b), we know that it can't
>> be the source of a new reshard operation because it's not the
>> current_bucket_instance, nor can it be the target of a new reshard.


>> I hope this helps. Am I missing anything?
> First, I think the general process (i.e., beyond this clean-up) should
> be one that:
>
>      a. takes the reshard lock
>      b. updates the bucket info status
>      c. possibly sets the per-shard status
>      d. DOES WORK
>         i. refreshes the reshard lock every so often; if it's ever
>            lost, errors out
>      e. fixes the per-shard status (i.e., undoes c)
>      f. cleans up the bucket info status (i.e., undoes b)
>      g. releases the lock (i.e., undoes a)
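That setup/undo ordering maps naturally onto nested try/finally blocks. A hypothetical Python sketch follows; the lock, bucket-info, and shard objects are stand-ins, not RGW APIs:

```python
# Hypothetical sketch of steps a-g: each "undo" runs even if the work
# fails, in reverse order of the setup. Not RGW code.

def run_with_reshard_lock(lock, bucket_info, shards, do_work):
    lock.acquire()                              # a. take the reshard lock
    try:
        bucket_info.set_status("IN_PROGRESS")   # b. update bucket info status
        try:
            for s in shards:                    # c. set per-shard status
                s.set_status("IN_PROGRESS")
            try:
                # d. do the work; the callback renews the lock every so
                #    often and should raise if the lock was ever lost
                do_work(renew=lock.renew)
            finally:
                for s in shards:                # e. undo c
                    s.set_status("NONE")
        finally:
            bucket_info.set_status("NONE")      # f. undo b
    finally:
        lock.release()                          # g. undo a
```

The point of the shape is that a crash mid-work still runs e, f, and g on the way out, so the statuses never stay IN_PROGRESS longer than the lock is held.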

> I think the clean-up code should do something similar. I think it should:
>
>      a. take the reshard lock as exclusive ephemeral; this
>         will prevent a reshard process from interfering
>         i. that needs my PR for the added functionality
>      b. verify that the current bucket info has a status of NONE
>         i. if not, stop and let resharding clean it up
>      c. build a list of every other bucket info and index shard
>         that's not the current one; those are available for clean-up
>         i. be sure to refresh the lock every so often if necessary
>      d. release the reshard lock
>         i. if this returns an error code, the lock may have been
>            lost; it may be safest to abort the effort this time around
>      e. delete all items in the list generated in c.
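A rough Python shape for those clean-up steps, with the scan done under the lock and the deletions done after release; the lock and listing interfaces are stand-ins, not RGW APIs:

```python
# Hypothetical sketch of clean-up steps a-e. Not RGW code.

def clean_up_stale_instances(reshard_lock, current, all_instances, delete):
    reshard_lock.acquire()        # a. exclusive-ephemeral reshard lock
    stale = []
    try:
        # b. if a reshard is in flight, let the resharder clean up
        if current.reshard_status != "NONE":
            return
        # c. anything that isn't the current instance is a candidate;
        #    refresh the lock periodically in case the scan is long
        for inst in all_instances:
            if inst.instance_id != current.instance_id:
                stale.append(inst)
            reshard_lock.renew()
    finally:
        reshard_lock.release()    # d. an error here may mean the lock
                                  #    was lost; abort this round
    for inst in stale:            # e. delete what was collected in c
        delete(inst)
```

Note the candidate list is frozen while the lock is held, so even though the deletes in e happen after release, they only touch instances that were already stale at scan time.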

Can this fail?

Casey

> Eric


