Re: RGW Multisite delete weirdness

Abhishek Lekshmanan <abhishek@xxxxxxxx> · Fri, 03 Jun 2016 11:16:12 +0200

Yehuda Sadeh-Weinraub writes:

> On Fri, Jun 3, 2016 at 1:28 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>>
>> Yehuda Sadeh-Weinraub writes:
>>
>>> On Thu, Jun 2, 2016 at 6:01 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>>>> [..]
>>>> Yehuda Sadeh-Weinraub writes:
>>>>>
>>>>> Yes, that would be normal behaviour. The primary should not run
>>>>> concurrent sync operations on the same object while a previous sync
>>>>> operation on that object is still in progress. Looking at the log, it
>>>>> really seems that we don't identify the concurrent sync operation on
>>>>> the same object. This should have been fixed by commit
>>>>> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
>>>>> went wrong there (whether can_do_op() returned true, and why)?
>>>>
>>>> I looked into this a bit. can_do_op() returned true both when the
>>>> primary issued the Fetch (or GET) and when the delete was issued, even
>>>> though the Fetch had not yet completed. I put a debug log around the
>>>> point where we clear the keys: both the delete op and the get op
>>>> create and delete the same key successfully.
>>>>
>>>> This makes me suspect that different instances of
>>>> RGWBucketIncSyncShardMarkerTrack are at play here, leading to
>>>> independent values of key_to_marker. Is that possible?
>>>>
>>> Shouldn't happen, but maybe something went wrong. Try adding some more
>>> info to the log message to see if that's the case.
>>
>> I added a debug log for whenever a new instance of
>> RGWBucketIncSyncShardMarkerTrack is created, and for when we check or
>> delete keys. In all cases, i.e. when a GET was called and/or when a
>> DELETE was called, a newer instance of marker_tracker was being invoked.
>> A few lines earlier, the log always showed this:
>>
>> incremental_sync(): async update notification: mybucket:62bc922d-f295-4067-ae36-e23e2f231aad.24099.1:-1
>>
>> which seems to be logged whenever we create a new SingleEntry CR?
>> (The value of modified_iter was the same in every case.)
>>
>> Also, in the cases where the deletion succeeded in the secondary zone,
>> can_do_op had likewise succeeded every time. The difference was that
>> either the object GET arrived from the remote site after the original
>> site had already processed the DELETE, or the GET on the remote site
>> was processed in time, before the DELETE.
>>
>>
>
> Can you provide the log? I'm still not sure how you'd have different
> tracker markers for the same bucket instance, as we take a lease to
> prevent concurrent updates to the same bucket shard. This should
> happen in the async updates too.

id: c24eed72-47d4-452c-8d0e-86d96be8fff1

radosgw.8001.log is the master (and in this case the remote site)
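
To make the suspicion discussed above concrete, here is a minimal sketch of
why separate tracker instances would defeat the check. It is a toy model, not
the actual rgw code: only the names can_do_op and key_to_marker come from this
thread, while ToyMarkerTrack, start_op, finish_op and the demo functions are
hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a marker tracker. NOT the real
// RGWBucketIncSyncShardMarkerTrack; it only models the per-instance map.
class ToyMarkerTrack {
  // object key -> marker of the operation currently in flight
  std::map<std::string, std::string> key_to_marker;

public:
  // True only if THIS instance has no in-flight operation on `key`.
  bool can_do_op(const std::string& key) const {
    return key_to_marker.find(key) == key_to_marker.end();
  }

  // Hypothetical helper: record an in-flight operation on `key`.
  // Refuses if this instance already tracks one.
  bool start_op(const std::string& key, const std::string& marker) {
    if (!can_do_op(key)) {
      return false;
    }
    key_to_marker[key] = marker;
    return true;
  }

  // Hypothetical helper: clear the key once the operation completes.
  void finish_op(const std::string& key) { key_to_marker.erase(key); }
};

// One shared tracker: the delete is held back while the fetch is in flight.
bool delete_allowed_with_shared_tracker() {
  ToyMarkerTrack tracker;
  tracker.start_op("obj", "marker-1");  // fetch still in flight
  return tracker.can_do_op("obj");      // delete asks the same instance
}

// A fresh tracker per operation: each map starts empty, so the delete
// cannot see the fetch that another instance is tracking.
bool delete_allowed_with_separate_trackers() {
  ToyMarkerTrack fetch_tracker;
  fetch_tracker.start_op("obj", "marker-1");  // fetch still in flight
  ToyMarkerTrack delete_tracker;              // new, empty instance
  return delete_tracker.can_do_op("obj");
}

// Checks both scenarios of the toy model.
void check_demo() {
  assert(!delete_allowed_with_shared_tracker());
  assert(delete_allowed_with_separate_trackers());
}
```

In this model the check is only as good as the sharing of the instance: if
every operation gets its own tracker, can_do_op() trivially succeeds, which
would match what the debug logs show.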

>
> Yehuda
