On Fri, Jun 3, 2016 at 2:00 AM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> On Fri, Jun 3, 2016 at 1:28 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>>
>> Yehuda Sadeh-Weinraub writes:
>>
>>> On Thu, Jun 2, 2016 at 6:01 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>>>> [..]
>>>> Yehuda Sadeh-Weinraub writes:
>>>>>
>>>>> Yes, that would be a normal behaviour. The primary should not have
>>>>> concurrent sync operations on the same object if object has not
>>>>> completed previous sync operations. Looking at the log it really seems
>>>>> that we don't identify the concurrent sync operation on the same
>>>>> object. This should have been fixed by commit
>>>>> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
>>>>> went wrong there (whether can_do_op() returned true and why)?
>>>>
>>>> Looked into this a bit, can_do_op() has returned true for the case when
>>>> primary issues a Fetch (or GET) and when a delete is issued,(even though
>>>> the Fetch is still not complete yet) by putting a debug log around when
>>>> we clear the keys, both the delete op and the get op creates and deletes
>>>> the same key successfully.
>>>>
>>>> Which makes me suspect, that different instances of
>>>> RGWBucketIncSyncShardMarkerTrack are at play here, leading to different
>>>> independent values for key_to_marker. Is that possible?
>>>>
>>> Shouldn't happen, but maybe something went wrong. Try adding some more
>>> info to the log message to see if that's the case.
>>
>> I just added a debug log whenever a new instance of
>> RGWBucketIncSyncShardMarkerTrack was created, and when we check/delete
>> keys, in all cases, ie. when a GET was called and/or when a DELETE was
>> called, it was a newer instance of marker_tracker that was being invoked.
>> Also a few lines before always showed this:
>>
>> incremental_sync(): async update notification: mybucket:62bc922d-f295-4067-ae36-e23e2f231aad.24099.1:-1
>>
>> which seems to be called whenever we're creating a new SingleEntry CR?
>> (the value of modified_iter was the same in every case)
>>
>> Also looking at the cases where the deletion succeeded in the secondary
>> zone, it seemed here too can_do_op had succeeded every time, the
>> difference was in this case either the Object GET came from the remote
>> site after original site had already processed the DELETE or in other
>> cases, the GET in remote site was processed in time before the DELETE.
>>
>>
>
> Can you provide the log? I'm still not sure how you'd have different
> tracker markers for the same bucket instance, as we take a lease to
> prevent concurrent updates to the same bucket shard. This should
> happen in the async updates too.
>

Also, try this:

https://github.com/yehudasa/ceph/commit/b00096207e5fb2b1d7591a59a8012ec458bcde4b

Thanks,
Yehuda
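
For anyone following the thread, here is a rough sketch of the failure mode being suspected above. It is not the actual RGW code: MarkerTracker, start_op(), finish_op() and second_op_allowed() are hypothetical names made up for illustration, and only the idea of a per-tracker key_to_marker map with a can_do_op() check comes from the discussion. The assumption is that an op on a key is refused while another op on the same key is in flight, and that this only holds if both ops consult the same tracker instance.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a per-shard marker tracker.
// This is NOT the real RGWBucketIncSyncShardMarkerTrack; it only models
// the "one in-flight op per object key" idea discussed in the thread.
class MarkerTracker {
  // object key -> marker of the op currently in flight on that key
  std::map<std::string, std::string> key_to_marker;

public:
  // An op may start only if no other op on the same key is in flight.
  bool can_do_op(const std::string& key) const {
    return key_to_marker.find(key) == key_to_marker.end();
  }

  // Record an in-flight op; returns false if the key is already busy.
  bool start_op(const std::string& key, const std::string& marker) {
    return key_to_marker.emplace(key, marker).second;
  }

  // Clear the key once the op has completed.
  void finish_op(const std::string& key) { key_to_marker.erase(key); }
};

// Starts an op on `key` via `first` (think: the fetch), then asks `second`
// whether another op on the same key (think: the delete) may start while
// the first one is still in flight. Returns that answer.
bool second_op_allowed(MarkerTracker& first, MarkerTracker& second,
                       const std::string& key) {
  first.start_op(key, "marker-1");
  bool allowed = second.can_do_op(key);
  first.finish_op(key);
  return allowed;
}
```

With a single shared tracker, second_op_allowed(t, t, "obj") is false, which is the intended behaviour. With two separate instances it is true, because each instance has its own empty key_to_marker. That would match the observation that can_do_op() succeeded for both the GET and the DELETE.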