On Fri, Jun 3, 2016 at 1:28 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>
> Yehuda Sadeh-Weinraub writes:
>
>> On Thu, Jun 2, 2016 at 6:01 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>>> [..]
>>> Yehuda Sadeh-Weinraub writes:
>>>>
>>>> Yes, that would be a normal behaviour. The primary should not have
>>>> concurrent sync operations on the same object if object has not
>>>> completed previous sync operations. Looking at the log it really seems
>>>> that we don't identify the concurrent sync operation on the same
>>>> object. This should have been fixed by commit
>>>> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
>>>> went wrong there (whether can_do_op() returned true and why)?
>>>
>>> Looked into this a bit, can_do_op() has returned true for the case when
>>> primary issues a Fetch (or GET) and when a delete is issued,(even though
>>> the Fetch is still not complete yet) by putting a debug log around when
>>> we clear the keys, both the delete op and the get op creates and deletes
>>> the same key successfully.
>>>
>>> Which makes me suspect, that different instances of
>>> RGWBucketIncSyncShardMarkerTrack are at play here, leading to different
>>> independent values for key_to_marker. Is that possible?
>>>
>> Shouldn't happen, but maybe something went wrong. Try adding some more
>> info to the log message to see if that's the case.
>
> I just added a debug log whenever a new instance of
> RGWBucketIncSyncShardMarkerTrack was created, and when we check/delete
> keys, in all cases, ie. when a GET was called and/or when a DELETE was
> called, it was a newer instance of marker_tracker that was being invoked.
> Also a few lines before always showed this:
>
> incremental_sync(): async update notification: mybucket:62bc922d-f295-4067-ae36-e23e2f231aad.24099.1:-1
>
> which seems to be called whenever we're creating a new SingleEntry CR?
> (the value of modified_iter was the same in every case)
>
> Also looking at the cases where the deletion succeeded in the secondary
> zone, it seemed here too can_do_op had succeeded every time, the
> difference was in this case either the Object GET came from the remote
> site after original site had already processed the DELETE or in other
> cases, the GET in remote site was processed in time before the DELETE.
>

Can you provide the log? I'm still not sure how you'd have different
tracker markers for the same bucket instance, as we take a lease to
prevent concurrent updates to the same bucket shard. This should
happen in the async updates too.

Yehuda
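
For readers following the thread, here is a minimal sketch of the failure mode suspected above. It is not the Ceph code: MarkerTrackerSketch, start_op(), finish() and delete_allowed_during_get() are hypothetical names made up for illustration, and only can_do_op() and key_to_marker are taken from the discussion. It assumes the tracker rejects a new operation on an object key while an earlier one on the same key is still in flight, and shows that this check can only work if both operations go through the same tracker instance.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a per-shard marker tracker.
// This is NOT RGWBucketIncSyncShardMarkerTrack; it only models the
// "one in-flight op per object key" idea discussed in the thread.
class MarkerTrackerSketch {
  // object key -> marker of the operation currently in flight
  std::map<std::string, std::string> key_to_marker;

public:
  // True only if no operation on this key is in flight in THIS instance.
  bool can_do_op(const std::string& key) const {
    return key_to_marker.find(key) == key_to_marker.end();
  }

  // Record an in-flight operation (hypothetical helper).
  // Returns false if the key is already busy.
  bool start_op(const std::string& key, const std::string& marker) {
    return key_to_marker.emplace(key, marker).second;
  }

  // Clear the key once the operation has completed (hypothetical helper).
  void finish(const std::string& key) { key_to_marker.erase(key); }
};

// Starts a GET on one tracker, then asks the other tracker whether a
// DELETE on the same key may start while the GET is still in flight.
bool delete_allowed_during_get(MarkerTrackerSketch& get_tracker,
                               MarkerTrackerSketch& delete_tracker) {
  const std::string key = "myobject";
  bool started = get_tracker.start_op(key, "marker-1");
  assert(started);  // the GET is the first op on this key
  (void)started;    // avoid an unused-variable warning when NDEBUG is set
  bool allowed = delete_tracker.can_do_op(key);
  get_tracker.finish(key);  // GET completes, key is cleared
  return allowed;
}
```

Passing the same tracker for both arguments makes the DELETE wait (the function returns false), while passing two separate instances lets it through (returns true), because each instance has its own independent key_to_marker. That matches the symptom described in the thread, but whether separate instances really are created for the same bucket shard still has to be confirmed from the logs.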