Yeah, I've also been bitten by the unordered marker updates for some time,
and implemented a pending queue similar to RGWOmapAppend to solve the
issue, but I think the RGWLastCallerWinsCR approach is simpler.

On Fri, Sep 7, 2018 at 5:20 AM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> On 09/04/2018 11:18 PM, Xinying Song wrote:
> > Hi, Casey:
> >
> > Our environment is based on luminous.
> >
> > I'm a little confused about this commit. It adds a new member called
> > order_cr to RGWSyncShardMarkerTrack, and order_cr always holds the
> > newest update-marker cr. But suppose it already holds an update-marker
> > cr and a new update-marker cr arrives: it will drop its reference to
> > the older update-marker cr. Won't that put() destroy the old
> > update-marker cr, which may still be in flight?
> > And besides the possible memory corruption, RADOS would still receive
> > multiple out-of-order write operations, so the problem seems to
> > persist.
> >
> > Or did I miss some key point? Could you give some tips about this
> > fix? Thanks!
>
> It looks like the magic happens in the while loop of
> RGWLastCallerWinsCR::operate(). The use of 'yield call()' there means
> that it won't resume until the spawned coroutine completes, so this
> prevents us from ever having more than one outstanding write to the
> marker.
>
> If a second marker write comes in while the first is still running, it
> gets stored in 'cr' until the first call() completes.
>
> If a third write comes in, it overwrites 'cr' and drops the reference to
> the second write. Since the second write hadn't been scheduled yet with
> call(), it's perfectly safe to drop the last ref and destroy it. If it
> -had- already been scheduled, then 'cr' was reset to nullptr before
> call(), and RGWLastCallerWinsCR::call_cr() won't try to drop its ref.
>
> I hope that helps!
> Casey
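
That matches my reading now. To restate the pattern in code: a compact
model of the last-caller-wins idea, in plain C++ with made-up names (the
real RGWLastCallerWinsCR is an RGWCoroutine, so this is only a sketch of
the logic, not the actual implementation):

#include <functional>
#include <iostream>
#include <memory>
#include <string>

// Stand-in for a coroutine that persists one marker value.
using MarkerWrite = std::function<void()>;

class LastCallerWins {
  std::unique_ptr<MarkerWrite> pending;  // the queued write ('cr')
  bool in_flight = false;                // at most one outstanding write

public:
  // Called for every marker-update request (like call_cr()).
  void submit(std::unique_ptr<MarkerWrite> w) {
    if (in_flight) {
      // A later request replaces any queued one. The replaced write
      // was never started, so dropping the last ref to it is safe.
      pending = std::move(w);
      return;
    }
    in_flight = true;
    (*w)();  // start the write; in RGW this is 'yield call(cr)'
  }

  // Completion hook: models the coroutine resuming after 'yield call()'.
  void on_write_complete() {
    in_flight = false;
    if (pending) {
      auto next = std::move(pending);  // clear 'pending' before starting,
      in_flight = true;                // so submit() can't drop a live cr
      (*next)();
    }
  }
};

int main() {
  LastCallerWins lcw;
  auto write = [](std::string m) {
    return std::make_unique<MarkerWrite>(
        [m = std::move(m)] { std::cout << "writing marker " << m << "\n"; });
  };
  lcw.submit(write("m1"));  // starts immediately
  lcw.submit(write("m2"));  // queued: m1 is still in flight
  lcw.submit(write("m3"));  // replaces m2 -- last caller wins
  lcw.on_write_complete();  // m1 done; m3 starts, m2 is never written
  lcw.on_write_complete();  // m3 done; nothing queued
  return 0;
}

Dropping the queued write is correct here precisely because intermediate
marker values don't matter: only the newest marker ever needs to be
persisted.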
>
> > On Tue, Sep 4, 2018 at 10:26 PM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> >>
> >> On 09/03/2018 05:02 AM, Xinying Song wrote:
> >>> Hi, cephers:
> >>>
> >>> We have been suffering from a problem with rgw-multisite. Sometimes
> >>> `radosgw-admin sync status` shows that data shards are behind their
> >>> peers. If no more log entries are added to the corresponding shard of
> >>> the peer zone, i.e. no new writes, the sync marker of this shard stays
> >>> stuck on that old marker and never advances. Restarting the rgw
> >>> daemon resolves the warning.
> >>>
> >>> The RGW log shows that the syncmarker in the incremental_sync()
> >>> function has been updated to the peer's newest marker. Gdb shows that
> >>> the pending and finish_markers variables of marker_tracker are empty.
> >>> (I forgot to check the syncmarker variable...)
> >>>
> >>> I guess this problem is caused by the non-atomic marker update. Since
> >>> a marker update is handled by an RGWAsyncPutSystemObj op, those ops
> >>> may arrive at rados out of order. Maybe we should add an id_tag attr
> >>> to make this op atomic.
> >>>
> >>> This problem is not easy to reproduce in a testing environment, so I
> >>> prefer to ask you guys for some advice first, in case I'm on the
> >>> wrong track.
> >>>
> >>> Thanks.
> >> I think Yehuda saw this while testing the cloud sync work, and added
> >> an RGWLastCallerWinsCR to guarantee the ordering of marker updates in
> >> commit 1034a68fd12687ac81e6afc4718dbc8045648034. Does your branch
> >> include that commit, or is it based on luminous? We won't be
> >> backporting cloud sync as a feature, but we should probably take that
> >> one commit - I opened a ticket for this backport at
> >> http://tracker.ceph.com/issues/35539.
> >>
> >> Thanks,
> >> Casey
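
As a footnote on the id_tag idea from the first message: expressed
directly against librados, such a guard might look roughly like the
sketch below. This is hypothetical (the object layout and xattr name
are made up, and the fix that actually merged is RGWLastCallerWinsCR),
but it shows how a stale marker write could be rejected server-side:

#include <rados/librados.hpp>
#include <string>

// Returns 0 on success, or -ECANCELED if a newer marker already
// landed, in which case the stale write is simply dropped.
int write_marker_guarded(librados::IoCtx& ioctx,
                         const std::string& oid,
                         const std::string& marker,
                         uint64_t version)
{
  librados::bufferlist data_bl, ver_bl;
  data_bl.append(marker);
  ver_bl.append(std::to_string(version));

  librados::ObjectWriteOperation op;
  // Proceed only if our version is greater than the stored
  // "marker_version" xattr; otherwise the whole compound op fails
  // with -ECANCELED and nothing is written. (Assumes the xattr was
  // initialized when the marker object was created.)
  op.cmpxattr("marker_version", LIBRADOS_CMPXATTR_OP_GT, version);
  op.write_full(data_bl);
  op.setxattr("marker_version", ver_bl);

  return ioctx.operate(oid, &op);
}

The trade-off is an extra xattr comparison on every marker write,
whereas RGWLastCallerWinsCR serializes the writes on the client side
and never sends a stale update in the first place.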