Yeah, I've also been bitten by the unordered marker updates for some time,
and implemented a pending queue similar to RGWOmapAppend to solve the
issue, but I think the RGWLastCallerWinsCR approach is simpler.

On Fri, Sep 7, 2018 at 5:20 AM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> On 09/04/2018 11:18 PM, Xinying Song wrote:
> > Hi, Casey:
> >
> > Our environment is based on luminous.
> >
> > I'm a little confused about this commit. It adds a new member called
> > order_cr to RGWSyncShardMarkerTrack, and order_cr always holds the
> > newest update-marker cr. But suppose it already holds an update-marker
> > cr and a new update-marker cr arrives: it will drop its reference to
> > the older update-marker cr. Won't that put() destroy the old
> > update-marker cr, which may still be in flight?
> > And besides the possible memory corruption, RADOS would still receive
> > multiple out-of-order write operations, so the problem seems to
> > persist.
> >
> > Or did I miss some key point? Could you give some tips about this
> > fix? Thanks!
>
> It looks like the magic happens in the while loop of
> RGWLastCallerWinsCR::operate(). The use of 'yield call()' there means
> that it won't resume until the spawned coroutine completes, so this
> prevents us from ever having more than one outstanding write to the
> marker.
>
> If a second marker write comes in while the first is still running, it
> gets stored in 'cr' until the first call() completes.
>
> If a third write comes in, it overwrites 'cr' and drops the reference to
> the second write. Since the second write hadn't been scheduled yet with
> call(), it's perfectly safe to drop the last ref and destroy it. If it
> -had- already been scheduled, then 'cr' was reset to nullptr before
> call(), and RGWLastCallerWinsCR::call_cr() won't try to drop its ref.
>
> I hope that helps!
> Casey
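
That matches my reading now. To restate the pattern in code: a compact
model of the last-caller-wins idea, in plain C++ with made-up names (the
real RGWLastCallerWinsCR is an RGWCoroutine, so this is only a sketch of
the logic, not the actual implementation):

#include <functional>
#include <iostream>
#include <memory>
#include <string>

// Stand-in for a coroutine that persists one marker value.
using MarkerWrite = std::function<void()>;

class LastCallerWins {
  std::unique_ptr<MarkerWrite> pending;  // the queued write ('cr')
  bool in_flight = false;                // at most one outstanding write

public:
  // Called for every marker-update request (like call_cr()).
  void submit(std::unique_ptr<MarkerWrite> w) {
    if (in_flight) {
      // A later request replaces any queued one. The replaced write
      // was never started, so dropping the last ref to it is safe.
      pending = std::move(w);
      return;
    }
    in_flight = true;
    (*w)();  // start the write; in RGW this is 'yield call(cr)'
  }

  // Completion hook: models the coroutine resuming after 'yield call()'.
  void on_write_complete() {
    in_flight = false;
    if (pending) {
      auto next = std::move(pending);  // clear 'pending' before starting,
      in_flight = true;                // so submit() can't drop a live cr
      (*next)();
    }
  }
};

int main() {
  LastCallerWins lcw;
  auto write = [](std::string m) {
    return std::make_unique<MarkerWrite>(
        [m = std::move(m)] { std::cout << "writing marker " << m << "\n"; });
  };
  lcw.submit(write("m1"));  // starts immediately
  lcw.submit(write("m2"));  // queued: m1 is still in flight
  lcw.submit(write("m3"));  // replaces m2 -- last caller wins
  lcw.on_write_complete();  // m1 done; m3 starts, m2 is never written
  lcw.on_write_complete();  // m3 done; nothing queued
  return 0;
}

Dropping the queued write is correct here precisely because intermediate
marker values don't matter: only the newest marker ever needs to be
persisted.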
>
> > On Tue, Sep 4, 2018 at 10:26 PM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> >>
> >> On 09/03/2018 05:02 AM, Xinying Song wrote:
> >>> Hi, cephers:
> >>>
> >>> We have been suffering from a problem with rgw-multisite. Sometimes
> >>> `radosgw-admin sync status` shows that data shards are behind their
> >>> peers. If no more log entries are added to the corresponding shard of
> >>> the peer zone, i.e. no new writes, the sync marker of this shard stays
> >>> stuck on that old marker and never advances. Restarting the rgw
> >>> daemon resolves the warning.
> >>>
> >>> The RGW log shows that the syncmarker in the incremental_sync()
> >>> function has been updated to the peer's newest marker. Gdb shows that
> >>> the pending and finish_markers variables of marker_tracker are empty.
> >>> (I forgot to check the syncmarker variable...)
> >>>
> >>> I guess this problem is caused by the non-atomic marker update. Since
> >>> a marker update is handled by an RGWAsyncPutSystemObj op, those ops
> >>> may arrive at rados out of order. Maybe we should add an id_tag attr
> >>> to make this op atomic.
> >>>
> >>> This problem is not easy to reproduce in a testing environment, so I
> >>> prefer to ask you guys for some advice first, in case I'm on the
> >>> wrong track.
> >>>
> >>> Thanks.
> >> I think Yehuda saw this while testing the cloud sync work, and added
> >> an RGWLastCallerWinsCR to guarantee the ordering of marker updates in
> >> commit 1034a68fd12687ac81e6afc4718dbc8045648034. Does your branch
> >> include that commit, or is it based on luminous? We won't be
> >> backporting cloud sync as a feature, but we should probably take that
> >> one commit - I opened a ticket for this backport at
> >> http://tracker.ceph.com/issues/35539.
> >>
> >> Thanks,
> >> Casey
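
As a footnote on the id_tag idea from the first message: expressed
directly against librados, such a guard might look roughly like the
sketch below. This is hypothetical (the object layout and xattr name
are made up, and the fix that actually merged is RGWLastCallerWinsCR),
but it shows how a stale marker write could be rejected server-side:

#include <rados/librados.hpp>
#include <string>

// Returns 0 on success, or -ECANCELED if a newer marker already
// landed, in which case the stale write is simply dropped.
int write_marker_guarded(librados::IoCtx& ioctx,
                         const std::string& oid,
                         const std::string& marker,
                         uint64_t version)
{
  librados::bufferlist data_bl, ver_bl;
  data_bl.append(marker);
  ver_bl.append(std::to_string(version));

  librados::ObjectWriteOperation op;
  // Proceed only if our version is greater than the stored
  // "marker_version" xattr; otherwise the whole compound op fails
  // with -ECANCELED and nothing is written. (Assumes the xattr was
  // initialized when the marker object was created.)
  op.cmpxattr("marker_version", LIBRADOS_CMPXATTR_OP_GT, version);
  op.write_full(data_bl);
  op.setxattr("marker_version", ver_bl);

  return ioctx.operate(oid, &op);
}

The trade-off is an extra xattr comparison on every marker write,
whereas RGWLastCallerWinsCR serializes the writes on the client side
and never sends a stale update in the first place.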