Re: rgw-multisite: do we need an atomic option for RGWAsyncPutSystemObj?

On 09/04/2018 11:18 PM, Xinying Song wrote:
Hi, Casey:

Our environment is based on luminous.

I'm a little confused about this commit. It adds a new member called
order_cr to RGWSyncShardMarkerTrack, and order_cr always holds the
newest update-marker cr. But suppose it already holds an update-marker
cr and a new update-marker cr arrives: it will drop its reference to
the older update-marker cr. Won't that put() destroy the old
update-marker cr, which may still be in progress?
Even setting aside the possible memory corruption, RADOS would still
receive multiple out-of-order write operations, so the problem seems
to persist.

Or did I miss some key point? Could you give some tips about this
fix? Thanks!

It looks like the magic happens in the while loop of RGWLastCallerWinsCR::operate(). The use of 'yield call()' there means that it won't resume until the spawned coroutine completes, so this prevents us from ever having more than one outstanding write to the marker.

If a second marker write comes in while the first is still running, it gets stored in 'cr' until the first call() completes.

If a third write comes in, it overwrites 'cr' and drops the reference to the second write. Since the second write hadn't been scheduled yet with call(), it's perfectly safe to drop the last ref and destroy it. If it -had- already been scheduled, then 'cr' was reset to nullptr before call(), and RGWLastCallerWinsCR::call_cr() won't try to drop its ref.
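The behavior Casey describes can be illustrated with a small toy model. This is not the Ceph source (the class, member, and method names below are hypothetical stand-ins for RGWLastCallerWinsCR, its 'cr' member, and its operate()/call_cr() methods); it just demonstrates the "at most one outstanding write, last caller wins" invariant:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy model (not Ceph code) of the last-caller-wins pattern: at most one
// marker write is outstanding at a time. While one is in flight, newer
// markers overwrite a single pending slot (like the 'cr' member), so an
// intermediate marker is dropped before it is ever scheduled -- which is
// why releasing its reference is safe.
class LastCallerWins {
  std::optional<std::string> slot;    // pending marker, like 'cr'
  std::optional<std::string> active;  // marker currently being written
  std::vector<std::string> issued;    // completed writes, in order

public:
  // submit a marker update
  void call(const std::string& marker) {
    if (!active) {
      active = marker;  // nothing in flight: start writing immediately
    } else {
      slot = marker;    // overwrite (drop) any not-yet-scheduled marker
    }
  }

  // simulate completion of the in-flight write; mirrors the while loop
  // in operate(): when one call finishes, pick up whatever is in 'slot'
  void complete() {
    if (!active) {
      return;
    }
    issued.push_back(*active);
    active = slot;      // the slot was emptied before the next "call"
    slot.reset();
  }

  const std::vector<std::string>& writes() const { return issued; }
  bool busy() const { return active.has_value(); }
};
```

Submitting m1, m2, and m3 while m1 is still in flight issues exactly two writes, m1 then m3: m2 is overwritten while still unscheduled, so dropping it never touches a write that RADOS has already seen, and the writes that do go out are strictly ordered.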

I hope that helps!
Casey

Casey Bodley <cbodley@xxxxxxxxxx> wrote on Tuesday, September 4, 2018 at 10:26 PM:

On 09/03/2018 05:02 AM, Xinying Song wrote:
Hi, cephers:

We have been suffering from a problem with rgw-multisite. `radosgw-admin
sync status` sometimes shows that data shards are behind their peers.
If no more log entries are added to the corresponding shard of the peer
zone, i.e. there are no new writes, the sync marker of this shard stays
stuck on that old marker and does not advance. Restarting the rgw
daemon resolves this warning.

The RGW log shows that syncmarker in the incremental_sync() function has
been updated to the peer's newest marker. Gdb shows that the pending and
finish_markers variables of marker_tracker are empty. (I forgot to check
the syncmarker variable...)

I guess this problem is caused by the non-atomic marker update. Since
the marker update is handled by an RGWAsyncPutSystemObj op, those ops
may arrive out of order when delivered to RADOS. Maybe we should add an
id_tag attr to make this op atomic.

This problem is not easy to reproduce in a testing environment, so I'd
prefer to ask you guys for advice first, in case I'm on the wrong track.

Thanks.
I think Yehuda saw this while testing the cloud sync work, and added a
RGWLastCallerWinsCR to guarantee the ordering of marker updates in
commit 1034a68fd12687ac81e6afc4718dbc8045648034. Does your branch
include that commit, or is it based on luminous? We won't be backporting
cloud sync as a feature, but we should probably take that one commit - I
opened a ticket for this backport at http://tracker.ceph.com/issues/35539.

Thanks,
Casey
