rgw multisite: revisiting the design of 'async notifications'

Casey Bodley <cbodley@xxxxxxxxxx> · Tue, 30 Mar 2021 13:26:35 -0400

in multisite, these async notifications are http messages that get
periodically broadcast to peer zones as new entries are added to a
shard of the mdlog or datalog. on the destination zones, they serve
two purposes:

* wake up the coroutines that were processing the given log shards, in
case they were sleeping because there was nothing to do the last time
they polled

* for data sync only, these messages also carry the keys of each new
datalog entry so we can trigger sync on the related bucket shards (in
addition to the buckets we're already syncing from the datalog itself)
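
as a rough illustration of those two purposes, here is a minimal python
sketch of the receiving side. it is not rgw code: ShardSync, on_notify,
poll_interval and the key format "bucket-a:3" are all hypothetical names
chosen for the example, and the real implementation is c++ coroutines.

```python
import asyncio


class ShardSync:
    """hypothetical stand-in for the coroutine that processes one log shard"""

    def __init__(self, shard_id, poll_interval):
        self.shard_id = shard_id
        self.poll_interval = poll_interval
        self.wakeup = asyncio.Event()
        self.notified_keys = set()  # keys carried by notifications
        self.triggered = []         # bucket shards synced due to a notification
        self.polls = 0

    def on_notify(self, keys=()):
        # purpose 2 (data sync only): remember the keys of new datalog entries
        self.notified_keys.update(keys)
        # purpose 1: wake the coroutine if it is sleeping between polls
        self.wakeup.set()

    async def run(self, max_polls):
        while True:
            self.polls += 1
            # a real implementation would list and process the log shard here;
            # this sketch only records the keys that notifications delivered
            self.triggered.extend(sorted(self.notified_keys))
            self.notified_keys.clear()
            if self.polls >= max_polls:
                return
            try:
                # sleep until the next poll, or until a notification arrives
                await asyncio.wait_for(self.wakeup.wait(), self.poll_interval)
            except asyncio.TimeoutError:
                pass
            self.wakeup.clear()


async def demo():
    shard = ShardSync(shard_id=7, poll_interval=30.0)
    task = asyncio.create_task(shard.run(max_polls=2))
    await asyncio.sleep(0)  # let the first poll run and go to sleep
    shard.on_notify(keys=["bucket-a:3"])
    await task
    return shard.polls, shard.triggered
```

in demo(), the second poll happens right after on_notify() instead of
after the 30 second poll interval, which is the 'responsiveness' these
messages were meant to buy.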

these notifications have been in since jewel. as i understand it, the
goal was to make replication feel more responsive to updates, but the
model has two major flaws:

* it doesn't scale to more than one gateway per zone. when
broadcasting these notifications, we choose one radosgw endpoint from
each peer zone - but we have no way to know which one of those is
actually processing the log shards we're trying to notify. on receipt,
data sync will cache all of these keys in a map of 'modified_shards',
and the entries will just pile up in memory for the shards it isn't
processing

* it reduces the apparent latency of sync on some buckets at the
expense of overall sync throughput. not only does it prioritize sync
of 'hot' buckets over buckets in the backlog, but for every bucket we
sync via a notification, we'll sync it again when we get to its
entry in the log. i don't think this tradeoff is a good one
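
both flaws can be seen in a toy model. this is a hedged python sketch,
not rgw code: Gateway, process_owned_shards, the split of 8 shards over
two gateways, and the rule that the sender always picks the first
endpoint are assumptions made for illustration. only the name
modified_shards comes from the description above.

```python
from collections import defaultdict

NUM_SHARDS = 8  # assumed shard count, for illustration only


class Gateway:
    """hypothetical gateway that only processes the log shards in `owned`"""

    def __init__(self, owned):
        self.owned = set(owned)
        self.modified_shards = defaultdict(set)  # shard id -> notified keys
        self.sync_count = defaultdict(int)       # key -> times synced

    def on_notify(self, shard_id, keys):
        # keys are cached whether or not this gateway processes the shard
        self.modified_shards[shard_id].update(keys)

    def process_owned_shards(self, datalog):
        for shard_id in sorted(self.owned):
            # sync triggered by notifications
            for key in sorted(self.modified_shards.pop(shard_id, ())):
                self.sync_count[key] += 1
            # sync triggered by the datalog entries themselves
            for key in datalog.get(shard_id, ()):
                self.sync_count[key] += 1

    def stale_keys(self):
        # keys cached for shards that this gateway never processes
        return sum(len(keys) for keys in self.modified_shards.values())


def demo():
    # one new datalog entry per shard
    datalog = {s: [f"bucket-{s}:0"] for s in range(NUM_SHARDS)}
    gw_a = Gateway(owned=range(0, 4))
    gw_b = Gateway(owned=range(4, 8))
    # assumption: the sender picks gw_a as the zone's endpoint every time
    for shard_id, keys in datalog.items():
        gw_a.on_notify(shard_id, keys)
    gw_a.process_owned_shards(datalog)
    gw_b.process_owned_shards(datalog)
    return (gw_a.stale_keys(),
            gw_a.sync_count["bucket-0:0"],
            gw_b.sync_count["bucket-4:0"])
```

in this model demo() returns (4, 2, 1): gw_a is left holding 4 keys for
shards it never processes, each bucket shard it was notified about gets
synced twice, and gw_b (which received no notifications) syncs each of
its entries once.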

what does everyone else think? are there other reasons to keep sending these?