Re: rgw multisite: revisiting the design of 'async notifications'

Casey Bodley <cbodley@xxxxxxxxxx> · Wed, 23 Mar 2022 10:34:55 -0400

thanks Yehuda,

On Wed, Mar 23, 2022 at 9:46 AM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
>
> On Tue, Mar 22, 2022 at 2:14 PM Adam C. Emerson <aemerson@xxxxxxxxxx> wrote:
> >
> > On 22/03/2022, Matt Benjamin wrote:
> > > Just to be clear, why do we think it doesn't serve as an optimization?
> >
> > My thought being, if we're already saturated with syncing stuff,
> > adding more work on top of it won't help anything.
>
> And what if we're not saturated? You're optimizing the high traffic
> case by killing the low traffic case. If there are specific
> implementation issues then address them,

are you interested in working on this? maybe you could start by
proposing a design that scales to multiple gateways? these issues were
raised last year and nothing changed, so i'd really like to find a way
forward instead of just leaving it on our pile of technical debt

> but I think this is very valuable to some use cases.

can we dig into these use cases? i tend to only think about the DR
case, because that's what i assume the majority of our users want
multisite for. if we're going to make sacrifices in this DR case, they
need to be very well-motivated

what exactly are we trying to guarantee with these notifications? by
default, data sync will poll for changes every 20 seconds. is this
*really* not sufficiently responsive? if not, why not? and what is?

under a consistent write workload, rgw will currently broadcast these
notifications every 200ms by default (rgw_data_notify_interval_msec),
which seems excessively spammy to me - especially if data sync is
behind and we don't need the wakeups. if responsiveness on the order
of 5-10 seconds is sufficient, isn't it better to just increase the
polling frequency to match?
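
for a rough sense of scale, here is a minimal python sketch of that
arithmetic. it uses only the two defaults mentioned above (20 second
poll, 200ms notify interval); the function and variable names are made
up for illustration and are not rgw code, and it assumes a steady write
workload so that a notification goes out on every interval:

```python
# Hypothetical back-of-the-envelope model, not rgw code.
# Assumes a constant write workload, so one notification is
# broadcast per notify interval.

def notifications_per_poll(poll_interval_s: float,
                           notify_interval_ms: float) -> float:
    """Number of notifications sent during one polling period."""
    return poll_interval_s * 1000.0 / notify_interval_ms


DEFAULT_POLL_INTERVAL_S = 20.0     # default data sync poll period, per this thread
DEFAULT_NOTIFY_INTERVAL_MS = 200.0  # rgw_data_notify_interval_msec default, per this thread

if __name__ == "__main__":
    for poll_s in (20.0, 10.0, 5.0):
        n = notifications_per_poll(poll_s, DEFAULT_NOTIFY_INTERVAL_MS)
        print(f"poll every {poll_s:g}s -> {n:g} notifications per poll period")
```

with the defaults, that is on the order of 100 notifications for every
poll the secondary would have done anyway. polling every 5-10 seconds
bounds the wait for new data at roughly that interval without any of
them.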

>
> Yehuda
>
> >
> > > OTOH, as Yehuda points out, the intended purpose of the async
> > > notifies was to implement polling avoidance--to provide wake-ups to
> > > sync endpoints that might otherwise sleep/idle as replication events
> > > accumulate.  This is a well established design pattern, and if we
> > > remember that the async notifies are duplicating hints, it seems to
> > > make sense.
> >
> > Measuring to see how consequential this is would be legitimate.
> >
> > I can imagine a world where if the primary has an idea what the
> > secondary's polling period is, and there hasn't been much sync
> > activity and the primary knows the secondary won't poll for a while,
> > it might be worthwhile to send a single wakeup event when there's new
> > data available telling it that there's new stuff in the data log.
> >
> > Whether this is worthwhile would depend heavily on how frequently the
> > secondary polls the data log in the first place.
> >
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx
> >
>
