Re: rgw multisite: revisiting the design of 'async notifications'

Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> · Tue, 29 Mar 2022 15:55:55 -0400

How about this use case: an rgw multisite test suite that checks that objects have been synced to the remote zone before it can continue to the next test.
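
For illustration only, the wait step in such a test suite might look like the sketch below. The `is_synced` callback is a hypothetical stand-in for whatever check the suite performs against the remote zone (for example, a HEAD request for the object); the function name, timeout, and interval are assumptions, not part of any rgw test framework.

```python
import time


def wait_until_synced(is_synced, timeout=60.0, interval=0.2,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll is_synced() until it returns True or `timeout` seconds pass.

    is_synced is a hypothetical zero-argument callback that checks the
    remote zone. Returns the elapsed time in seconds on success and
    raises TimeoutError otherwise.
    """
    start = clock()
    while True:
        if is_synced():
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError(f"object not synced after {timeout}s")
        sleep(interval)
```

However short the test's own check interval is, the time spent in this loop cannot be less than the time the secondary zone takes to notice and fetch the change, so that latency adds directly to every test that waits this way.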

Wanting to reduce latency shouldn't be controversial. Performance is not just bandwidth.

Yehuda

On Tue, Mar 29, 2022, 2:56 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
hi Yehuda,

On Wed, Mar 23, 2022 at 9:46 AM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
>
> On Tue, Mar 22, 2022 at 2:14 PM Adam C. Emerson <aemerson@xxxxxxxxxx> wrote:
> >
> > On 22/03/2022, Matt Benjamin wrote:
> > > Just to be clear, why do we think it doesn't serve as an optimization?
> >
> > My thought being, if we're already saturated with syncing stuff,
> > adding more work on top of it won't help anything.
>
> And what if we're not saturated? You're optimizing the high traffic
> case by killing the low traffic case. If there are specific
> implementation issues then address them, but I think this is very
> valuable to some use cases.

i'm still interested in exploring these use cases, to learn how async
notifications can work with the rest of multisite sync to satisfy them

it sounds like you're interested in use cases with very strict
requirements on the sync delta, given that they demand a 'sensitivity'
on the order of 200ms

however, multisite does asynchronous replication. this means that no
client can expect to read an object on a secondary zone immediately
after writing it to the primary. this replication could be arbitrarily
far behind. ultimately, we can't provide any guarantees about how long
it will take for a given write to replicate

so i'm having a lot of trouble coming up with use cases that are
compatible with async replication, but are also 'killed' when we
replace notifications every 200ms with polling at a 20s interval
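
To put rough numbers on the two intervals being compared, here is a small simulation sketch. It assumes a deliberately simple model that is not taken from the rgw code: writes land at uniformly random times, and the secondary only notices a write at the end of the current interval (200ms for notifications, 20s for polling). It measures only the delay before the secondary notices the change, not transfer time or any existing backlog.

```python
import random


def pickup_delays(interval, n_writes=100_000, seed=0):
    """Delay between a write and the next check, for a fixed check interval.

    Assumed model: each write lands at a uniformly random offset inside
    an interval and is noticed when that interval ends.
    """
    rng = random.Random(seed)
    return [interval - rng.uniform(0.0, interval) for _ in range(n_writes)]


def summarize(delays):
    """Return (mean, worst-case) of a list of delays in seconds."""
    return sum(delays) / len(delays), max(delays)


for label, interval in (("notify, 200ms", 0.2), ("poll, 20s", 20.0)):
    mean, worst = summarize(pickup_delays(interval))
    print(f"{label}: mean {mean:.2f}s, worst {worst:.2f}s")
```

Under this model the mean pickup delay is about half the interval, so roughly 0.1s versus 10s, with worst cases near 0.2s and 20s. Neither figure is a guarantee, since replication itself can still fall arbitrarily far behind.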

if async replication is the problem, we can't expect notifications to
fix it. the client probably wants synchronous replication instead,
which could just mean writing each object to both zones before
completing
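
As a sketch of what "writing each object to both zones before completing" could mean on the client side, the example below uses a hypothetical in-memory `Zone` class in place of real S3 endpoints. Both `Zone` and `put_sync` are illustrative names, not rgw or S3 APIs, and partial-failure handling is left out on purpose.

```python
class Zone:
    """Hypothetical in-memory stand-in for one zone's object store."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


def put_sync(zones, key, data):
    """Write the object to every zone and return only after all succeed.

    If any write raises, the exception propagates and the caller never
    sees a successful completion. Cleaning up zones that were already
    written is out of scope for this sketch.
    """
    for zone in zones:
        zone.put(key, data)
    return [zone.name for zone in zones]
```

Once `put_sync` returns, a read from either zone sees the object, which is the read-after-write behaviour that async replication cannot promise. The cost is that every write now waits on the slowest zone.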

if you're still advocating for these notifications, can you please
help to frame the discussion here?

>
> Yehuda
>
> >
> > > OTOH, as Yehuda points out, the intended purpose of the async
> > > notifies was to implement polling avoidance--to provide wake-ups to
> > > sync endpoints that might otherwise sleep/idle as replication events
> > > accumulate.  This is a well established design pattern, and if we
> > > remember that the async notifies are duplicating hints, it seems to
> > > make sense.
> >
> > Measuring to see how consequential this is would be legitimate.
> >
> > I can imagine a world where if the primary has an idea what the
> > secondary's polling period is, and there hasn't been much sync
> > activity and the primary knows the secondary won't poll for a while,
> > it might be worthwhile to send a single wakeup event when there's new
> > data available telling it that there's new stuff in the data log.
> >
> > Whether this is worthwhile would depend heavily on how frequently the
> > secondary polls the data log in the first place.
> >
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx
> >

