Re: rgw multisite: revisiting the design of 'async notifications'

Matt Benjamin <mbenjami@xxxxxxxxxx> · Wed, 23 Mar 2022 14:37:35 -0400

replies inline

On Wed, Mar 23, 2022 at 2:12 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> On Wed, Mar 23, 2022 at 11:17 AM Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
> >
> > On Wed, Mar 23, 2022 at 10:38 AM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> > >
> > > thanks Yehuda,
> > >
> > > On Wed, Mar 23, 2022 at 9:46 AM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Mar 22, 2022 at 2:14 PM Adam C. Emerson <aemerson@xxxxxxxxxx> wrote:
> > >
> > > under a consistent write workload, rgw will currently broadcast these
> > > notifications every 200ms by default (rgw_data_notify_interval_msec),
> > > which seems excessively spammy to me - especially if data sync is
> > > behind and we don't need the wakeups. if responsiveness on the order
> > > of 5-10 seconds is sufficient, isn't it better to just increase the
> > > polling frequency to match?
> > >
> >
> > As I noted earlier in the thread, continuous polling *is* inconsistent
> > with constantly notifying.  I also agree that broadcasting every 200ms
> > is questionable tuning, and so is the hard-coded 20s polling cycle on
> > the other side.  Again, why doesn't polling activity tend toward
> > quiescence when there is no data change?
>
> ok, so you're suggesting that we add some backoff to data sync's
> polling as it keeps finding nothing to do? in that model, i agree that
> notifications are required if we want to guarantee some degree of
> responsiveness. but is this really a better model?

you make some good points, and they may mean backoff isn't the better
model for us right now. in general, polling avoidance is attractive,
but to be fair, it's most appropriate when activation is intermittent
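
for what it's worth, here is roughly what I have in mind by backoff: a
rough sketch in python, not rgw code. the class name, the 1s floor, and
the 2x growth factor are all made up for illustration; only the 20s
ceiling comes from the polling cycle mentioned above. the poller sleeps
longer each time it finds nothing to do, and a notification (or finding
work) drops it back to the floor:

```python
from dataclasses import dataclass, field


@dataclass
class PollBackoff:
    """Hypothetical helper tracking how long a sync poller should sleep."""
    min_interval: float = 1.0    # assumed floor, in seconds
    max_interval: float = 20.0   # ceiling, matching the 20s cycle discussed
    factor: float = 2.0          # assumed growth factor per idle poll
    current: float = field(init=False)

    def __post_init__(self) -> None:
        # Start out polling at the fastest allowed rate.
        self.current = self.min_interval

    def on_poll(self, found_work: bool) -> float:
        """Update the interval after a poll; return the next sleep time."""
        if found_work:
            # Data changed recently, so stay responsive.
            self.current = self.min_interval
        else:
            # Nothing to do: tend toward quiescence, capped at the ceiling.
            self.current = min(self.current * self.factor, self.max_interval)
        return self.current

    def on_notify(self) -> float:
        """A wakeup notification arrived: poll again at the floor interval."""
        self.current = self.min_interval
        return self.current
```

in this model the notification is what bounds responsiveness once the
poller has backed off, which is why it only pays for itself when
activation is intermittent.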

>
> the zone sending notifications has very little information about the
> target zone's sync processes. all it has is a list of endpoints that
> it can send messages to. it doesn't know which of those endpoints are
> actually running sync, let alone which one is running sync for a given
> datalog shard's notification. it doesn't even know whether an endpoint
> is really an rgw instance! it may just be a load balancer, in which
> case we can neither broadcast a notification to every rgw, nor can we
> send a notification to any specific rgw. because of this, any
> push-based model is going to be problematic

agree, we would want resolutions to these issues (appealing ones)

>
> even if all remote rgw endpoints are reachable from the source zone,
> we'd still have to broadcast every notification to every remote
> endpoint for this to work. in contrast, the polling only happens in
> the single rgw instance that owns the cls_lock on a given datalog
> shard, so only needs a single http request per polling interval

whatever the algorithm in the end, I think it would be nice to make
its parameters configurable, rather than a hard-coded
#define INTERVAL 20 (or however it's spelled) in the code
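
as an illustration only (python again, not the actual rgw config
machinery): both intervals could be read from options with the current
values as defaults. rgw_data_notify_interval_msec is the option named
earlier in the thread; the poll-interval key below is a hypothetical
name I made up, and the validation rules are assumptions:

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SyncTuning:
    """Hypothetical bundle of the two intervals discussed in this thread."""
    notify_interval_msec: int = 200  # default quoted earlier in the thread
    poll_interval_sec: int = 20      # the hard-coded polling cycle

    @classmethod
    def from_options(cls, options: Mapping[str, str]) -> "SyncTuning":
        """Build tuning values from string options, falling back to defaults."""
        defaults = cls()
        notify = int(options.get("rgw_data_notify_interval_msec",
                                 defaults.notify_interval_msec))
        # Hypothetical option name, used here for illustration only.
        poll = int(options.get("hypothetical_data_sync_poll_interval_sec",
                               defaults.poll_interval_sec))
        # Assumed sanity checks for this sketch.
        if notify < 0:
            raise ValueError("notify interval must not be negative")
        if poll <= 0:
            raise ValueError("poll interval must be positive")
        return cls(notify_interval_msec=notify, poll_interval_sec=poll)
```

with no options set this reproduces today's behavior, so making the
values tunable would not change anything for existing deployments.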

Matt


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx