Re: Why librados::IoCtxImpl::notify waits for CEPH_OSD_OP_NOTIFY to complete?

Sage Weil <sweil@xxxxxxxxxx> · Tue, 8 May 2018 14:43:41 +0000 (UTC)

On Tue, 8 May 2018, Aleksei Gutikov wrote:
> Hi all.
> 
> Almost all changes to RGWCache<T> call distribute_cache(), which
> finally calls librados::IoCtxImpl::notify().
> 
> notify() waits for responses from all rgws watching a specific notify.<id>
> object (from the *.rgw.control pool).
> 
> If an rgw crashes, its watch stays alive for osd_client_watch_timeout seconds.
> 
> And if the crashed rgw does not respond, then notify() waits
> client_notify_timeout seconds.
> 
> By default client_notify_timeout=10s, and during bucket creation notify() is
> called twice, so creating a bucket takes 20s if some rgw crashed within the
> last osd_client_watch_timeout seconds.
> 
> 647ce3387312fc683660c1f3c7571c577379be1c
> This commit improved this particular behavior (in master) by disabling
> distribute_cache() during bucket creation.
> 
> But why does librados::IoCtxImpl::notify() wait at all?
> CEPH_OSD_OP_NOTIFY is already a linger op, which as I understand it means
> it can take a long time by design.
> At the same time, notification is basically asynchronous.
> So why should the caller of distribute_cache() wait for the caches in all
> rgws to be updated?

Because it makes it a useful coordination tool.  If you don't want to 
wait, just fire off the notify asynchronously and ignore the 
result.  If you *do* wait, though, then you have a guarantee that all RGWs 
that have active watches saw the notify.  In RGW's case, we're using it for 
cache invalidation, so you can then tell the user that any subsequent 
operation they do, regardless of which RGW they hit, will see the effects 
of their previous operation.

If the notify times out, then all bets are off: you can't make that 
promise.
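
To make the distinction concrete, here is a toy model in plain C++ of the
two calling styles.  It is only a sketch and does not use librados at all:
Watcher, NotifyResult, notify_and_wait() and notify_async() are names made
up for illustration, and each "watcher" is just a thread that acks after a
delay (or never, if it is marked as crashed).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <future>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy model only: none of these names are librados APIs.
struct Watcher {
    bool alive;                          // false models a crashed rgw with a stale watch
    std::chrono::milliseconds ack_delay; // time a live watcher takes to ack
};

struct NotifyResult {
    std::size_t acked = 0;
    std::size_t watchers = 0;
    bool timed_out = false;
    // Only when this is true can the caller promise that every watcher saw the notify.
    bool all_acked() const { return !timed_out && acked == watchers; }
};

// Synchronous style: block until every registered watcher acks or the timeout expires.
NotifyResult notify_and_wait(const std::vector<Watcher>& watchers,
                             std::chrono::milliseconds timeout) {
    std::mutex m;
    std::condition_variable cv;
    std::size_t acked = 0;
    std::vector<std::thread> threads;

    for (const Watcher& w : watchers) {
        if (!w.alive) continue;          // a crashed watcher never acks
        threads.emplace_back([&m, &cv, &acked, delay = w.ack_delay] {
            std::this_thread::sleep_for(delay);
            { std::lock_guard<std::mutex> lk(m); ++acked; }
            cv.notify_one();
        });
    }

    NotifyResult r;
    r.watchers = watchers.size();
    {
        std::unique_lock<std::mutex> lk(m);
        bool done = cv.wait_for(lk, timeout,
                                [&] { return acked == watchers.size(); });
        r.timed_out = !done;
        r.acked = acked;                 // snapshot at completion or at timeout
    }
    for (std::thread& t : threads) t.join();
    assert(r.acked <= r.watchers);
    return r;
}

// Asynchronous style: start the notify and return immediately.  The caller
// may hold the future and never call get(); note that a future obtained
// from std::async still blocks in its destructor.
std::future<NotifyResult> notify_async(std::vector<Watcher> watchers,
                                       std::chrono::milliseconds timeout) {
    return std::async(std::launch::async,
                      [w = std::move(watchers), timeout] {
                          return notify_and_wait(w, timeout);
                      });
}
```

With two live watchers, notify_and_wait() returns as soon as both have
acked and all_acked() is true.  With one crashed watcher it blocks for the
full timeout and comes back with timed_out set, which is the "all bets are
off" case above.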

sage