On Tue, 8 May 2018, Aleksei Gutikov wrote:
> Hi all.
>
> Almost all changes to RGWCache<T> call distribute_cache(), which
> finally calls librados::IoCtxImpl::notify().
>
> notify() waits for responses from all rgws watching a specific
> notify.<id> (from the *.rgw.control pool).
>
> If an rgw crashes, its watcher stays alive for osd_client_watch_timeout
> seconds.
>
> And if the crashed rgw does not respond, then notify() waits
> client_notify_timeout seconds.
>
> By default client_notify_timeout=10s, and during bucket creation it is
> called twice, so creating a bucket takes 20s if some rgw crashed during
> the last osd_client_watch_timeout period.
>
> 647ce3387312fc683660c1f3c7571c577379be1c
> This commit improved this particular behavior (in master) by disabling
> distribute_cache() during bucket creation.
>
> But why does librados::IoCtxImpl::notify() wait at all?
> CEPH_OSD_OP_NOTIFY is already a linger op, and as I understand it that
> means it can take a long time by design.
> And at the same time, notification is basically asynchronous.
> So why should the caller of distribute_cache() wait for the caches to be
> updated in all rgws?

Because it makes it a useful coordination tool. If you don't want to
wait, just fire off the notify asynchronously and don't wait for the
result.

If you *do* wait, though, then you have a guarantee that all RGWs that
have active watches saw the notify. In RGW's case, we're using it for
cache invalidation, so you can then tell the user that any subsequent
operation they do, regardless of which RGW they hit, will see the
effects of their previous operation.

If the notify times out, then all bets are off: you can't make that
promise.

sage
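
To make the trade-off above concrete, here is a small toy simulation in Python. It does not use librados; the Watcher class and the notify() / notify_async() helpers are hypothetical names invented for this sketch, and the timeouts are shortened from the 10s default mentioned in the thread. It only models the behavior described: a waiting notify returns once every watcher has acked or the timeout expires, while a fire-and-forget notify returns at once and gives no guarantee.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Watcher:
    """Hypothetical stand-in for one RGW holding a watch (not a librados type)."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.cache_valid = True

    def deliver(self, acks):
        # A crashed RGW whose watch has not expired yet never acks.
        if not self.alive:
            return
        self.cache_valid = False  # invalidate the local cache
        acks.put(self.name)       # then acknowledge the notify


def notify(watchers, timeout_s):
    """Send to all watchers and wait for every ack, up to timeout_s.

    Returns (all_acked, missing_names). Only when all_acked is True can
    the caller promise that every watcher saw the invalidation.
    """
    acks = queue.Queue()
    for w in watchers:
        threading.Thread(target=w.deliver, args=(acks,), daemon=True).start()

    pending = {w.name for w in watchers}
    deadline = time.monotonic() + timeout_s
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            pending.discard(acks.get(timeout=remaining))
        except queue.Empty:
            break
    return (not pending, pending)


def notify_async(watchers, timeout_s):
    """Fire-and-forget variant: returns a future immediately.

    The caller may ignore the future, in which case it has no guarantee
    about which watchers have seen the notify.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(notify, watchers, timeout_s)
    executor.shutdown(wait=False)  # already-submitted work still runs
    return future
```

With one crashed watcher in the set, notify() blocks for the full timeout and reports the missing name, which corresponds to the "all bets are off" case; notify_async() returns right away, and the result can be inspected later or dropped.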