My impression is that running a second gateway (assuming one at
present) on the same host would be preferable to running a single
gateway with a very high thread count, and also that 1024 is a good
maximum value for the thread count.

Matt

On Thu, Oct 17, 2019 at 4:01 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> On Thu, Oct 17, 2019 at 11:46 AM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> >
> >
> > On 10/17/19 12:59 PM, Robert LeBlanc wrote:
> > > On Thu, Oct 17, 2019 at 9:22 AM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> > >
> > >> With respect to this issue, civetweb and beast should behave the same.
> > >> Both frontends have a large thread pool, and their calls to
> > >> process_request() run synchronously (including blocking on rados
> > >> requests) on a frontend thread. So once there are more concurrent
> > >> client connections than there are frontend threads, new connections
> > >> will block until there's a thread available to service them.
> > >
> > > Okay, this really helps me understand what's going on here. Are there
> > > plans to remove the synchronous calls and make them async, or to
> > > otherwise improve this flow?
> >
> > Absolutely yes. This work has been in progress for a long time now,
> > and octopus does get a lot of concurrency here. Eventually, all of
> > process_request() will be async-enabled, and we'll be able to run
> > beast with a much smaller thread pool.
>
> This is great news. Is there anything we can do to help with this
> effort? It is very important for us.
>
> > > Currently I'm seeing 1024 max concurrent ops and a 512 thread pool.
> > > Does this mean that, with equally distributed requests, one op could
> > > be processing on the backend RADOS with another queued behind it
> > > waiting? Is this done in round-robin fashion, so that with 99% small
> > > I/O, a very long RADOS request can get many I/Os blocked behind it
> > > because requests are round-robin dispatched to the thread pool? (I
> > > assume the latter is what I'm seeing.)
> > >
> > > rgw_max_concurrent_requests 1024
> > > rgw_thread_pool_size 512
> > >
> > > If I match the two, do you think it would help prevent small I/O
> > > from being blocked by larger I/O?
> >
> > rgw_max_concurrent_requests was added in support of the beast/async
> > work, precisely because (post-Nautilus) the number of beast threads
> > will no longer limit the number of concurrent requests. This variable
> > is what throttles incoming requests to prevent radosgw's resource
> > consumption from ballooning under heavy workload. And unlike the
> > existing model, where a request remains in the queue until a thread
> > is ready to service it, any requests that exceed
> > rgw_max_concurrent_requests will be rejected with '503 SlowDown' in
> > S3 or '498 Rate Limited' in Swift.
> >
> > With respect to prioritization, there isn't any by default, but we do
> > have a prototype request scheduler that uses dmclock to prioritize
> > requests based on some hard-coded request classes. It's not
> > especially useful in its current form, but we do have plans to
> > further elaborate the classes and eventually pass the information
> > down to the OSDs for integrated QoS.
> >
> > As of Nautilus, though, the thread pool size is the only effective
> > knob you have.
>
> Do you see any problems with running 2k-4k threads if we have the RAM
> to do so?
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
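
As a concrete illustration of the "second gateway on the same host"
suggestion above, a minimal ceph.conf sketch follows. The instance
names (gw1, gw2) and port numbers are illustrative assumptions, not
from the thread; each instance gets its own beast frontend port and a
moderate thread pool, and something like haproxy would typically
balance client connections across the two:

    # Two radosgw instances on one host, each with its own beast port
    # and a moderate thread pool, rather than one instance with a very
    # large pool. Instance names and ports below are examples only.
    [client.rgw.gw1]
    rgw_frontends = beast port=7480
    rgw_thread_pool_size = 1024

    [client.rgw.gw2]
    rgw_frontends = beast port=7481
    rgw_thread_pool_size = 1024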
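
And for the two knobs discussed above, a sketch of how one might
inspect and adjust them on a Nautilus-or-later cluster, again with
illustrative instance names and admin-socket paths. The frontend
thread pool is sized at startup, so a gateway restart is expected
after raising rgw_thread_pool_size:

    # Read the current values from a running gateway via its admin
    # socket (the .asok filename depends on the instance name).
    ceph daemon /var/run/ceph/ceph-client.rgw.gw1.asok config get rgw_thread_pool_size
    ceph daemon /var/run/ceph/ceph-client.rgw.gw1.asok config get rgw_max_concurrent_requests

    # Persist new values in the mon config database (Nautilus and
    # later), then restart the gateway so the pool is re-created.
    ceph config set client.rgw rgw_thread_pool_size 1024
    ceph config set client.rgw rgw_max_concurrent_requests 1024
    systemctl restart ceph-radosgw@rgw.gw1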