Re: RGW blocking on large objects

On Thu, Oct 17, 2019 at 2:50 AM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
> On Thu, Oct 17, 2019 at 12:17 AM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Oct 16, 2019 at 2:50 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
> > >
> > > On Wed, Oct 16, 2019 at 11:23 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Oct 15, 2019 at 8:05 AM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Oct 14, 2019 at 2:58 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
> > > > > >
> > > > > > Could the 4 GB GET limit saturate the connection from rgw to Ceph?
> > > > > > Simple to test: just rate-limit the health check GET
> > > > >
> > > > > I don't think so, we have dual 25Gbps in a LAG, so Ceph to RGW has
> > > > > multiple paths, but we aren't balancing on port yet, so RGW to HAProxy
> > > > > is probably limited to one link.
> > > > >
> > > > > > Did you increase "objecter inflight ops" and "objecter inflight op bytes"?
> > > > > > You absolutely should adjust these settings for large RGW setups,
> > > > > > defaults of 1024 and 100 MB are way too low for many RGW setups, we
> > > > > > default to 8192 and 800MB
> > > >
> > > > On Nautilus the defaults already seem to be:
> > > > objecter_inflight_op_bytes    104857600    default
> > > = 100MiB
> > >
> > > > objecter_inflight_ops    24576    default
> > >
> > > not sure where you got this from, but the default is still 1024 even
> > > in master: https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/src/common/options.cc#L2288
> >
> > Looks like it is overridden in
> > https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/src/rgw/rgw_main.cc#L187
>
> you are right, this is new in Nautilus. Last time I had to play around
> with these settings was indeed on a Mimic deployment.
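(Aside, in case it helps anyone on Mimic or older who finds this
thread: the override Paul describes can be applied per RGW client in
ceph.conf. A minimal sketch, with a hypothetical instance name; the
8192 / 800 MB figures are simply the values Paul quoted above, not an
official recommendation:

[client.rgw.gateway-1]                       # hypothetical instance name
    objecter_inflight_ops      = 8192
    objecter_inflight_op_bytes = 838860800   # ~800 MB

On Nautilus the rgw_main.cc override linked above already raises the
ops limit for RGW, so only op_bytes would still be worth a look.)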
>
> > I'm just not
> > understanding how your suggestions would help; the problem doesn't
> > seem to be on the RADOS side (which it appears your tweaks target),
> > but on the HTTP side, as an HTTP health check takes a long time to
> > come back when a big transfer is going on.
>
> I was guessing a bottleneck on the RADOS side because you mentioned
> that you tried both civetweb and beast; it's somewhat unlikely to run
> into the exact same problem with both.

Looping in ceph-dev in case they have some insights into the inner
workings that may be helpful.

From what I understand, civetweb was not async and beast is, but if
beast is not coded exactly right, then it could behave similarly to
civetweb.

It seems that with beast, incoming requests are assigned to beast
threads, and each thread may be making a synchronous call to RADOS,
blocking the requests behind it until the RADOS call completes. I
tried looking through the code, but I'm not familiar with async in
C++. I can see two options that may resolve this. The first is a
separate thread pool for accessing RADOS objects, with a queue that
beast dispatches to and a callback invoked on completion. The second
is making the RADOS calls asynchronous so that a request can yield
the event loop to another task while it waits. I couldn't tell
whether either of these is being done, but either should keep small
IO from getting stuck behind large IO.
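To make the second option a bit more concrete, here is a rough sketch
against the plain librados C++ API (Nautilus-era), not RGW's actual
code paths, contrasting a blocking read, which pins the calling
frontend thread for the whole transfer, with an async read that
registers a completion callback and returns right away. The pool,
object name, client name and read size are all placeholders:

// Sketch only -- not RGW code. Contrasts a blocking librados read
// (which ties up the calling thread for the whole transfer) with an
// asynchronous read that registers a completion callback and returns.
// Build roughly with: g++ -std=c++17 sketch.cc -lrados
#include <rados/librados.hpp>
#include <iostream>

// Called by librados when the async read finishes; the thread that
// issued the read was free the whole time the data was in flight.
static void on_read_done(librados::completion_t /*cb*/, void *arg) {
  auto *bl = static_cast<librados::bufferlist *>(arg);
  std::cout << "async read finished, " << bl->length() << " bytes\n";
}

int main() {
  librados::Rados cluster;
  cluster.init("admin");            // placeholder client name
  cluster.conf_read_file(nullptr);  // default ceph.conf search path
  if (cluster.connect() < 0) return 1;

  librados::IoCtx ioctx;
  if (cluster.ioctx_create("default.rgw.buckets.data", ioctx) < 0) return 1;

  // 1) Blocking style: the calling thread sleeps until every byte arrives.
  librados::bufferlist sync_bl;
  ioctx.read("some_large_object", sync_bl, 4 << 20, 0);

  // 2) Async style: issue the read, hand the wait to a callback, move on.
  librados::bufferlist async_bl;
  librados::AioCompletion *c =
      librados::Rados::aio_create_completion(&async_bl, on_read_done, nullptr);
  ioctx.aio_read("some_large_object", c, &async_bl, 4 << 20, 0);

  // A real frontend would return to its event loop here and service other
  // requests (e.g. the health check); for the demo just wait, then clean up.
  c->wait_for_complete();
  c->release();
  cluster.shutdown();
  return 0;
}

In the async case the frontend worker could go straight back to its
event loop and service something like the health check GET while the
large read is still in flight.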

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


