Re: RGW blocking on large objects

On Wed, Oct 16, 2019 at 2:50 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
> On Wed, Oct 16, 2019 at 11:23 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Oct 15, 2019 at 8:05 AM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 14, 2019 at 2:58 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
> > > >
> > > > Could the 4 GB GET limit saturate the connection from rgw to Ceph?
> > > > Simple to test: just rate-limit the health check GET
> > >
> > > I don't think so; we have dual 25Gbps links in a LAG, so Ceph to RGW
> > > has multiple paths, but we aren't balancing on port yet, so RGW to
> > > HAProxy is probably limited to one link.
> > >
> > > > Did you increase "objecter inflight ops" and "objecter inflight op bytes"?
> > > > You absolutely should adjust these settings for large RGW setups;
> > > > the defaults of 1024 and 100 MB are way too low for many deployments.
> > > > We default to 8192 and 800 MB.
> >
> > On Nautilus the defaults already seem to be:
> > objecter_inflight_op_bytes    104857600    default
>
> = 100 MiB
>
> > objecter_inflight_ops         24576        default
>
> not sure where you got this from, but the default is still 1024 even
> in master: https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/src/common/options.cc#L2288

Looks like it is overridden in
https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/src/rgw/rgw_main.cc#L187

I got the value through `ceph config show-with-defaults rgw.<name>.rgw0`
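
For what it's worth, if we wanted to try the 8192 / 800MB values you
suggested, my understanding is they could be pushed through the config
database with something like this (the client.rgw section and the daemon
name here are guesses about a deployment, adjust as needed):

    # check the effective values on one RGW daemon
    ceph config show-with-defaults client.rgw.gateway1 | grep objecter_inflight

    # raise the limits for all RGW daemons (800 MB = 838860800 bytes)
    ceph config set client.rgw objecter_inflight_ops 8192
    ceph config set client.rgw objecter_inflight_op_bytes 838860800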

It's kind of reminiscent of bufferbloat, where one big transfer just
blocks all the other ones. It feels like something in the code is
blocking, like an await not passing control back properly. I'm just not
seeing how your suggestions would help: the problem doesn't seem to be
on the RADOS side (which is what your tweaks target), but on the HTTP
side, since an HTTP health check takes a long time to come back while a
big transfer is going on.
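
To illustrate what we see (hostnames and paths below are made up): kick
off a large GET through a single RGW instance and watch time to first
byte on the health check against that same instance climb:

    # terminal 1: pull a large object through one RGW instance
    curl -s -o /dev/null http://rgw1.example.internal/bucket/huge-object

    # terminal 2: watch time-to-first-byte on the health check endpoint
    while true; do
      curl -s -o /dev/null -w '%{time_starttransfer}\n' http://rgw1.example.internal/healthcheck
      sleep 1
    done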

Granted, we are really trying to drive latency down in our environment
because we do fast failover: if RGW doesn't have the object, we abort
and check AWS instead, so time to first byte is critical. It tells us
quickly whether the object is there or not, and any request that gets
blocked behind a long transfer ends up taking the fail path in our code.
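
In curl terms, the fail path is roughly equivalent to this (a simplified
sketch, not our actual code; names and timeouts are made up):

    # try the local RGW first with a tight time budget; on failure or
    # timeout, fall back to fetching the object from AWS
    if ! curl -sf --max-time 0.2 -o /tmp/obj http://rgw1.example.internal/bucket/key; then
      curl -sf -o /tmp/obj https://bucket.s3.amazonaws.com/key
    fi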

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


