Re: RGW blocking on large objects

On Thu, Oct 17, 2019 at 12:17 AM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> On Wed, Oct 16, 2019 at 2:50 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
> >
> > On Wed, Oct 16, 2019 at 11:23 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 15, 2019 at 8:05 AM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Oct 14, 2019 at 2:58 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
> > > > >
> > > > > Could the 4 GB GET saturate the connection from RGW to Ceph?
> > > > > Simple to test: just rate-limit the health check GET
> > > >
> > > > I don't think so; we have dual 25 Gbps links in a LAG, so Ceph to
> > > > RGW has multiple paths, but we aren't balancing on port yet, so
> > > > RGW to HAProxy is probably limited to one link.
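
(For reference, rate-limiting a GET from the client side is
straightforward with curl; the endpoint and object name below are
placeholders:

  # Throttle the GET to ~50 MB/s; --limit-rate takes bytes/s with
  # K/M/G suffixes. If other requests speed up while this runs, the
  # bottleneck is bandwidth rather than RGW itself.
  curl --limit-rate 50M -o /dev/null \
      http://rgw.example.com/some-bucket/large-4gb-object

)
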
> > > >
> > > > > Did you increase "objecter inflight ops" and "objecter inflight op bytes"?
> > > > > You absolutely should adjust these settings for large RGW
> > > > > setups; the defaults of 1024 ops and 100 MB are way too low for
> > > > > many of them. We default to 8192 and 800 MB.
> > >
> > > On Nautilus the defaults already seem to be:
> > >
> > >   objecter_inflight_op_bytes   104857600   default
> >
> > (= 100 MiB)
> >
> > >   objecter_inflight_ops        24576       default
> >
> > Not sure where you got that from, but the default is still 1024 even
> > in master: https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/src/common/options.cc#L2288
>
> Looks like it is overridden in
> https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/src/rgw/rgw_main.cc#L187

You are right; this is new in Nautilus. The last time I had to play
around with these settings was indeed on a Mimic deployment.
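
For the record, on that Mimic cluster the override was just a couple of
lines in ceph.conf for the RGW daemons; a sketch with the values I
mentioned above (the instance name is a placeholder):

  [client.rgw.gateway1]
  # 8192 in-flight ops and 800 MiB of in-flight bytes; the Mimic
  # defaults were 1024 and 100 MiB.
  objecter_inflight_ops = 8192
  objecter_inflight_op_bytes = 838860800

On Nautilus you can check what a running radosgw actually uses through
the admin socket, e.g. "ceph daemon client.rgw.gateway1 config get
objecter_inflight_ops", which should report the 24576 you saw.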

> I'm just not understanding how your suggestions would help; the
> problem doesn't seem to be on the RADOS side (which your tweaks appear
> to target) but on the HTTP side, since an HTTP health check takes a
> long time to come back while a big transfer is going on.

I was guessing at a bottleneck on the RADOS side because you mentioned
that you tried both civetweb and beast; it's somewhat unlikely to run
into the exact same problem with both frontends.
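
(For anyone reading this in the archives: switching frontends to test
that is just the rgw_frontends option in the RGW section of ceph.conf,
e.g.

  # pick one; port number here is only an example
  rgw_frontends = beast port=8000
  rgw_frontends = civetweb port=8000

so it's a cheap experiment to rule the frontend in or out.)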

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



