Re: Radosgw scaling recommendation?

I'm curious how the num_threads option to civetweb relates to 'rgw thread pool size'. Should I make them equal?

i.e.:

rgw frontends = civetweb enable_keep_alive=yes port=80 num_threads=125 error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log


-Ben

On Thu, Feb 9, 2017 at 12:30 PM, Wido den Hollander <wido@xxxxxxxx> wrote:

> On 9 February 2017 at 19:34, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
>
> I'm not really an RGW expert, but I'd suggest increasing the
> "rgw_thread_pool_size" option to something much higher than the default
> 100 threads if you haven't already.  RGW requires at least 1 thread per
> client connection, so with many concurrent connections some of them
> might end up timing out.  You can scale the number of threads and even
> the number of RGW instances on a single server, but at some point you'll
> run out of threads at the OS level.  Probably before that actually
> happens though, you'll want to think about multiple RGW gateway nodes
> behind a load balancer.  Afaik that's how the big sites do it.
>
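As an aside, the OS-level thread limits Mark mentions can be checked directly. A minimal sketch, assuming a stock Linux host (the paths and limit names are standard Linux, the actual values will vary per host):

```python
import resource

# System-wide ceiling on the number of threads across all processes
with open("/proc/sys/kernel/threads-max") as f:
    print("kernel.threads-max:", int(f.read().strip()))

# Per-process limit on processes/threads (what `ulimit -u` reports)
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC soft/hard:", soft, hard)
```

If the thread pool size multiplied by the number of RGW instances approaches these numbers, raising the limits or adding gateway nodes behind the load balancer would be the next step.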

In addition, have you tried using more RADOS handles?

rgw_num_rados_handles = 8

Combine that with more RGW threads, as Mark mentioned.
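For concreteness, the two suggestions combined might look like this in ceph.conf (a sketch only: the client section name is hypothetical and the values are illustrative starting points, not tuned recommendations):

```ini
[client.rgw.gateway1]
rgw thread pool size = 512
rgw num rados handles = 8
rgw frontends = civetweb port=80 num_threads=512
```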

Wido

> I believe some folks are considering trying to migrate rgw to a
> threadpool/event processing model but it sounds like it would be quite a
> bit of work.
>
> Mark
>
> On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:
> > Hi all,
> >
> > We're doing some stress testing with clients hitting our rados gw
> > nodes with simultaneous connections.  When the number of client
> > connections exceeds about 5400 we start seeing 403 forbidden errors
> > and log messages like the following:
> >
> > 2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
> > too big now=2017-02-09 08:53:16.000000 req_time=2017-02-09
> > 08:37:18.000000
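For what it's worth, the skew in that log line works out to just under 16 minutes, which would push requests past a 15-minute auth grace window (treating 15 minutes as the default is an assumption about this 10.2.x build). One plausible reading is that requests queue so long under load that their signed timestamps expire before they are processed:

```python
from datetime import datetime

# Timestamps taken from the log line above
now = datetime(2017, 2, 9, 8, 53, 16)
req_time = datetime(2017, 2, 9, 8, 37, 18)

skew = now - req_time
print(skew)                            # prints 0:15:58
print(skew.total_seconds() > 15 * 60)  # prints True
```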
> >
> > This is version 10.2.5 using embedded civetweb.  There's just one
> > instance per node, and they all start generating 403 errors and the
> > above log messages when enough clients start hitting them.  The
> > hardware is not being taxed at all: negligible load and network
> > throughput.  OSDs don't show any appreciable increase in CPU load or
> > I/O wait on journal/data devices.  Unless I'm missing something, it
> > looks like the RGW is just not scaling to fill out the hardware it
> > runs on.
> >
> > Does anyone have advice on scaling RGW to fully utilize a host?
> >
> > thanks,
> > Ben
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

