I'm curious how the num_threads option to civetweb relates to 'rgw thread pool size'. Should I make them equal?
For example:
rgw frontends = civetweb enable_keep_alive=yes port=80 num_threads=125 error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log
-Ben
On Thu, Feb 9, 2017 at 12:30 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
> On 9 February 2017 at 19:34, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
>
> I'm not really an RGW expert, but I'd suggest increasing the
> "rgw_thread_pool_size" option to something much higher than the default
> 100 threads if you haven't already. RGW requires at least 1 thread per
> client connection, so with many concurrent connections some of them
> might end up timing out. You can scale the number of threads and even
> the number of RGW instances on a single server, but at some point you'll
> run out of threads at the OS level. Probably before that actually
> happens though, you'll want to think about multiple RGW gateway nodes
> behind a load balancer. Afaik that's how the big sites do it.
>
In addition, have you tried to use more RADOS handles?
rgw_num_rados_handles = 8
Use that together with more RGW threads, as Mark mentioned.
Wido
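Mark's rule of thumb above (at least one thread per client connection) can be turned into a rough sizing estimate. The sketch below is only arithmetic, not anything RGW does itself. The function name and the 1024-thread figure are made up for illustration; the 5400 connections and the default of 100 threads come from this thread.

```python
import math


def rgw_instances_needed(concurrent_connections: int, threads_per_instance: int) -> int:
    """Estimate how many RGW instances are needed if each connection pins one thread."""
    if concurrent_connections < 0:
        raise ValueError("concurrent_connections must not be negative")
    if threads_per_instance <= 0:
        raise ValueError("threads_per_instance must be positive")
    return math.ceil(concurrent_connections / threads_per_instance)


# 5400 connections is where the 403 errors started in the original report.
print(rgw_instances_needed(5400, 100))   # default pool size mentioned by Mark
print(rgw_instances_needed(5400, 1024))  # 1024 is a hypothetical larger pool
```

This ignores keep-alive behaviour and OS-level thread limits, so treat the result as a lower bound when deciding how many instances or gateway nodes to put behind a load balancer.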
> I believe some folks are considering trying to migrate rgw to a
> threadpool/event processing model but it sounds like it would be quite a
> bit of work.
>
> Mark
>
> On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:
> > Hi all,
> >
> > We're doing some stress testing with clients hitting our rados gw
> > nodes with simultaneous connections. When the number of client
> > connections exceeds about 5400 we start seeing 403 forbidden errors
> > and log messages like the following:
> >
> > 2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
> > too big now=2017-02-09 08:53:16.000000 req_time=2017-02-09
> > 08:37:18.000000
> >
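The two timestamps in the log line above give the skew directly. A minimal Python sketch, assuming the wrapped log message has been re-joined onto one line and that the `now=` and `req_time=` fields always look as they do here. The 15-minute limit is an assumption about RGW's default, so check it against your version:

```python
import re
from datetime import datetime

# The log message from the report, re-joined onto a single line.
LOG_LINE = (
    "2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew "
    "too big now=2017-02-09 08:53:16.000000 req_time=2017-02-09 08:37:18.000000"
)

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"
SKEW_RE = re.compile(rf"now=({TS}) req_time=({TS})")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Assumption: RGW rejects requests whose timestamp is more than ~15 minutes old.
ASSUMED_LIMIT_S = 15 * 60


def request_skew_seconds(line: str) -> float:
    """Return now - req_time in seconds for a 'request time skew' log line."""
    match = SKEW_RE.search(line)
    if match is None:
        raise ValueError("no now=/req_time= pair found in line")
    now = datetime.strptime(match.group(1), TS_FORMAT)
    req_time = datetime.strptime(match.group(2), TS_FORMAT)
    return (now - req_time).total_seconds()


skew = request_skew_seconds(LOG_LINE)
print(f"skew: {skew:.0f} s, over assumed limit: {skew > ASSUMED_LIMIT_S}")
```

Here the request was signed about 16 minutes before RGW evaluated it, which would be consistent with requests waiting a long time for a free thread rather than with a real clock problem on the client.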
> > This is version 10.2.5 using embedded civetweb. There's just one
> > instance per node, and they all start generating 403 errors and the
> > above log messages when enough clients start hitting them. The
> > hardware is not being taxed at all, negligible load and network
> > throughput. OSDs don't show any appreciable increase in CPU load or
> > io wait on journal/data devices. Unless I'm missing something it
> > looks like the RGW is just not scaling to fill out the hardware it is
> > on.
> >
> > Does anyone have advice on scaling RGW to fully utilize a host?
> >
> > thanks,
> > Ben
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >