Re: Radosgw scaling recommendation?

I'm curious how the num_threads option to civetweb relates to 'rgw thread pool size'. Should I make them equal?

i.e.:

rgw frontends = civetweb enable_keep_alive=yes port=80 num_threads=125 error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log


-Ben

On Thu, Feb 9, 2017 at 12:30 PM, Wido den Hollander <wido@xxxxxxxx> wrote:

> On 9 February 2017 at 19:34, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
>
> I'm not really an RGW expert, but I'd suggest increasing the
> "rgw_thread_pool_size" option to something much higher than the default
> 100 threads if you haven't already.  RGW requires at least 1 thread per
> client connection, so with many concurrent connections some of them
> might end up timing out.  You can scale the number of threads and even
> the number of RGW instances on a single server, but at some point you'll
> run out of threads at the OS level.  Probably before that actually
> happens though, you'll want to think about multiple RGW gateway nodes
> behind a load balancer.  Afaik that's how the big sites do it.
>
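As an aside, the OS-level thread limits Mark mentions can be checked directly. A minimal sketch, assuming a stock Linux host (the paths and limit names are standard Linux, the actual values will vary per host):

```python
import resource

# System-wide ceiling on the number of threads across all processes
with open("/proc/sys/kernel/threads-max") as f:
    print("kernel.threads-max:", int(f.read().strip()))

# Per-process limit on processes/threads (what `ulimit -u` reports)
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC soft/hard:", soft, hard)
```

If the thread pool size multiplied by the number of RGW instances approaches these numbers, raising the limits or adding gateway nodes behind the load balancer would be the next step.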

In addition, have you tried using more RADOS handles?

rgw_num_rados_handles = 8

Combine that with more RGW threads, as Mark mentioned.
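For concreteness, the two suggestions combined might look like this in ceph.conf (a sketch only: the client section name is hypothetical and the values are illustrative starting points, not tuned recommendations):

```ini
[client.rgw.gateway1]
rgw thread pool size = 512
rgw num rados handles = 8
rgw frontends = civetweb port=80 num_threads=512
```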

Wido

> I believe some folks are considering trying to migrate rgw to a
> threadpool/event processing model but it sounds like it would be quite a
> bit of work.
>
> Mark
>
> On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:
> > Hi all,
> >
> > We're doing some stress testing with clients hitting our rados gw
> > nodes with simultaneous connections.  When the number of client
> > connections exceeds about 5400 we start seeing 403 forbidden errors
> > and log messages like the following:
> >
> > 2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
> > too big now=2017-02-09 08:53:16.000000 req_time=2017-02-09
> > 08:37:18.000000
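For what it's worth, the skew in that log line works out to just under 16 minutes, which would push requests past a 15-minute auth grace window (treating 15 minutes as the default is an assumption about this 10.2.x build). One plausible reading is that requests queue so long under load that their signed timestamps expire before they are processed:

```python
from datetime import datetime

# Timestamps taken from the log line above
now = datetime(2017, 2, 9, 8, 53, 16)
req_time = datetime(2017, 2, 9, 8, 37, 18)

skew = now - req_time
print(skew)                            # prints 0:15:58
print(skew.total_seconds() > 15 * 60)  # prints True
```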
> >
> > This is version 10.2.5 using embedded civetweb.  There's just one
> > instance per node, and they all start generating 403 errors and the
> > above log messages when enough clients start hitting them.  The
> > hardware is not being taxed at all: negligible load and network
> > throughput.  OSDs don't show any appreciable increase in CPU load or
> > I/O wait on journal/data devices.  Unless I'm missing something, it
> > looks like the RGW is just not scaling to fill out the hardware it
> > runs on.
> >
> > Does anyone have advice on scaling RGW to fully utilize a host?
> >
> > thanks,
> > Ben
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

