I'm not really an RGW expert, but I'd suggest increasing the
"rgw_thread_pool_size" option to something much higher than the default
of 100 threads if you haven't already. RGW needs at least one thread per
client connection, so with many concurrent connections some of them
may end up timing out while they wait for a free thread. You can scale
the number of threads, and even the number of RGW instances on a single
server, but at some point you'll run out of threads at the OS level.
Well before that happens, though, you'll probably want to consider
multiple RGW nodes behind a load balancer. AFAIK that's how the big
sites do it.
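As a rough capacity model, here is the arithmetic behind "one thread per connection". It assumes exactly one worker thread is held per open connection and ignores queueing, keep-alive and OS thread limits. The function names and the 1024-thread / 2-instance figures are hypothetical, picked only for illustration:

```python
import math


def waiting_connections(clients: int, threads_per_instance: int, instances: int = 1) -> int:
    """Connections left without a worker thread.

    Assumes (hypothetically) that every open connection pins exactly one thread.
    """
    capacity = threads_per_instance * instances
    return max(0, clients - capacity)


def gateways_needed(clients: int, threads_per_instance: int, instances_per_node: int = 1) -> int:
    """Minimum number of RGW nodes behind a load balancer so every client gets a thread."""
    per_node = threads_per_instance * instances_per_node
    return math.ceil(clients / per_node)


if __name__ == "__main__":
    # 5400 clients against the default pool of 100 threads on one instance.
    print(waiting_connections(5400, 100))   # 5300
    # Same load with a hypothetical 1024-thread pool and 2 instances per node.
    print(gateways_needed(5400, 1024, 2))   # 3
```

Real gateways will behave worse than this model suggests, since queued connections can time out long before the arithmetic says the pool is exhausted.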
I believe some folks are considering migrating RGW to a
thread pool/event-processing model, but it sounds like it would be quite
a bit of work.
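To illustrate the general idea of an event-processing model (not how RGW itself would implement it; RGW is C++ and this is only a Python asyncio sketch), each connection below is a coroutine rather than an OS thread. A single event loop thread serves all of them, and the sketch records which thread ran each handler to show that. The client count of 200 and the helper names are arbitrary choices for the demo:

```python
import asyncio
import threading

# Thread IDs observed inside connection handlers.
handler_threads = set()


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Each connection is a coroutine: while it awaits I/O, the event loop
    # runs other connections on the same OS thread.
    handler_threads.add(threading.get_ident())
    data = await reader.readline()
    writer.write(data)  # echo the request line back
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port: int, i: int) -> bytes:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"req {i}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def main(n_clients: int = 200):
    """Serve n_clients concurrent connections; return (served, handler thread count)."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0, backlog=256)
    port = server.sockets[0].getsockname()[1]
    async with server:
        replies = await asyncio.gather(*(client(port, i) for i in range(n_clients)))
    served = sum(1 for i, r in enumerate(replies) if r == f"req {i}\n".encode())
    return served, len(handler_threads)


if __name__ == "__main__":
    served, threads = asyncio.run(main())
    print(served, "connections served by", threads, "handler thread(s)")
```

The trade-off is that any handler that blocks (for example on synchronous disk or network calls) stalls every connection on that loop, which is part of why retrofitting this model onto an existing thread-per-connection codebase is a lot of work.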
Mark
On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:
Hi all,
We're doing some stress testing with clients hitting our rados gw
nodes with simultaneous connections. When the number of client
connections exceeds about 5400 we start seeing 403 forbidden errors
and log messages like the following:
2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew too big now=2017-02-09 08:53:16.000000 req_time=2017-02-09 08:37:18.000000
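The two timestamps in that log line are almost 16 minutes apart. One reading, consistent with the thread-starvation explanation in the reply, is that the request waited that long before a worker picked it up. Client clock skew would produce the same message. The sketch below assumes a 15-minute grace window, the limit Amazon S3 applies to request timestamps. RGW's exact threshold is treated here as an assumption, and the helper names are hypothetical:

```python
from datetime import datetime, timedelta

# Assumption: 15-minute grace window (the S3 limit); RGW's value is not confirmed here.
GRACE = timedelta(minutes=15)


def request_skew(now: datetime, req_time: datetime) -> timedelta:
    """Absolute difference between server time and the request's timestamp."""
    return abs(now - req_time)


def rejected(now: datetime, req_time: datetime, grace: timedelta = GRACE) -> bool:
    """True if the skew exceeds the assumed grace window."""
    return request_skew(now, req_time) > grace


# Timestamps from the log line above (fractional seconds are zero in both).
now = datetime(2017, 2, 9, 8, 53, 16)
req_time = datetime(2017, 2, 9, 8, 37, 18)

if __name__ == "__main__":
    print(request_skew(now, req_time))  # 0:15:58
    print(rejected(now, req_time))      # True
```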
This is version 10.2.5 using embedded civetweb. There's just one
instance per node, and they all start generating 403 errors and the
above log messages when enough clients start hitting them. The
hardware is not being taxed at all: load and network throughput are
negligible. The OSDs don't show any appreciable increase in CPU load or
I/O wait on journal/data devices. Unless I'm missing something, it
looks like RGW just isn't scaling to make use of the hardware it runs
on.
Does anyone have advice on scaling RGW to fully utilize a host?
thanks,
Ben
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com