On Thu, Dec 26, 2013 at 3:00 AM, Kuo Hugo <tonytkdk@xxxxxxxxx> wrote:
> Hi all,
>
>
> I think the FastCGI module is the latest one on my server.
>
> root@p01:/var/log# dpkg -l | grep cgi
>
> ii libapache2-mod-fastcgi 2.4.7~0910052141-2~bpo70+1.ceph Apache 2 FastCGI module for long-running CGI scripts
> ii libfcgi0ldbl 2.4.0-8.1 Shared library of FastCGI
> ii python-scgi 1.13-1ubuntu1 Server-side implementation of the SCGI protocol
>
>
> 1) It happens in higher-concurrency (990+) tests. The failure ratio is
> about 10%. It never happened for concurrency under 960.
>
> Concurrency: 990
> Count: 8974 ( 1026 error; 0 retries: 0.00%) Average requests per second: 669.3
>
>
>
> 2) The client tool gets a 500 Internal Server Error for each failed request.
> There is no relevant request log in radosgw.log. I think the external
> FastCGI server did not get the request from Apache. Does that mean a single
> radosgw process is limited to 1000 concurrent connections? There is nothing
> interesting in either syslog or kern.log. The CPU load is approximately 50%.

No, it doesn't. It means that you have some issue in your environment.
Could be some kind of limit (max fds, apache concurrent connections,
socket backlog). There's a good chance you're hitting a problem with
the libfcgi module, which used to use select() instead of poll() and
was breaking when an fd number reached 1024 or higher. A newer version
that fixes it exists for ubuntu (try 2.4.0-8.1ubuntu3).
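The select() limit described above can be illustrated with a short sketch. It is not part of the original thread. It assumes a Linux host where FD_SETSIZE is 1024 (the usual glibc value), and the helper names fits_in_fd_set and nofile_limits are hypothetical. A failure threshold somewhere between 960 and 990 concurrent requests would be consistent with descriptor numbers crossing 1024 once the process's other open files are counted, but that is only an inference.

```python
import resource

# Assumption: FD_SETSIZE is 1024, the usual glibc value on Linux.
FD_SETSIZE = 1024


def fits_in_fd_set(fd, fd_setsize=FD_SETSIZE):
    """Return True if select() can safely watch descriptor number `fd`."""
    # select() tracks descriptors in a fixed-size bitmap, so only
    # descriptor numbers 0 .. fd_setsize - 1 are representable.
    return 0 <= fd < fd_setsize


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("open-file limit: soft=%d hard=%d" % (soft, hard))
    for fd in (959, 1023, 1024):
        print("fd %d usable with select(): %s" % (fd, fits_in_fd_set(fd)))
```

These limits are per process, so they need to be checked for the running radosgw and apache processes (for example through /proc/<pid>/limits) rather than in an interactive shell.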
>
> ClientException: Object PUT failed: http://192.168.2.51:80/swift/v1/ssbench_000072/1KB_002058 500 Internal Server Error [first 60 chars of response] <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><he
>
> access:192.168.2.40 - - [26/Dec/2013:02:26:09 -0800] "PUT /swift/v1/ssbench_000072/1KB_002058 HTTP/1.1" 500 745 "-" "-"
> err:[Thu Dec 26 02:26:09 2013] [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000072/1KB_002058 auth
> err:[Thu Dec 26 02:26:09 2013] [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
> err:[Thu Dec 26 02:26:09 2013] [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>
>
>
>
> 3) There are no waits in the OSD or RGW perf dumps.
>
> 4) Am I using the wrong FastCGI module?

Don't think so, otherwise all PUTs would have failed.

Yehuda

>
> Good news is that the RadosGW can handle 500+ concurrency now. But I believe
> it can get better than 900+. The CPU load is still low, though.
>
> Appreciate ~
>
>
> 2013/12/26 Yehuda Sadeh <yehuda@xxxxxxxxxxx>
>>
>> On Wed, Dec 25, 2013 at 9:12 AM, Kuo Hugo <tonytkdk@xxxxxxxxx> wrote:
>> > Hi folks,
>> >
>> > I'm in the process of tuning the performance of RadosGW on my server.
>> > After some kind help from you guys, I figured out several problems to
>> > address in optimizing RadosGW to handle more concurrent requests from
>> > users.
>> >
>> > Apache optimization #
>> > radosgw open file #
>> > rgw thread pools #
>> > rgw_ops throttle #
>> > objecter_inflight_op_bytes
>> > objecter_inflight_ops
>> > etc....
>> >
>> > It's a powerful server with 32 CPU threads + 62GB RAM. But I've
>> > encountered a problem for which the admin sockets give no clue.
>> >
>> > What's the meaning of the following FastCGI error in Apache's error.log?
>> > It happened on both PUT and DELETE requests.
>> > There is no op wait in the OSD or RadosGW.
>> > Is there any way to improve it?
>> >
>> > I'm not sure yet whether the connection reset was raised by Apache or
>> > by FastCGI.
>> >
>> > [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000045/1KB_025787 auth
>> > [error] [client 192.168.2.40]
>> > [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> > [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> > [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000040/1KB_025788 auth
>> > [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000021/1KB_025685 auth
>> > [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000047/1KB_025790 auth
>> > [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> > [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> >
>> > [warn] FastCGI: 192.168.2.40 DELETE http://192.168.2.51/swift/v1/ssbench_000006/1KB_012286 auth
>> > [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> > [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> > [warn] FastCGI: 192.168.2.40 DELETE http://192.168.2.51/swift/v1/ssbench_000061/1KB_012168 auth
>> >
>> > [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >
>> >
>> >
>>
>>
>> Can you correlate these with the apache access log and with the
>> radosgw log? (e.g., do you get 500 responses?). It could happen if
>> you're using the wrong fastcgi module, or if the requests are too slow
>> to respond and apache is timing out.
>>
>> Yehuda
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com