Re: [RadosGW] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed

Hi Yehuda, 

At the beginning I did hit Apache's concurrent-connection limit, which defaults to 150, so I bumped up several parameters. After that, Apache's concurrency looks sufficient for the benchmark run; from my point of view, those failed requests would otherwise not have been sent to the FastCGI server at all. As for FDs, do you mean the per-process open-files limit or the system-wide fs.file-max? The open-files limit of both Apache and RadosGW was already raised to 65536; RadosGW had hit the old limit before.
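The limits in question can be read straight from /proc on Linux. The following Python sketch is an illustration added here, not something from the thread: it reports a process's "Max open files" soft and hard limits (from /proc/<pid>/limits), how many descriptors that process currently holds, and the system-wide fs.file-max and net.core.somaxconn values (somaxconn caps the listen() backlog, one of the limits Yehuda lists). It assumes Linux and only reads values; inspecting another user's process generally needs root.

```python
import os


def read_sysctl(path):
    """Read an integer sysctl from /proc/sys; None if this kernel lacks it."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except FileNotFoundError:
        return None


def open_files_limit(pid):
    """(soft, hard) 'Max open files' of a process, as strings ('unlimited' possible)."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft, hard = line.split()[3:5]
                return soft, hard
    raise RuntimeError(f"no 'Max open files' line for pid {pid}")


def open_fd_count(pid):
    """Descriptors currently held by pid (other users' processes need root)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def report(pid):
    """Collect the per-process and system-wide limits relevant to this thread."""
    soft, hard = open_files_limit(pid)
    return {
        "pid": pid,
        "nofile_soft": soft,
        "nofile_hard": hard,
        "open_fds": open_fd_count(pid),
        "fs.file-max": read_sysctl("/proc/sys/fs/file-max"),
        "net.core.somaxconn": read_sysctl("/proc/sys/net/core/somaxconn"),
    }
```

For example, calling report() with the PID of the radosgw process (e.g. from `pidof radosgw`) during a benchmark run shows how close it gets to its own open-files limit.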

I'll give the newer libfcgi a try.

In your experience, how many concurrent connections have you seen RadosGW + Apache handle? Is it possible that we are CPU-bound?

Cheers / Hugo 


2013/12/26 Yehuda Sadeh <yehuda@xxxxxxxxxxx>
On Thu, Dec 26, 2013 at 3:00 AM, Kuo Hugo <tonytkdk@xxxxxxxxx> wrote:
> Hi all,
>
>
> I think the FastCGI module is the latest one on my server.
>
> root@p01:/var/log# dpkg -l | grep cgi
>
> ii  libapache2-mod-fastcgi  2.4.7~0910052141-2~bpo70+1.ceph  Apache 2 FastCGI module for long-running CGI scripts
> ii  libfcgi0ldbl            2.4.0-8.1                        Shared library of FastCGI
> ii  python-scgi             1.13-1ubuntu1                    Server-side implementation of the SCGI protocol
>
>
> 1) It happens in the higher-concurrency (990+) test, with a failure ratio
> of about 10%. It never happened at concurrency below 960.
>
> Concurrency: 990
> Count:  8974 ( 1026 error;     0 retries:  0.00%)  Average requests per second: 669.3
>
>
>
> 2) The client tool gets a 500 Internal Server Error for each failed request.
> There is no corresponding request in radosgw.log, so I think the external
> FastCGI server never received the request from Apache. Does that mean a
> single RadosGW process is limited to 1000 concurrent connections? There is
> nothing interesting in either syslog or kern.log. CPU load is approximately
> 50%.

No, it doesn't. It means that you have some issue in your environment.
It could be some kind of limit (max fds, Apache concurrent connections,
socket backlog). There's a good chance you're hitting a problem with
the libfcgi module, which used to use select() instead of poll() and
broke once an fd number reached 1024. A newer version that fixes it
exists for Ubuntu (try 2.4.0-8.1ubuntu3).
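A minimal Python sketch of that select() failure mode, assuming Linux, where glibc's fd_set is a fixed bitmap of FD_SETSIZE = 1024 descriptors. In C, passing a descriptor at or above that value to FD_SET() is undefined behavior (typically memory corruption, or an abort under _FORTIFY_SOURCE), which is how a select()-based libfcgi can break once a busy radosgw holds more than about 1024 sockets. CPython guards the same limit with a ValueError, which makes it easy to observe. poll() takes an array of descriptors instead of a bitmap, so it has no such ceiling. The target descriptor number 1100 is an arbitrary choice for the demo, which raises the soft RLIMIT_NOFILE if needed (never above the hard limit).

```python
import os
import resource
import select

FD_SETSIZE = 1024  # glibc's fixed fd_set capacity on Linux


def pipe_with_high_fd(target_fd=1100):
    """Create a pipe and move its read end to descriptor number target_fd."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft <= target_fd:
        if hard != resource.RLIM_INFINITY and hard <= target_fd:
            raise RuntimeError("hard RLIMIT_NOFILE is too low for this demo")
        # Raising the soft limit up to the hard limit needs no privileges.
        resource.setrlimit(resource.RLIMIT_NOFILE, (target_fd + 1, hard))
    r, w = os.pipe()
    os.dup2(r, target_fd)
    os.close(r)
    return target_fd, w


def select_can_watch(fd):
    """True if select() accepts fd; CPython raises ValueError at fd >= FD_SETSIZE."""
    try:
        select.select([fd], [], [], 0)
        return True
    except ValueError:
        return False


def poll_can_watch(fd):
    """poll() takes an array of pollfd entries, so any valid fd number works."""
    p = select.poll()
    p.register(fd, select.POLLIN)
    p.poll(0)  # nothing written to the pipe yet, so this returns no events
    return True
```

With `fd, w = pipe_with_high_fd()`, select_can_watch(fd) returns False while poll_can_watch(fd) returns True; a freshly created low-numbered pipe passes both checks.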

>
> ClientException: Object PUT failed: http://192.168.2.51:80/swift/v1/ssbench_000072/1KB_002058 500 Internal Server Error  [first 60 chars of response] <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><he
>
> access:192.168.2.40 - - [26/Dec/2013:02:26:09 -0800] "PUT /swift/v1/ssbench_000072/1KB_002058 HTTP/1.1" 500 745 "-" "-"
> err:[Thu Dec 26 02:26:09 2013] [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000072/1KB_002058 auth
> err:[Thu Dec 26 02:26:09 2013] [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
> err:[Thu Dec 26 02:26:09 2013] [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>
> 3) There are no waits in the OSD or RGW perf dumps.
>
> 4) Am I using the wrong FastCGI module?

Don't think so, otherwise all PUTs would have failed.

Yehuda

>
> The good news is that RadosGW can handle 500+ concurrency now, but I believe
> it can do better than 900+. CPU load is still low, though.
>
> Much appreciated.
>
>
> 2013/12/26 Yehuda Sadeh <yehuda@xxxxxxxxxxx>
>>
>> On Wed, Dec 25, 2013 at 9:12 AM, Kuo Hugo <tonytkdk@xxxxxxxxx> wrote:
>> > Hi folks,
>> >
>> > I'm in the process of tuning the performance of RadosGW on my server.
>> > With some kind help from you guys, I have identified several things to
>> > optimize so that RadosGW can handle more concurrent requests from users:
>> >
>> > Apache optimization #
>> > radosgw open file #
>> > rgw thread pools #
>> > rgw_ops throttle #
>> > objecter_inflight_op_bytes
>> > objecter_inflight_ops
>> > etc....
>> >
>> > It's a powerful server with 32 CPU threads and 62 GB of RAM, but I'm
>> > running into a problem that the admin sockets give no clue about.
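Besides `perf dump`, the admin socket also answers `config show`, which is a quick way to confirm that tuned values actually reached the running radosgw. The Python sketch below is illustrative and rests on two assumptions: that `config show` prints a flat JSON object of underscore-form option names to string values, and a socket path that is hypothetical and must be adjusted per deployment. It checks only three real Ceph options related to the list above (`rgw_thread_pool_size`, `objecter_inflight_ops`, `objecter_inflight_op_bytes`); the other items in the list do not map to a single option name.

```python
import json
import subprocess

# Real Ceph option names related to the tuning list above.
WATCHED = ("rgw_thread_pool_size", "objecter_inflight_ops", "objecter_inflight_op_bytes")


def watched_settings(config_show_json):
    """Pick the watched options out of `config show` output.

    Assumes a flat JSON object of option name -> value; a nested
    layout would need unwrapping first. Missing options map to None.
    """
    cfg = json.loads(config_show_json)
    return {name: cfg.get(name) for name in WATCHED}


def query_admin_socket(sock_path):
    """Run `ceph --admin-daemon <sock> config show` and return its stdout."""
    return subprocess.run(
        ["ceph", "--admin-daemon", sock_path, "config", "show"],
        check=True, capture_output=True, text=True,
    ).stdout
```

Usage, with a hypothetical socket path: `watched_settings(query_admin_socket("/var/run/ceph/ceph-client.radosgw.gateway.asok"))`.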
>> >
>> > What does the following FastCGI error in Apache's error.log mean? It
>> > happens on both PUT and DELETE requests. There are no op waits in the
>> > OSD or RadosGW. Is there any way to improve this?
>> >
>> > At this point I'm not sure whether the connection reset is raised by
>> > Apache or by FastCGI.
>> >
>> >  [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000045/1KB_025787 auth
>> >  [error] [client 192.168.2.40]
>> >  [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >  [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> >  [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000040/1KB_025788 auth
>> >  [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000021/1KB_025685 auth
>> >  [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000047/1KB_025790 auth
>> >  [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >  [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> >
>> >  [warn] FastCGI: 192.168.2.40 DELETE http://192.168.2.51/swift/v1/ssbench_000006/1KB_012286 auth
>> >  [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >  [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> >  [warn] FastCGI: 192.168.2.40 DELETE http://192.168.2.51/swift/v1/ssbench_000061/1KB_012168 auth
>> >
>> >  [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >
>> >
>> >
>>
>>
>> Can you correlate these with the apache access log and with the
>> radosgw log? (e.g., do you get 500 responses?). It could happen if
>> you're using the wrong fastcgi module, or if the requests are too slow
>> to respond and apache is timing out.
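One way to do that correlation is sketched below in Python. It assumes the Apache common/combined access-log layout and the Apache 2.2 error-log timestamp layout seen in the excerpts in this thread (the optional "access:"/"err:" prefixes from grep are tolerated). It lists every 500 response, counts FastCGI "comm with server ... aborted" errors per second so the two can be lined up, and, as a rough heuristic, reports failed paths that never appear anywhere in radosgw.log. Both logs are assumed to be in the server's local time; the access-log UTC offset is ignored.

```python
import re
from collections import Counter
from datetime import datetime

# e.g. 192.168.2.40 - - [26/Dec/2013:02:26:09 -0800] "PUT /swift/v1/... HTTP/1.1" 500 745 "-" "-"
ACCESS_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\] "(\w+) (\S+) [^"]*" (\d{3})')
# e.g. [Thu Dec 26 02:26:09 2013] [error] [client ...] ... FastCGI: comm with server "..." aborted: read failed
ERROR_RE = re.compile(
    r'\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\] \[error\].*FastCGI: comm with server .* aborted')


def failed_requests(access_lines):
    """(timestamp, method, path) for every 500 response in the access log."""
    out = []
    for line in access_lines:
        m = ACCESS_RE.search(line)
        if m and m.group(4) == "500":
            ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
            out.append((ts, m.group(2), m.group(3)))
    return out


def aborts_per_second(error_lines):
    """Count FastCGI 'comm with server ... aborted' errors per second."""
    counts = Counter()
    for line in error_lines:
        m = ERROR_RE.search(line)
        if m:
            counts[datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")] += 1
    return counts


def missing_from_rgw_log(failed, rgw_log_text):
    """Paths of failed requests that never appear in radosgw.log (substring match)."""
    return [path for _, _, path in failed if path not in rgw_log_text]
```

If the per-second counts of 500s and aborts track each other and the failed paths are absent from radosgw.log, that supports the reading that the requests never reached radosgw.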
>>
>> Yehuda
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
