Re: [RadosGW] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed

On Thu, Dec 26, 2013 at 7:58 AM, Kuo Hugo <tonytkdk@xxxxxxxxx> wrote:
> Hi Yehuda,
>
> At the beginning I did hit Apache's concurrent connection limit, which
> defaults to 150, so I bumped up several parameters. Since then Apache's
> connection limit looks sufficient for the benchmark run; otherwise, as far
> as I can tell, the failed requests would never have been sent to the
> FastCGI server. As for FDs, do you mean the open files limit or
> fs.file-max? The open files limit for Apache and RadosGW was already
> raised to 65536. RadosGW hit that limit before.
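
One way to confirm that the 65536 open files limit is really in effect for
the running processes, and not only for the shell that set it, is to read
/proc/<pid>/limits. This is a minimal sketch, assuming a Linux host; the
function names are made up for illustration and are not part of Ceph or
Apache.

```python
import re


def max_open_files(limits_text):
    """Parse the 'Max open files' row from /proc/<pid>/limits text.

    Returns a (soft, hard) tuple; 'unlimited' is reported as None.
    """
    for line in limits_text.splitlines():
        m = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if m:
            return tuple(None if v == "unlimited" else int(v)
                         for v in m.groups())
    raise ValueError("no 'Max open files' line found")


def limits_for_pid(pid):
    """Read the limits of a running process (Linux only)."""
    with open("/proc/%d/limits" % pid) as f:
        return max_open_files(f.read())
```

Calling limits_for_pid() with the PID of the radosgw process and of each
Apache worker shows the soft and hard limits those processes actually run
with.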
>
> I'll give the newer libfcgi a try.
>
> How many concurrent connections have you seen RadosGW + Apache handle in
> your experience? Is it possible to become CPU-bound?

More than 1000; I don't remember exactly. I do remember that it required
replacing libfcgi. I'm pretty sure you can hit the CPU limit with enough
concurrency.

Yehuda

>
> Cheers / Hugo
>
>
> 2013/12/26 Yehuda Sadeh <yehuda@xxxxxxxxxxx>
>>
>> On Thu, Dec 26, 2013 at 3:00 AM, Kuo Hugo <tonytkdk@xxxxxxxxx> wrote:
>> > Hi all,
>> >
>> >
>> > I think the FastCGI module is the latest one on my server.
>> >
>> > root@p01:/var/log# dpkg -l | grep cgi
>> >
>> > ii  libapache2-mod-fastcgi            2.4.7~0910052141-2~bpo70+1.ceph
>> > Apache 2 FastCGI module for long-running CGI scripts
>> > ii  libfcgi0ldbl                      2.4.0-8.1
>> > Shared library of FastCGI
>> > ii  python-scgi                       1.13-1ubuntu1
>> > Server-side implementation of the SCGI protocol
>> >
>> >
>> > 1) It happens in higher-concurrency (990+) tests. The failure ratio is
>> > about 10%. It never happened at concurrency under 960.
>> >
>> > Concurrency: 990
>> > Count:  8974 ( 1026 error;     0 retries:  0.00%)  Average requests per
>> > second: 669.3
>> >
>> >
>> >
>> > 2) The client tool gets a 500 Internal Server Error for each failed
>> > request. There is no relevant request entry in radosgw.log, so I think
>> > the external FastCGI server did not get the request from Apache. Does
>> > that mean a single RadosGW process is limited to 1000 concurrent
>> > connections? There is nothing interesting in either syslog or kern.log.
>> > The CPU load is approximately 50%.
>>
>> No, it doesn't. It means that you have some issue in your environment.
>> Could be some kind of limit (max fds, apache concurrent connections,
>> socket backlog). There's a good chance you're hitting a problem with
>> the libfcgi module that used to use select() instead of poll() and was
>> breaking when fd number was greater than 1024. A newer version that
>> fixes it exists for ubuntu (try 2.4.0-8.1ubuntu3).
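
The select() limit described above can be reproduced outside of libfcgi.
This is a minimal sketch in Python, not the libfcgi code itself: it moves
the read end of a pipe to a descriptor number above 1024 and then asks both
select() and poll() about it. In C, using FD_SET on such a descriptor is
undefined behaviour; Python refuses it with a ValueError instead. The
function name and the descriptor number 1100 are arbitrary choices for
illustration.

```python
import os
import resource
import select

HIGH_FD = 1100  # arbitrary descriptor number above the usual FD_SETSIZE (1024)


def compare_select_and_poll(high_fd=HIGH_FD):
    """Return (select_ok, poll_ok) for a readable pipe at fd number high_fd.

    Returns None if the open files limit cannot be raised far enough.
    Side effect: may raise this process's soft RLIMIT_NOFILE.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft <= high_fd:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (high_fd + 1, hard))
        except (ValueError, OSError):
            return None  # hard limit too low to demonstrate anything

    r, w = os.pipe()
    os.write(w, b"x")        # make the read end readable
    os.dup2(r, high_fd)      # same pipe, now also reachable as fd high_fd
    try:
        try:
            select.select([high_fd], [], [], 0)
            select_ok = True
        except ValueError:   # fd >= FD_SETSIZE does not fit in an fd_set
            select_ok = False
        poller = select.poll()
        poller.register(high_fd, select.POLLIN)
        poll_ok = any(fd == high_fd and (events & select.POLLIN)
                      for fd, events in poller.poll(0))
    finally:
        for fd in (r, w, high_fd):
            os.close(fd)
    return select_ok, poll_ok
```

On a system where FD_SETSIZE is 1024, select() is expected to fail for this
descriptor while poll() reports it as readable, which matches failures that
only show up once a process holds roughly a thousand open connections.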
>>
>> >
>> > ClientException: Object PUT failed:
>> > http://192.168.2.51:80/swift/v1/ssbench_000072/1KB_002058 500 Internal
>> > Server Error  [first 60 chars of response] <!DOCTYPE HTML PUBLIC
>> > "-//IETF//DTD HTML 2.0//EN">
>> > <html><he
>> >
>> > access:192.168.2.40 - - [26/Dec/2013:02:26:09 -0800] "PUT /swift/v1/ssbench_000072/1KB_002058 HTTP/1.1" 500 745 "-" "-"
>> > err:[Thu Dec 26 02:26:09 2013] [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000072/1KB_002058 auth
>> > err:[Thu Dec 26 02:26:09 2013] [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> > err:[Thu Dec 26 02:26:09 2013] [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> >
>> >
>> >
>> >
>> > 3) There are no op waits in the OSD or RGW perf dumps.
>> >
>> > 4) Am I using the wrong FastCGI module?
>>
>> Don't think so, otherwise all PUTs would have failed.
>>
>> Yehuda
>>
>> >
>> > The good news is that RadosGW can handle 500+ concurrent connections
>> > now, but I believe it can do better than 900+. The CPU load is still
>> > low, though.
>> >
>> > Much appreciated.
>> >
>> >
>> > 2013/12/26 Yehuda Sadeh <yehuda@xxxxxxxxxxx>
>> >>
>> >> On Wed, Dec 25, 2013 at 9:12 AM, Kuo Hugo <tonytkdk@xxxxxxxxx> wrote:
>> >> > Hi folks,
>> >> >
>> >> > I'm in the process of tuning the performance of RadosGW on my server.
>> >> > After some kind help from you guys, I figured out several things to
>> >> > optimize so that RadosGW can handle more concurrent requests from
>> >> > users:
>> >> >
>> >> > Apache optimization #
>> >> > radosgw open file #
>> >> > rgw thread pools #
>> >> > rgw_ops throttle #
>> >> > objecter_inflight_op_bytes
>> >> > objecter_inflight_ops
>> >> > etc....
>> >> >
>> >> > It's a powerful server with 32 CPU threads and 62 GB of RAM, but I've
>> >> > encountered a problem for which the admin sockets give no clue.
>> >> >
>> >> > What does the following FastCGI error in Apache's error.log mean? It
>> >> > happens on both PUT and DELETE requests. There are no op waits in the
>> >> > OSDs or RadosGW. Is there any way to improve this?
>> >> >
>> >> > I'm not sure yet whether the connection reset was raised by Apache or
>> >> > by FastCGI.
>> >> >
>> >> >  [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000045/1KB_025787 auth
>> >> >  [error] [client 192.168.2.40]
>> >> >  [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >> >  [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> >> >  [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000040/1KB_025788 auth
>> >> >  [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000021/1KB_025685 auth
>> >> >  [warn] FastCGI: 192.168.2.40 PUT http://192.168.2.51/swift/v1/ssbench_000047/1KB_025790 auth
>> >> >  [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >> >  [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> >> >
>> >> >  [warn] FastCGI: 192.168.2.40 DELETE http://192.168.2.51/swift/v1/ssbench_000006/1KB_012286 auth
>> >> >  [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >> >  [error] [client 192.168.2.40] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>> >> >  [warn] FastCGI: 192.168.2.40 DELETE http://192.168.2.51/swift/v1/ssbench_000061/1KB_012168 auth
>> >> >
>> >> >  [error] [client 192.168.2.40] (104)Connection reset by peer: FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: read failed
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >> Can you correlate these with the apache access log and with the
>> >> radosgw log? (e.g., do you get 500 responses?). It could happen if
>> >> you're using the wrong fastcgi module, or if the requests are too slow
>> >> to respond and apache is timing out.
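
The correlation suggested above can be sketched as a small script: collect
the requests that Apache answered with 500 from the access log, then check
which of them never show up in the radosgw log. This is only a sketch under
assumptions: it expects Apache's common/combined access log format, and it
assumes the radosgw log mentions the request path somewhere on a line, which
depends on the configured debug level. The function names are hypothetical.

```python
import re

# Apache common/combined log: client, identity, user, [time], "request", status
ACCESS_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def failed_requests(access_lines, status="500"):
    """Return (timestamp, method, path) for each request with this status."""
    failed = []
    for line in access_lines:
        m = ACCESS_RE.match(line)
        if m and m.group("status") == status:
            failed.append((m.group("ts"), m.group("method"), m.group("path")))
    return failed


def missing_from_rgw(failed, rgw_lines):
    """Return the failed requests whose path never appears in the rgw log.

    Assumption: a request that reached radosgw leaves its path in the log.
    """
    rgw_text = "\n".join(rgw_lines)
    return [req for req in failed if req[2] not in rgw_text]
```

If every 500 response is missing from the radosgw log, the request most
likely never reached radosgw; if the paths are present, the radosgw log
timestamps show how long each request took compared with Apache's timeout.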
>> >>
>> >> Yehuda
>> >
>> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



