Thanks Yehuda for the response.

We already patched libfcgi to use poll instead of select to overcome the limitation.

Thanks,
Guang

----------------------------------------
> Date: Wed, 24 Jun 2015 14:40:25 -0400
> From: yehuda@xxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: radosgw crash within libfcgi
>
> ----- Original Message -----
>> From: "GuangYang" <yguang11@xxxxxxxxxxx>
>> To: ceph-devel@xxxxxxxxxxxxxxx, ceph-users@xxxxxxxxxxxxxx, yehuda@xxxxxxxxxx
>> Sent: Wednesday, June 24, 2015 10:09:58 AM
>> Subject: radosgw crash within libfcgi
>>
>> Hello Cephers,
>> Recently we have had several radosgw daemon crashes, all with the
>> following kernel log:
>>
>> Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip
>> 00007ffa069996f2 sp 00007ff55c432710 error 6 in
>> libfcgi.so.0.0.0[7ffa06995000+a000] in libfcgi.so.0.0.0[7ffa06995000+a000]
>>
>> Looking at the assembly, it seems to be crashing at this point -
>> http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which
>> confused me. I tried to find any other reference holding the
>> FCGX_Request that might release the handle, without any luck.
>>
>> There are also other observations:
>> 1> Several radosgw daemons across different hosts crashed around the same
>> time.
>> 2> Apache's error log has some fcgi errors complaining about ##idle timeout##
>> during that time.
>>
>> Has anyone experienced a similar issue?
>>
>
> In the past we've had issues with libfcgi that were related to the number
> of open fds on the process (> 1024). The issue was a buggy libfcgi that was
> using select() instead of poll(), so this might be the issue you're noticing.
>
> Yehuda
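
For anyone hitting the same limit: select() operates on an fd_set, which can only represent descriptors below FD_SETSIZE (commonly 1024 on Linux with glibc). Calling FD_SET on a larger descriptor writes outside the set, which is undefined behavior and can corrupt nearby memory. poll() takes an array of struct pollfd and has no such bound. The following is a minimal sketch of that idea, not the actual libfcgi patch mentioned in this thread; the helper names fd_fits_in_fd_set and wait_readable are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd can safely be stored in an fd_set,
 * 0 otherwise. Descriptors >= FD_SETSIZE must never be passed to FD_SET. */
static int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: wait until fd becomes readable, using poll() so that
 * the numeric value of fd does not matter.
 * Returns 1 if fd is readable (or hung up / in error, so a read will not
 * block), 0 on timeout, -1 on failure with errno set.
 * Note: if poll() is interrupted by a signal it is retried with the full
 * timeout, so the total wait can exceed timeout_ms. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}
```

A caller that must keep a select()-based code path could check fd_fits_in_fd_set(fd) first and fail cleanly instead of corrupting memory. Switching to something like wait_readable(fd, timeout_ms) removes the limit altogether.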