----------------------------------------
> Date: Wed, 24 Jun 2015 17:04:05 -0400
> From: yehuda@xxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: radosgw crash within libfcgi
>
> ----- Original Message -----
>> From: "GuangYang" <yguang11@xxxxxxxxxxx>
>> To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
>> Cc: ceph-devel@xxxxxxxxxxxxxxx, ceph-users@xxxxxxxxxxxxxx
>> Sent: Wednesday, June 24, 2015 1:53:20 PM
>> Subject: RE: radosgw crash within libfcgi
>>
>> Thanks Yehuda for the response.
>>
>> We already patched libfcgi to use poll instead of select to overcome the
>> limitation.
>>
>> Thanks,
>> Guang
>>
>>
>> ----------------------------------------
>>> Date: Wed, 24 Jun 2015 14:40:25 -0400
>>> From: yehuda@xxxxxxxxxx
>>> To: yguang11@xxxxxxxxxxx
>>> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re: radosgw crash within libfcgi
>>>
>>> ----- Original Message -----
>>>> From: "GuangYang" <yguang11@xxxxxxxxxxx>
>>>> To: ceph-devel@xxxxxxxxxxxxxxx, ceph-users@xxxxxxxxxxxxxx,
>>>> yehuda@xxxxxxxxxx
>>>> Sent: Wednesday, June 24, 2015 10:09:58 AM
>>>> Subject: radosgw crash within libfcgi
>>>>
>>>> Hello Cephers,
>>>> Recently we have several radosgw daemon crashes with the same following
>>>> kernel log:
>>>>
>>>> Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip
>>>> 00007ffa069996f2 sp 00007ff55c432710 error 6 in
>
> error 6 is sigabrt, right? With invalid pointer I'd expect to get segfault.
> Is the pointer actually invalid?

Subtracting the load address of the shared library from ip gives the offset
of the instruction that caused the crash. objdump shows the crash happened at
the instruction at offset 46f2 (see below), which assigns -1 to
FCGX_Request::ipcFd, but I don't quite understand how/why it could crash
there.
0000000000004690 <FCGX_Free>:
    4690:  48 89 5c 24 f0        mov    %rbx,-0x10(%rsp)
    4695:  48 89 6c 24 f8        mov    %rbp,-0x8(%rsp)
    469a:  48 83 ec 18           sub    $0x18,%rsp
    469e:  48 85 ff              test   %rdi,%rdi
    46a1:  48 89 fb              mov    %rdi,%rbx
    46a4:  89 f5                 mov    %esi,%ebp
    46a6:  74 28                 je     46d0 <FCGX_Free+0x40>
    46a8:  48 8d 7f 08           lea    0x8(%rdi),%rdi
    46ac:  e8 67 e3 ff ff        callq  2a18 <FCGX_FreeStream@plt>
    46b1:  48 8d 7b 10           lea    0x10(%rbx),%rdi
    46b5:  e8 5e e3 ff ff        callq  2a18 <FCGX_FreeStream@plt>
    46ba:  48 8d 7b 18           lea    0x18(%rbx),%rdi
    46be:  e8 55 e3 ff ff        callq  2a18 <FCGX_FreeStream@plt>
    46c3:  48 8d 7b 28           lea    0x28(%rbx),%rdi
    46c7:  e8 d4 f4 ff ff        callq  3ba0 <FCGX_PutS+0x40>
    46cc:  85 ed                 test   %ebp,%ebp
    46ce:  75 10                 jne    46e0 <FCGX_Free+0x50>
    46d0:  48 8b 5c 24 08        mov    0x8(%rsp),%rbx
    46d5:  48 8b 6c 24 10        mov    0x10(%rsp),%rbp
    46da:  48 83 c4 18           add    $0x18,%rsp
    46de:  c3                    retq
    46df:  90                    nop
    46e0:  31 f6                 xor    %esi,%esi
    46e2:  83 7b 4c 00           cmpl   $0x0,0x4c(%rbx)
    46e6:  8b 7b 30              mov    0x30(%rbx),%edi
    46e9:  40 0f 94 c6           sete   %sil
    46ed:  e8 86 e6 ff ff        callq  2d78 <OS_IpcClose@plt>
    46f2:  c7 43 30 ff ff ff ff  movl   $0xffffffff,0x30(%rbx)

>
> Yehuda
>
>>>> libfcgi.so.0.0.0[7ffa06995000+a000] in libfcgi.so.0.0.0[7ffa06995000+a000]
>>>>
>>>> Looking at the assembly, it seems crashing at this point -
>>>> http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which
>>>> confused me. I tried to see if there is any other reference holding the
>>>> FCGX_Request which release the handle without any luck.
>>>>
>>>> There are also other observations:
>>>> 1> Several radosgw daemon across different hosts crashed around the same
>>>> time.
>>>> 2> Apache's error log has some fcgi error complaining ##idle timeout##
>>>> during the time.
>>>>
>>>> Does anyone experience similar issue?
>>>>
>>>
>>> In the past we've had issues with libfcgi that were related to the number
>>> of open fds on the process (> 1024). The issue was a buggy libfcgi that
>>> was using select() instead of poll(), so this might be the issue you're
>>> noticing.
>>>
>>> Yehuda
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com