RE: radosgw crash within libfcgi

----------------------------------------
> Date: Wed, 24 Jun 2015 17:04:05 -0400
> From: yehuda@xxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: radosgw crash within libfcgi
>
>
>
> ----- Original Message -----
>> From: "GuangYang" <yguang11@xxxxxxxxxxx>
>> To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
>> Cc: ceph-devel@xxxxxxxxxxxxxxx, ceph-users@xxxxxxxxxxxxxx
>> Sent: Wednesday, June 24, 2015 1:53:20 PM
>> Subject: RE: radosgw crash within libfcgi
>>
>> Thanks Yehuda for the response.
>>
>> We already patched libfcgi to use poll instead of select to overcome the
>> limitation.
>>
>> Thanks,
>> Guang
>>
>>
>> ----------------------------------------
>>> Date: Wed, 24 Jun 2015 14:40:25 -0400
>>> From: yehuda@xxxxxxxxxx
>>> To: yguang11@xxxxxxxxxxx
>>> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re: radosgw crash within libfcgi
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "GuangYang" <yguang11@xxxxxxxxxxx>
>>>> To: ceph-devel@xxxxxxxxxxxxxxx, ceph-users@xxxxxxxxxxxxxx,
>>>> yehuda@xxxxxxxxxx
>>>> Sent: Wednesday, June 24, 2015 10:09:58 AM
>>>> Subject: radosgw crash within libfcgi
>>>>
>>>> Hello Cephers,
>>>> Recently we have had several radosgw daemon crashes, all with the following
>>>> kernel log:
>>>>
>>>> Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip
>>>> 00007ffa069996f2 sp 00007ff55c432710 error 6 in
>
> error 6 is sigabrt, right? With invalid pointer I'd expect to get segfault. Is the pointer actually invalid?
Subtracting the shared library's load address from ip gives the offset of the instruction that caused the crash. The objdump output shows the crash happened at offset 46f2 (see below), the instruction that assigns -1 to FCGX_Request::ipcFd, but I don't quite understand how or why it could crash there.

0000000000004690 <FCGX_Free>:
    4690:       48 89 5c 24 f0          mov    %rbx,-0x10(%rsp)
    4695:       48 89 6c 24 f8          mov    %rbp,-0x8(%rsp)
    469a:       48 83 ec 18             sub    $0x18,%rsp
    469e:       48 85 ff                test   %rdi,%rdi
    46a1:       48 89 fb                mov    %rdi,%rbx
    46a4:       89 f5                   mov    %esi,%ebp
    46a6:       74 28                   je     46d0 <FCGX_Free+0x40>
    46a8:       48 8d 7f 08             lea    0x8(%rdi),%rdi
    46ac:       e8 67 e3 ff ff          callq  2a18 <FCGX_FreeStream@plt>
    46b1:       48 8d 7b 10             lea    0x10(%rbx),%rdi
    46b5:       e8 5e e3 ff ff          callq  2a18 <FCGX_FreeStream@plt>
    46ba:       48 8d 7b 18             lea    0x18(%rbx),%rdi
    46be:       e8 55 e3 ff ff          callq  2a18 <FCGX_FreeStream@plt>
    46c3:       48 8d 7b 28             lea    0x28(%rbx),%rdi
    46c7:       e8 d4 f4 ff ff          callq  3ba0 <FCGX_PutS+0x40>
    46cc:       85 ed                   test   %ebp,%ebp
    46ce:       75 10                   jne    46e0 <FCGX_Free+0x50>
    46d0:       48 8b 5c 24 08          mov    0x8(%rsp),%rbx
    46d5:       48 8b 6c 24 10          mov    0x10(%rsp),%rbp
    46da:       48 83 c4 18             add    $0x18,%rsp
    46de:       c3                      retq   
    46df:       90                      nop
    46e0:       31 f6                   xor    %esi,%esi
    46e2:       83 7b 4c 00             cmpl   $0x0,0x4c(%rbx)
    46e6:       8b 7b 30                mov    0x30(%rbx),%edi
    46e9:       40 0f 94 c6             sete   %sil
    46ed:       e8 86 e6 ff ff          callq  2d78 <OS_IpcClose@plt>
    46f2:       c7 43 30 ff ff ff ff    movl   $0xffffffff,0x30(%rbx)
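
The address arithmetic described above can be checked with a short sketch. The constants are copied from the kernel log line and the listing in this thread; the function names are hypothetical, and the meaning of the error-code bits is the usual x86 page-fault convention rather than anything stated in the log itself. Under those assumptions, the faulting store `0x30(%rbx)` at address f0 implies %rbx held 0xc0 at that moment, which would not be a plausible FCGX_Request pointer.

```python
# Values copied from the kernel log line and the objdump listing in this thread.
FAULT_IP = 0x7FFA069996F2   # "ip" field of the segfault line
LIB_BASE = 0x7FFA06995000   # libfcgi.so.0.0.0 load address
LIB_SIZE = 0xA000           # mapping size ("+a000")
FAULT_ADDR = 0xF0           # "segfault at f0"
DISPLACEMENT = 0x30         # from: movl $0xffffffff,0x30(%rbx)
ERROR_CODE = 6              # "error 6"


def offset_in_library(ip, base, size):
    """Return the offset of ip inside the mapping that starts at base."""
    off = ip - base
    if not 0 <= off < size:
        raise ValueError("ip is outside the library mapping")
    return off


def decode_page_fault(code):
    """Decode the low three bits of an x86 page-fault error code (assumed layout)."""
    return {
        "page_present": bool(code & 1),  # 0 means the page was not mapped
        "write": bool(code & 2),         # 1 means the access was a write
        "user_mode": bool(code & 4),     # 1 means the fault came from user space
    }


offset = offset_in_library(FAULT_IP, LIB_BASE, LIB_SIZE)
rbx_at_fault = FAULT_ADDR - DISPLACEMENT

print(hex(offset))
print(hex(rbx_at_fault))
print(decode_page_fault(ERROR_CODE))
```

Since %rbx is loaded from %rdi at 46a1 and is normally preserved across calls, one possible reading (an inference, not something the log proves) is that its saved copy was overwritten on the stack during the preceding call to OS_IpcClose.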
>
> Yehuda
>
>
>>>> libfcgi.so.0.0.0[7ffa06995000+a000] in libfcgi.so.0.0.0[7ffa06995000+a000]
>>>>
>>>> Looking at the assembly, it seems to crash at this point -
>>>> http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which
>>>> confused me. I tried, without any luck, to see whether any other reference
>>>> holding the FCGX_Request releases the handle.
>>>>
>>>> There are also other observations:
>>>> 1> Several radosgw daemons across different hosts crashed around the same
>>>> time.
>>>> 2> Apache's error log has some fcgi errors complaining about ##idle timeout##
>>>> during that time.
>>>>
>>>> Has anyone experienced a similar issue?
>>>>
>>>
>>> In the past we've had issues with libfcgi that were related to the number
>>> of open fds on the process (> 1024). The issue was a buggy libfcgi that
>>> was using select() instead of poll(), so this might be the issue you're
>>> noticing.
>>>
>>> Yehuda
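
The fd limit mentioned above comes from the fixed size of the bitmap that select() works on. A minimal sketch of that bookkeeping follows, assuming the common glibc layout (FD_SETSIZE of 1024, stored as an array of 64-bit words); the constants and function names are illustrative and are not taken from libfcgi.

```python
# Assumed glibc-style layout; not read from the system or from libfcgi.
FD_SETSIZE = 1024                     # highest fd select() can track, plus one
BITS_PER_WORD = 64                    # assumes a 64-bit long
NUM_WORDS = FD_SETSIZE // BITS_PER_WORD


def fd_set_word_index(fd):
    """Index of the word that setting the bit for fd would touch."""
    return fd // BITS_PER_WORD


def fits_in_fd_set(fd):
    """True if fd can be represented in a fixed-size fd_set."""
    return 0 <= fd < FD_SETSIZE


for fd in (3, 1023, 1024, 2000):
    idx = fd_set_word_index(fd)
    status = "ok" if fits_in_fd_set(fd) else "past the end of the array"
    print(fd, idx, status)
```

With this layout, any descriptor numbered 1024 or higher maps to a word index of 16 or more, beyond the 16-word array. If the fd_set lives on the stack, such a write would land in neighbouring stack memory. poll() takes an array of descriptors instead of a fixed bitmap, so it does not have this limit.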
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html