Re: radosgw crash within libfcgi

----- Original Message -----
> From: "GuangYang" <yguang11@xxxxxxxxxxx>
> To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx, ceph-users@xxxxxxxxxxxxxx
> Sent: Wednesday, June 24, 2015 2:12:23 PM
> Subject: RE: radosgw crash within libfcgi
> 
> ----------------------------------------
> > Date: Wed, 24 Jun 2015 17:04:05 -0400
> > From: yehuda@xxxxxxxxxx
> > To: yguang11@xxxxxxxxxxx
> > CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> > Subject: Re: radosgw crash within libfcgi
> >
> >
> >
> > ----- Original Message -----
> >> From: "GuangYang" <yguang11@xxxxxxxxxxx>
> >> To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
> >> Cc: ceph-devel@xxxxxxxxxxxxxxx, ceph-users@xxxxxxxxxxxxxx
> >> Sent: Wednesday, June 24, 2015 1:53:20 PM
> >> Subject: RE: radosgw crash within libfcgi
> >>
> >> Thanks Yehuda for the response.
> >>
> >> We already patched libfcgi to use poll instead of select to overcome the
> >> limitation.
> >>
> >> Thanks,
> >> Guang
> >>
> >>
> >> ----------------------------------------
> >>> Date: Wed, 24 Jun 2015 14:40:25 -0400
> >>> From: yehuda@xxxxxxxxxx
> >>> To: yguang11@xxxxxxxxxxx
> >>> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> >>> Subject: Re: radosgw crash within libfcgi
> >>>
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "GuangYang" <yguang11@xxxxxxxxxxx>
> >>>> To: ceph-devel@xxxxxxxxxxxxxxx, ceph-users@xxxxxxxxxxxxxx,
> >>>> yehuda@xxxxxxxxxx
> >>>> Sent: Wednesday, June 24, 2015 10:09:58 AM
> >>>> Subject: radosgw crash within libfcgi
> >>>>
> >>>> Hello Cephers,
> >>>> Recently we have had several radosgw daemon crashes, all with the
> >>>> following kernel log:
> >>>>
> >>>> Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip
> >>>> 00007ffa069996f2 sp 00007ff55c432710 error 6 in
> >
> > Error 6 is SIGABRT, right? With an invalid pointer I'd expect to get a
> > segfault. Is the pointer actually invalid?
> Computing (ip - {load address of the shared library}) to get the instruction
> that caused this crash, objdump shows the crash happened at offset 46f2
> (see below), which assigns -1 to FCGX_Request::ipcFd, but I don't quite
> understand how or why it could crash there.
> 
> 0000000000004690 <FCGX_Free>:
>     4690:       48 89 5c 24 f0          mov    %rbx,-0x10(%rsp)
>     4695:       48 89 6c 24 f8          mov    %rbp,-0x8(%rsp)
>     469a:       48 83 ec 18             sub    $0x18,%rsp
>     469e:       48 85 ff                test   %rdi,%rdi
>     46a1:       48 89 fb                mov    %rdi,%rbx
>     46a4:       89 f5                   mov    %esi,%ebp
>     46a6:       74 28                   je     46d0 <FCGX_Free+0x40>
>     46a8:       48 8d 7f 08             lea    0x8(%rdi),%rdi
>     46ac:       e8 67 e3 ff ff          callq  2a18 <FCGX_FreeStream@plt>
>     46b1:       48 8d 7b 10             lea    0x10(%rbx),%rdi
>     46b5:       e8 5e e3 ff ff          callq  2a18 <FCGX_FreeStream@plt>
>     46ba:       48 8d 7b 18             lea    0x18(%rbx),%rdi
>     46be:       e8 55 e3 ff ff          callq  2a18 <FCGX_FreeStream@plt>
>     46c3:       48 8d 7b 28             lea    0x28(%rbx),%rdi
>     46c7:       e8 d4 f4 ff ff          callq  3ba0 <FCGX_PutS+0x40>
>     46cc:       85 ed                   test   %ebp,%ebp
>     46ce:       75 10                   jne    46e0 <FCGX_Free+0x50>
>     46d0:       48 8b 5c 24 08          mov    0x8(%rsp),%rbx
>     46d5:       48 8b 6c 24 10          mov    0x10(%rsp),%rbp
>     46da:       48 83 c4 18             add    $0x18,%rsp
>     46de:       c3                      retq
>     46df:       90                      nop
>     46e0:       31 f6                   xor    %esi,%esi
>     46e2:       83 7b 4c 00             cmpl   $0x0,0x4c(%rbx)
>     46e6:       8b 7b 30                mov    0x30(%rbx),%edi
>     46e9:       40 0f 94 c6             sete   %sil
>     46ed:       e8 86 e6 ff ff          callq  2d78 <OS_IpcClose@plt>
>     46f2:       c7 43 30 ff ff ff ff    movl   $0xffffffff,0x30(%rbx)

info registers?

I'm not too familiar with this specific message, but it could be that OS_IpcClose() aborts (not highly unlikely) and only the return address of the current function gets dumped (though it shouldn't be reported as ip).

What's rbx? Is the memory at %rbx + 0x30 valid?

Also, did you by any chance upgrade the binaries while the code was running? Is the code running over NFS?

Yehuda

> >
> > Yehuda
> >
> >
> >>>> libfcgi.so.0.0.0[7ffa06995000+a000] in
> >>>> libfcgi.so.0.0.0[7ffa06995000+a000]
> >>>>
> >>>> Looking at the assembly, it seems to crash at this point:
> >>>> http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which
> >>>> confused me. I tried to find any other reference holding the
> >>>> FCGX_Request that could release the handle, without any luck.
> >>>>
> >>>> There are also other observations:
> >>>> 1> Several radosgw daemons across different hosts crashed around the same
> >>>> time.
> >>>> 2> Apache's error log has some fcgi errors complaining about "idle timeout"
> >>>> during that time.
> >>>>
> >>>> Has anyone experienced a similar issue?
> >>>>
> >>>
> >>> In the past we've had issues with libfcgi that were related to the number
> >>> of open fds on the process (> 1024). The issue was a buggy libfcgi that
> >>> was using select() instead of poll(), so this might be the issue you're
> >>> noticing.
> >>>
> >>> Yehuda
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >> N嫥叉靣笡y氊b瞂千v豝�藓{.n�壏渮榏z鳐妠ay�蕠跈�jf"穐殝鄗�畐ア�⒎:+v墾妛鑚豰稛�珣赙zZ+凒殠娸"濟!秈
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> 