Re: radosgw crash - Infernalis

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yes, CentOS 7.2. Happened twice in a row, both times shortly after a restart, so i expect i'll be able to reproduce it. However, i've now tried a bunch of times and it's not happening again.

In any case i have glibc + ceph-debuginfo installed so we can get more info if it does happen.

thanks!

On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
----- Original Message -----
> From: "Karol Mroz" <kmroz@xxxxxxxx>
> To: "Ben Hines" <bhines@xxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Wednesday, 27 April, 2016 7:06:56 PM
> Subject: Re: radosgw crash - Infernalis
>
> On Tue, Apr 26, 2016 at 10:17:31PM -0700, Ben Hines wrote:
> [...]
> > --> 10.30.1.6:6800/10350 -- osd_op(client.44852756.0:79
> > default.42048218.<redacted> [getxattrs,stat,read 0~524288] 12.aa730416
> > ack+read+known_if_redirected e100207) v6 -- ?+0 0x7f49c41880b0 con
> > 0x7f49c4145eb0
> >      0> 2016-04-26 22:07:59.685615 7f49a07f0700 -1 *** Caught signal
> > (Segmentation fault) **
> >  in thread 7f49a07f0700
> >
> >  ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> >  1: (()+0x30b0a2) [0x7f4c4907f0a2]
> >  2: (()+0xf100) [0x7f4c44f7a100]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > to interpret this.
>
> Hi Ben,
>
> I sense a pretty badly corrupted stack. From the radosgw-9.2.1 (obtained from
> a downloaded rpm):
>
> 000000000030a810 <_Z13pidfile_writePK11md_config_t@@Base>:
> ...
>   30b09d:       e8 0e 40 e4 ff          callq  14f0b0 <backtrace@plt>
>   30b0a2:       4c 89 ef                mov    %r13,%rdi
>   -------
> ...
>
> So either we tripped backtrace() code from pidfile_write() _or_ we can't
> trust the stack. From the log snippet, it looks that we're far past the point
> at which we would write a pidfile to disk (ie. at process start during
> global_init()).
> Rather, we're actually handling a request and outputting some bit of debug
> message
> via MSDOp::print() and beyond...

It would help to know what binary this is and what OS.

We know the offset into the function is 0x30b0a2 but we don't know which
function yet AFAICT. Karol, how did you arrive at pidfile_write? Purely from
the offset? I'm not sure that would be reliable...

This is a segfault so the address of the frame where we crashed should be the
exact instruction where we crashed. I don't believe a mov from one register to
another that does not involve a dereference ((%r13) as opposed to %r13) can
cause a segfault so I don't think we are on the right instruction but then, as
you say, the stack may be corrupt.

>
> Is this something you're able to easily reproduce? More logs with higher log
> levels
> would be helpful... a coredump with radosgw compiled with -g would be
> excellent :)

Agreed, although if this is an rpm based system it should be sufficient to
run the following.

# debuginfo-install ceph glibc

That may give us the name of the function depending on where we are (if we are
in a library it may require the debuginfo for that library be loaded.

Karol is right that a coredump would be a good idea in this case and will give
us maximum information about the issue you are seeing.

Cheers,
Brad

>
> --
> Regards,
> Karol
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux