Re: radosgw crash - Infernalis

----- Original Message -----
> From: "Karol Mroz" <kmroz@xxxxxxxx>
> To: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
> Cc: "Ben Hines" <bhines@xxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Thursday, 28 April, 2016 7:17:05 PM
> Subject: Re:  radosgw crash - Infernalis
> 
> Hi Brad,
> 
> On Wed, Apr 27, 2016 at 11:40:40PM -0400, Brad Hubbard wrote:
> [...]
> > > 000000000030a810 <_Z13pidfile_writePK11md_config_t@@Base>:
> > > ...
> > >   30b09d:       e8 0e 40 e4 ff          callq  14f0b0 <backtrace@plt>
> > >   30b0a2:       4c 89 ef                mov    %r13,%rdi
> > >   -------
> > > ...
> > > 
> > > So either we tripped backtrace() code from pidfile_write() _or_ we can't
> > > trust the stack. From the log snippet, it looks like we're far past the
> > > point at which we would write a pidfile to disk (i.e. at process start
> > > during global_init()). Rather, we're actually handling a request and
> > > outputting some bit of debug message via MOSDOp::print() and beyond...
> > 
> > It would help to know what binary this is and what OS.
> > 
> > We know the offset into the function is 0x30b0a2 but we don't know which
> > function yet AFAICT. Karol, how did you arrive at pidfile_write? Purely
> > from the offset? I'm not sure that would be reliable...
> 
> Correct, from the offset. Let me clarify, I don't think pidfile_write() is
> the function in which we segfaulted :) Hence my suspicion of a blown stack. I

You could definitely be on the money here but IMHO it is too early to tell.

> don't know the specifics behind the backtrace call used to generate this
> stack... so maybe this is a naive question... but why do you think the offset
> is unreliable? Perhaps I'm not reading this trace correctly?

Well, you could have multiple functions which include an offset of 0x30b0a2.
Which function would it be in that case? The other frame shows an offset of
0xf100, can you identify that function just from the offset?
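
To make that concrete, here is a rough Python sketch. The symbol table in
it is entirely made up for illustration (the names, addresses and sizes are
not from any ceph binary). It only shows that a bare offset fits every
function long enough to contain it, while an absolute address plus a symbol
table resolves to exactly one function.

```python
import bisect

# Hypothetical symbol table: (start address, size in bytes, name).
# These values are invented for illustration, not read from a real binary.
SYMBOLS = [
    (0x1000, 0x400, "func_a"),
    (0x2000, 0x900, "func_b"),
    (0x3000, 0x200, "func_c"),
]


def functions_containing_offset(offset):
    """Return every function long enough to hold `offset` bytes past its start."""
    return [name for _start, size, name in SYMBOLS if offset < size]


def symbolize(address):
    """Map an absolute address to (name, offset), or None if no function covers it."""
    starts = [start for start, _size, _name in SYMBOLS]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None
    start, size, name = SYMBOLS[i]
    if address - start >= size:
        return None  # address falls in a gap between functions
    return name, address - start


# The offset alone matches several functions...
print(functions_containing_offset(0x1F0))
# ...but an absolute address identifies one function and one offset.
print(symbolize(0x21F0))
```

With only "+0x1f0" in hand, all three made-up functions are candidates. It
takes the absolute address as well to pick one.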

The following stack gives some good examples.

 1: /usr/bin/ceph-osd() [0xa05e32]
 2: (()+0xf100) [0x7f9ea295c100]
 3: (OSD::handle_osd_ping(MOSDPing*)+0x75a) [0x659e7a]
 4: (OSD::heartbeat_dispatch(Message*)+0x2fb) [0x65b0cb]
 5: (DispatchQueue::entry()+0x62a) [0xbc2aba]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0xae572d]
 7: (()+0x7dc5) [0x7f9ea2954dc5]
 8: (clone()+0x6d) [0x7f9ea143528d]

Each offset is relative to the address at which its function is loaded in
memory, so I don't think searching for 0x6d, 0x2fb, 0x62a or 0x75a will give
you the correct result unless you know which function you are dealing with.
The offset is just a distance from the start of *some function*, so without
knowing which function we can't be sure which instruction we were on. That's
my understanding anyway.
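
As a sanity check on that reading, here is a small Python sketch that parses
the frames above and subtracts each offset from its address. The regular
expression and the frame_bases() helper are my own illustration, not part of
ceph or glibc. For the named frames the result should be the start of the
named function. For the two nameless "()" frames, the subtraction lands on
the same value both times, which as far as I can tell means those offsets
are measured from the start of one mapped object rather than from a function.

```python
import re

# Frame lines copied from the stack trace above.
STACK = """\
 1: /usr/bin/ceph-osd() [0xa05e32]
 2: (()+0xf100) [0x7f9ea295c100]
 3: (OSD::handle_osd_ping(MOSDPing*)+0x75a) [0x659e7a]
 4: (OSD::heartbeat_dispatch(Message*)+0x2fb) [0x65b0cb]
 5: (DispatchQueue::entry()+0x62a) [0xbc2aba]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0xae572d]
 7: (()+0x7dc5) [0x7f9ea2954dc5]
 8: (clone()+0x6d) [0x7f9ea143528d]
"""

# Matches "N: (symbol+0xOFFSET) [0xADDRESS]"; frame 1 has no offset and is skipped.
FRAME_RE = re.compile(
    r"^\s*\d+: \((?P<sym>.*)\+0x(?P<off>[0-9a-f]+)\) \[0x(?P<addr>[0-9a-f]+)\]$"
)


def frame_bases(text):
    """Return (symbol, offset, address - offset) for each frame that has an offset."""
    result = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue
        offset = int(m["off"], 16)
        address = int(m["addr"], 16)
        result.append((m["sym"], offset, address - offset))
    return result


for sym, offset, base in frame_bases(STACK):
    print(f"{sym:40s} +{offset:#x} -> base {base:#x}")
```

Either way, the offset only becomes useful once you know what it is measured
from.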

I agree that a stack with only two frames looks dodgy, though, and we may be
chasing our tails, but I'm hoping we can squeeze more info out of a core or a
better stack trace with all debuginfo loaded (if the function has no name due
to lack of debuginfo and not due to stack corruption).

Cheers,
Brad

> 
> > 
> > This is a segfault so the address of the frame where we crashed should be
> > the exact instruction where we crashed. I don't believe a mov from one
> > register to another that does not involve a dereference ((%r13) as opposed
> > to %r13) can cause a segfault so I don't think we are on the right
> > instruction but then, as you say, the stack may be corrupt.
> 
> Agreed... a mov between registers wouldn't cause a segfault.
> 
> --
> Regards,
> Karol
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


