----- Original Message -----
> From: "Karol Mroz" <kmroz@xxxxxxxx>
> To: "Ben Hines" <bhines@xxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Wednesday, 27 April, 2016 7:06:56 PM
> Subject: Re: radosgw crash - Infernalis
>
> On Tue, Apr 26, 2016 at 10:17:31PM -0700, Ben Hines wrote:
> [...]
> > --> 10.30.1.6:6800/10350 -- osd_op(client.44852756.0:79
> > default.42048218.<redacted> [getxattrs,stat,read 0~524288] 12.aa730416
> > ack+read+known_if_redirected e100207) v6 -- ?+0 0x7f49c41880b0 con
> > 0x7f49c4145eb0
> > 0> 2016-04-26 22:07:59.685615 7f49a07f0700 -1 *** Caught signal
> > (Segmentation fault) **
> > in thread 7f49a07f0700
> >
> > ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> > 1: (()+0x30b0a2) [0x7f4c4907f0a2]
> > 2: (()+0xf100) [0x7f4c44f7a100]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > to interpret this.
>
> Hi Ben,
>
> I sense a pretty badly corrupted stack. From the radosgw-9.2.1 (obtained from
> a downloaded rpm):
>
> 000000000030a810 <_Z13pidfile_writePK11md_config_t@@Base>:
> ...
> 30b09d: e8 0e 40 e4 ff    callq 14f0b0 <backtrace@plt>
> 30b0a2: 4c 89 ef          mov %r13,%rdi
> -------
> ...
>
> So either we tripped backtrace() code from pidfile_write() _or_ we can't
> trust the stack. From the log snippet, it looks that we're far past the point
> at which we would write a pidfile to disk (ie. at process start during
> global_init()).
> Rather, we're actually handling a request and outputting some bit of debug
> message
> via MSDOp::print() and beyond...

It would help to know what binary this is and what OS.

We know the offset is 0x30b0a2, but AFAICT we don't yet know which function it
falls in. Karol, how did you arrive at pidfile_write? Purely from the offset?
I'm not sure that would be reliable...

This is a segfault, so the address of the frame where we crashed should be the
exact instruction where we crashed.
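As a rough way to sanity-check the two frames quoted above, here is a minimal Python sketch that pulls the offset and absolute address out of each frame line and subtracts one from the other. The names `parse_frame` and `implied_base` are made up for illustration. Treating the `+0x...` value as an offset from the start of a mapped object (rather than from a function) is an assumption that only makes sense when the symbol name in the frame is empty, as it is here.

```python
import re

# Matches frame lines as they appear in the log, e.g.
#   "1: (()+0x30b0a2) [0x7f4c4907f0a2]"
# Any leading "> > " quote markers are ignored because we use search().
FRAME_RE = re.compile(
    r"(\d+):\s+\((.*)\+0x([0-9a-fA-F]+)\)\s+\[0x([0-9a-fA-F]+)\]"
)


def parse_frame(line):
    """Return (index, symbol, offset, address) for a frame line, else None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    index, symbol, offset, address = m.groups()
    return int(index), symbol, int(offset, 16), int(address, 16)


def implied_base(offset, address):
    """Address minus offset.

    Assumption: with an empty symbol name, the offset is relative to the
    start of whatever object is mapped there, so this is a candidate load base.
    """
    return address - offset


if __name__ == "__main__":
    log = [
        "> > 1: (()+0x30b0a2) [0x7f4c4907f0a2]",
        "> > 2: (()+0xf100) [0x7f4c44f7a100]",
    ]
    for line in log:
        frame = parse_frame(line)
        if frame is not None:
            index, symbol, offset, address = frame
            print(index, hex(offset), hex(implied_base(offset, address)))
```

For the two frames in the log, this gives 0x7f4c48d74000 and 0x7f4c44f6b000. Both are page-aligned, which is at least consistent with reading the offsets as relative to a mapping base. On its own it does not tell us which file is mapped at either address.
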
I don't believe a mov from one register to another that does not involve a
dereference ((%r13) as opposed to %r13) can cause a segfault, so I don't think
we are on the right instruction, but then, as you say, the stack may be corrupt.

> Is this something you're able to easily reproduce? More logs with higher log
> levels
> would be helpful... a coredump with radosgw compiled with -g would be
> excellent :)

Agreed, although if this is an rpm based system it should be sufficient to run
the following.

# debuginfo-install ceph glibc

That may give us the name of the function, depending on where we are (if we are
in a library, it may require that the debuginfo for that library be loaded).

Karol is right that a coredump would be a good idea in this case and will give
us maximum information about the issue you are seeing.

Cheers,
Brad

>
> --
> Regards,
> Karol
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>