On Mon, 10 Mar 2014, Gregory Farnum wrote:
> Hmm, at first glance it looks like you're using multiple active MDSes
> and you've created some snapshots and part of that state got corrupted
> somehow. The log files should have a slightly more helpful (including
> line numbers) stack trace at the end, and might have more context for
> what's gone wrong.
> Also, what's the output of "ceph -s"?
> But I think you might be in some trouble from using two unstable
> features at the same time. :(
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Mar 10, 2014 at 12:24 PM, Pawel Veselov <pawel.veselov@xxxxxxxxx> wrote:
> > Hi.
> >
> > All of a sudden, MDS started crashing, causing havoc on our deployment.
> > Any help would be greatly appreciated.
> >
> > ceph.x86_64 0.56.7-0.el6 @ceph

You might start by upgrading the cluster; this release is quite old, and
many small (and large) things have been fixed in the last year.

sage

> >
> >     -1> 2014-03-10 19:16:35.956323 7f9681cb3700  1 mds.0.12 rejoin_joint_start
> >      0> 2014-03-10 19:16:35.982031 7f9681cb3700 -1 *** Caught signal (Segmentation fault) **
> >  in thread 7f9681cb3700
> >
> >  ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)
> >  1: /usr/bin/ceph-mds() [0x813a91]
> >  2: (()+0xf8e0) [0x7f96863748e0]
> >  3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x5a) [0x6be9da]
> >  4: (MDCache::check_realm_past_parents(SnapRealm*)+0x2b) [0x55fe7b]
> >  5: (MDCache::choose_lock_states_and_reconnect_caps()+0x29d) [0x567ddd]
> >  6: (MDCache::rejoin_gather_finish()+0x91) [0x59da91]
> >  7: (MDCache::rejoin_send_rejoins()+0x1b4f) [0x5a50bf]
> >  8: (MDS::rejoin_joint_start()+0x13e) [0x4a718e]
> >  9: (MDS::handle_mds_map(MMDSMap*)+0x2cda) [0x4bbf8a]
> >  10: (MDS::handle_core_message(Message*)+0x93b) [0x4bdfeb]
> >  11: (MDS::_dispatch(Message*)+0x2f) [0x4be0bf]
> >  12: (MDS::ms_dispatch(Message*)+0x19b) [0x4bfc9b]
> >  13: (DispatchQueue::entry()+0x309) [0x7e5cf9]
> >  14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7d607d]
> >  15: (()+0x7c6b) [0x7f968636cc6b]
> >  16: (clone()+0x6d) [0x7f968550e5ed]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> >  interpret this.
> >
> > We are using stock executables from the repo, but just in case, here is what
> > I believe the point where it crashes:
> >
> >   6be9b5:  48 29 d0              sub    %rdx,%rax
> >   6be9b8:  48 c1 f8 04           sar    $0x4,%rax
> >   6be9bc:  48 83 f8 04           cmp    $0x4,%rax
> >   6be9c0:  0f 86 81 02 00 00     jbe    6bec47 <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x2c7>
> >   6be9c6:  83 7a 44 09           cmpl   $0x9,0x44(%rdx)
> >   6be9ca:  0f 8f 83 04 00 00     jg     6bee53 <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x4d3>
> >   6be9d0:  83 7a 40 09           cmpl   $0x9,0x40(%rdx)
> >   6be9d4:  0f 8f 79 04 00 00     jg     6bee53 <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x4d3>
> >   6be9da:  41 80 bc 24 98 00 00  cmpb   $0x0,0x98(%r12)
> >   6be9e1:  00 00
> >   6be9e3:  b8 01 00 00 00        mov    $0x1,%eax
> >   6be9e8:  0f 85 51 01 00 00     jne    6beb3f <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x1bf>
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com