Hmm, at first glance it looks like you're using multiple active MDSes, you've created some snapshots, and part of that state got corrupted somehow. The log files should have a slightly more helpful stack trace at the end (including line numbers), and may give more context about what's gone wrong. Also, what's the output of "ceph -s"?

But I think you might be in some trouble from using two unstable features at the same time. :(
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Mar 10, 2014 at 12:24 PM, Pawel Veselov <pawel.veselov@xxxxxxxxx> wrote:
> Hi.
>
> All of a sudden, MDS started crashing, causing havoc on our deployment.
> Any help would be greatly appreciated.
>
> ceph.x86_64 0.56.7-0.el6 @ceph
>
> -1> 2014-03-10 19:16:35.956323 7f9681cb3700 1 mds.0.12 rejoin_joint_start
>  0> 2014-03-10 19:16:35.982031 7f9681cb3700 -1 *** Caught signal (Segmentation fault) **
>  in thread 7f9681cb3700
>
>  ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)
>  1: /usr/bin/ceph-mds() [0x813a91]
>  2: (()+0xf8e0) [0x7f96863748e0]
>  3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x5a) [0x6be9da]
>  4: (MDCache::check_realm_past_parents(SnapRealm*)+0x2b) [0x55fe7b]
>  5: (MDCache::choose_lock_states_and_reconnect_caps()+0x29d) [0x567ddd]
>  6: (MDCache::rejoin_gather_finish()+0x91) [0x59da91]
>  7: (MDCache::rejoin_send_rejoins()+0x1b4f) [0x5a50bf]
>  8: (MDS::rejoin_joint_start()+0x13e) [0x4a718e]
>  9: (MDS::handle_mds_map(MMDSMap*)+0x2cda) [0x4bbf8a]
>  10: (MDS::handle_core_message(Message*)+0x93b) [0x4bdfeb]
>  11: (MDS::_dispatch(Message*)+0x2f) [0x4be0bf]
>  12: (MDS::ms_dispatch(Message*)+0x19b) [0x4bfc9b]
>  13: (DispatchQueue::entry()+0x309) [0x7e5cf9]
>  14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7d607d]
>  15: (()+0x7c6b) [0x7f968636cc6b]
>  16: (clone()+0x6d) [0x7f968550e5ed]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> We are using stock executables from the repo, but just in case, here is what
> I believe the point where it crashes:
>
>   6be9b5: 48 29 d0             sub    %rdx,%rax
>   6be9b8: 48 c1 f8 04          sar    $0x4,%rax
>   6be9bc: 48 83 f8 04          cmp    $0x4,%rax
>   6be9c0: 0f 86 81 02 00 00    jbe    6bec47 <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x2c7>
>   6be9c6: 83 7a 44 09          cmpl   $0x9,0x44(%rdx)
>   6be9ca: 0f 8f 83 04 00 00    jg     6bee53 <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x4d3>
>   6be9d0: 83 7a 40 09          cmpl   $0x9,0x40(%rdx)
>   6be9d4: 0f 8f 79 04 00 00    jg     6bee53 <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x4d3>
>   6be9da: 41 80 bc 24 98 00 00 cmpb   $0x0,0x98(%r12)
>   6be9e1: 00 00
>   6be9e3: b8 01 00 00 00       mov    $0x1,%eax
>   6be9e8: 0f 85 51 01 00 00    jne    6beb3f <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x1bf>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com