Re: mds crashes constantly

Hmm, at first glance it looks like you're running multiple active MDSes
and have created some snapshots, and part of that state got corrupted
somehow. The log files should have a slightly more helpful stack trace
at the end (one that includes line numbers), and might have more
context for what's gone wrong.
Also, what's the output of "ceph -s"?
But I think you might be in some trouble from using two unstable
features at the same time. :(
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
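
When comparing the trace quoted below with the fuller one at the end of the
MDS log, it can help to pull the numbered frames out as structured data. The
following is only a minimal sketch, assuming the frame layout seen in the
quoted trace ("N: (symbol+0xOFFSET) [0xADDRESS]"); the names FRAME_RE,
parse_frames and first_named_frame are made up for illustration and are not
part of Ceph.

```python
import re

# Matches backtrace lines such as
#   " 3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x5a) [0x6be9da]"
# or
#   " 1: /usr/bin/ceph-mds() [0x813a91]"
# optionally prefixed by the "> " quoting used in email replies.
# Assumption: this layout is taken from the quoted trace only.
FRAME_RE = re.compile(
    r"^[>\s]*(?P<index>\d+): "
    r"(?:\((?P<symbol>.*)\+0x(?P<offset>[0-9a-f]+)\)|(?P<binary>\S+)) "
    r"\[0x(?P<address>[0-9a-f]+)\]\s*$"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # not a frame line (log message, disassembly, ...)
        offset = match.group("offset")
        frames.append({
            "index": int(match.group("index")),
            "symbol": match.group("symbol") or match.group("binary"),
            "offset": int(offset, 16) if offset else None,
            "address": int(match.group("address"), 16),
        })
    return frames


def first_named_frame(frames):
    """Return the first frame that carries a real function name, or None."""
    for frame in frames:
        if frame["offset"] is not None and frame["symbol"] != "()":
            return frame
    return None
```

Run against the quoted trace, first_named_frame(parse_frames(text)) should
pick out frame 3, SnapRealm::have_past_parents_open at offset 0x5a, whose
address 0x6be9da is the cmpb instruction in the disassembly further down.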


On Mon, Mar 10, 2014 at 12:24 PM, Pawel Veselov <pawel.veselov@xxxxxxxxx> wrote:
> Hi.
>
> All of a sudden, MDS started crashing, causing havoc on our deployment.
> Any help would be greatly appreciated.
>
> ceph.x86_64                          0.56.7-0.el6                  @ceph
>
>     -1> 2014-03-10 19:16:35.956323 7f9681cb3700  1 mds.0.12
> rejoin_joint_start
>      0> 2014-03-10 19:16:35.982031 7f9681cb3700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f9681cb3700
>
>  ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)
>  1: /usr/bin/ceph-mds() [0x813a91]
>  2: (()+0xf8e0) [0x7f96863748e0]
>  3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x5a) [0x6be9da]
>  4: (MDCache::check_realm_past_parents(SnapRealm*)+0x2b) [0x55fe7b]
>  5: (MDCache::choose_lock_states_and_reconnect_caps()+0x29d) [0x567ddd]
>  6: (MDCache::rejoin_gather_finish()+0x91) [0x59da91]
>  7: (MDCache::rejoin_send_rejoins()+0x1b4f) [0x5a50bf]
>  8: (MDS::rejoin_joint_start()+0x13e) [0x4a718e]
>  9: (MDS::handle_mds_map(MMDSMap*)+0x2cda) [0x4bbf8a]
>  10: (MDS::handle_core_message(Message*)+0x93b) [0x4bdfeb]
>  11: (MDS::_dispatch(Message*)+0x2f) [0x4be0bf]
>  12: (MDS::ms_dispatch(Message*)+0x19b) [0x4bfc9b]
>  13: (DispatchQueue::entry()+0x309) [0x7e5cf9]
>  14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7d607d]
>  15: (()+0x7c6b) [0x7f968636cc6b]
>  16: (clone()+0x6d) [0x7f968550e5ed]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> We are using stock executables from the repo, but just in case, here is what
> I believe is the point where it crashes:
>
>   6be9b5:       48 29 d0                sub    %rdx,%rax
>   6be9b8:       48 c1 f8 04             sar    $0x4,%rax
>   6be9bc:       48 83 f8 04             cmp    $0x4,%rax
>   6be9c0:       0f 86 81 02 00 00       jbe    6bec47
> <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x2c7>
>   6be9c6:       83 7a 44 09             cmpl   $0x9,0x44(%rdx)
>   6be9ca:       0f 8f 83 04 00 00       jg     6bee53
> <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x4d3>
>   6be9d0:       83 7a 40 09             cmpl   $0x9,0x40(%rdx)
>   6be9d4:       0f 8f 79 04 00 00       jg     6bee53
> <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x4d3>
>   6be9da:       41 80 bc 24 98 00 00    cmpb   $0x0,0x98(%r12)
>   6be9e1:       00 00
>   6be9e3:       b8 01 00 00 00          mov    $0x1,%eax
>   6be9e8:       0f 85 51 01 00 00       jne    6beb3f
> <_ZN9SnapRealm22have_past_parents_openE8snapid_tS0_+0x1bf>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



