On Thu, Dec 13, 2018 at 9:25 PM Sang, Oliver <oliver.sang@xxxxxxxxx> wrote:
>
> Thanks a lot, Yan Zheng!
>
> Regarding "set debug_mds = 10 for standby mds (change debug_mds to 0 after
> mds becomes active)" - could you please explain the purpose? Is it just to
> collect a debug log, or does it actually help prevent the MDS from being lost?
>
> Regarding the patch itself: sorry, we didn't compile from source. However,
> may I ask whether it will be included in a future v12 release? Thanks.
>

The crash happened while the MDS was recovering. I want to collect a debug log
covering that period.

Regards
Yan, Zheng

> BR
> Oliver
>
> -----Original Message-----
> From: Yan, Zheng [mailto:ukernel@xxxxxxxxx]
> Sent: Thursday, December 13, 2018 3:44 PM
> To: Sang, Oliver <oliver.sang@xxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: mds lost very frequently
>
> On Thu, Dec 13, 2018 at 2:55 AM Sang, Oliver <oliver.sang@xxxxxxxxx> wrote:
> >
> > We are using Luminous. We have seven Ceph nodes and set all of them up as MDSes.
> >
> > Recently the MDSes have been getting lost very frequently, and when there is
> > only one MDS left, cephfs degrades to unusable.
> >
> > Checking the mds log on one Ceph node, I found the following:
> >
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >
> > /build/ceph-12.2.8/src/mds/Locker.cc: 5076: FAILED assert(lock->get_state() == LOCK_PRE_SCAN)
> >
> > ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x564400e50e42]
> >  2: (Locker::file_recover(ScatterLock*)+0x208) [0x564400c6ae18]
> >  3: (MDCache::start_files_to_recover()+0xb3) [0x564400b98af3]
> >  4: (MDSRank::clientreplay_start()+0x1f7) [0x564400ae04c7]
> >  5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x25c0) [0x564400aefd40]
> >  6: (MDSDaemon::handle_mds_map(MMDSMap*)+0x154d) [0x564400ace3bd]
> >  7: (MDSDaemon::handle_core_message(Message*)+0x7f3) [0x564400ad1273]
> >  8: (MDSDaemon::ms_dispatch(Message*)+0x1c3) [0x564400ad15a3]
> >  9: (DispatchQueue::entry()+0xeda) [0x5644011a547a]
> >  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x564400ee3fcd]
> >  11: (()+0x7494) [0x7f7a2b106494]
> >  12: (clone()+0x3f) [0x7f7a2a17eaff]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> >
> > The full log is also attached. Could you please help us? Thanks!
> >
>
> Please try the patch below if you can compile Ceph from source. If you can't
> compile Ceph, or the issue still happens, please set debug_mds = 10 for the
> standby MDS (change debug_mds back to 0 after the MDS becomes active).
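[Editor's note: the debug_mds setting above can be persisted on the standby MDS's host; a minimal ceph.conf sketch (the section and option names are standard, but verify against your deployment):]

```
# ceph.conf on the standby MDS host; the [mds] section applies to all
# MDS daemons on that node. Set debug_mds = 10 while the daemon is in
# standby, per Yan's suggestion, and lower it back to 0 once it goes active.
[mds]
debug_mds = 10
```

Alternatively, if the daemon is already running, the option can usually be changed at runtime on the MDS host with `ceph daemon mds.<id> config set debug_mds 10` (the `<id>` is a placeholder for your daemon's name), and set back to 0 after the MDS becomes active.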
>
> Regards
> Yan, Zheng
>
> diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
> index 1e8b024b8a..d1150578f1 100644
> --- a/src/mds/MDSRank.cc
> +++ b/src/mds/MDSRank.cc
> @@ -1454,8 +1454,8 @@ void MDSRank::rejoin_done()
>  void MDSRank::clientreplay_start()
>  {
>    dout(1) << "clientreplay_start" << dendl;
> -  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>    mdcache->start_files_to_recover();
> +  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>    queue_one_replay();
>  }
>
> @@ -1487,8 +1487,8 @@ void MDSRank::active_start()
>
>    mdcache->clean_open_file_lists();
>    mdcache->export_remaining_imported_caps();
> -  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>    mdcache->start_files_to_recover();
> +  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>
>    mdcache->reissue_all_caps();
>    mdcache->activate_stray_manager();
>
> > BR
> >
> > Oliver
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com