Thanks a lot, Yan Zheng!

Regarding "set debug_mds = 10 for standby mds (change debug_mds to 0 after mds becomes active)": could you please explain the purpose? Is it just to collect debug logs, or does it actually have a side effect that prevents the MDS from getting lost?

Regarding the patch itself: sorry, we don't compile from source. May I ask whether it will be included in a future v12 release?

Thanks
BR
Oliver

-----Original Message-----
From: Yan, Zheng [mailto:ukernel@xxxxxxxxx]
Sent: Thursday, December 13, 2018 3:44 PM
To: Sang, Oliver <oliver.sang@xxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: mds lost very frequently

On Thu, Dec 13, 2018 at 2:55 AM Sang, Oliver <oliver.sang@xxxxxxxxx> wrote:
>
> We are using luminous; we have seven Ceph nodes and set them all up as MDSes.
>
> Recently the MDSes have been getting lost very frequently, and when there is only one MDS left, CephFS degrades to the point of being unusable.
>
> Checking the MDS log on one Ceph node, I found the following:
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> /build/ceph-12.2.8/src/mds/Locker.cc: 5076: FAILED assert(lock->get_state() == LOCK_PRE_SCAN)
>
> ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x564400e50e42]
> 2: (Locker::file_recover(ScatterLock*)+0x208) [0x564400c6ae18]
> 3: (MDCache::start_files_to_recover()+0xb3) [0x564400b98af3]
> 4: (MDSRank::clientreplay_start()+0x1f7) [0x564400ae04c7]
> 5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x25c0) [0x564400aefd40]
> 6: (MDSDaemon::handle_mds_map(MMDSMap*)+0x154d) [0x564400ace3bd]
> 7: (MDSDaemon::handle_core_message(Message*)+0x7f3) [0x564400ad1273]
> 8: (MDSDaemon::ms_dispatch(Message*)+0x1c3) [0x564400ad15a3]
> 9: (DispatchQueue::entry()+0xeda) [0x5644011a547a]
> 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x564400ee3fcd]
> 11: (()+0x7494) [0x7f7a2b106494]
> 12: (clone()+0x3f) [0x7f7a2a17eaff]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
> The full log is also attached. Could you please help us? Thanks!
>

Please try the patch below if you can compile Ceph from source. If you can't compile Ceph, or the issue still happens, please set debug_mds = 10 for the standby MDS (change debug_mds back to 0 after the MDS becomes active).
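
For example, a minimal sketch of raising the log level at runtime through the admin socket on the node hosting the standby daemon (the daemon name "mds.node1" is a placeholder, substitute your own):

    # raise mds logging on the standby, via its local admin socket
    ceph daemon mds.node1 config set debug_mds 10
    # after it has taken over and become active, drop it back down
    ceph daemon mds.node1 config set debug_mds 0

Alternatively, "debug mds = 10" can be set in the [mds] section of ceph.conf on that node and removed later; that variant also survives a daemon restart.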
Regards
Yan, Zheng

diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
index 1e8b024b8a..d1150578f1 100644
--- a/src/mds/MDSRank.cc
+++ b/src/mds/MDSRank.cc
@@ -1454,8 +1454,8 @@ void MDSRank::rejoin_done()
 void MDSRank::clientreplay_start()
 {
   dout(1) << "clientreplay_start" << dendl;
-  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
   mdcache->start_files_to_recover();
+  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
   queue_one_replay();
 }
 
@@ -1487,8 +1487,8 @@ void MDSRank::active_start()
 
   mdcache->clean_open_file_lists();
   mdcache->export_remaining_imported_caps();
-  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
   mdcache->start_files_to_recover();
+  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
 
   mdcache->reissue_all_caps();
   mdcache->activate_stray_manager();

> BR
> Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
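
For anyone who wants to test the patch before it lands in a release, a rough sketch of rebuilding just the MDS from the luminous source (the tag matches the 12.2.8 cluster above; the patch file name is a placeholder for wherever you saved the diff from the mail):

    git clone --branch v12.2.8 https://github.com/ceph/ceph.git
    cd ceph
    git submodule update --init --recursive
    git apply /path/to/mds-recover.patch   # the diff from the mail above
    ./install-deps.sh                      # install build dependencies
    ./do_cmake.sh && cd build
    make -j$(nproc) ceph-mds               # build only the ceph-mds target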