ceph mds crash (mimic)

My CephFS file system recently went through a long recovery after losing some PGs and OSDs. It finally returned to "HEALTH_OK" for a while, but then the MDS daemons started crashing, and now I cannot get any of the 3 MDS servers to stay up. The logs show this error:


  -313> 2019-07-11 17:42:39.820 7f612c147700  1 -- 10.10.30.116:6800/543707238 --> 10.10.30.115:6801/81746 -- mgrreport(unknown.ic2mon02 +0-0 packed 1374) v6 -- 0x2ed1c00 con 0
  -313> 2019-07-11 17:42:39.820 7f612b946700 -1 /build/ceph-13.2.6/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7f612b946700 time 2019-07-11 17:42:39.820872
/build/ceph-13.2.6/src/mds/MDCache.cc: 1680: FAILED assert(follows >= realm->get_newest_seq())

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f61367b997e]
 2: (()+0x2fab07) [0x7f61367b9b07]
 3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xd3f) [0x5f821f]
 4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc0) [0x5f8450]
 5: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x4b1) [0x5f9141]
 6: (Locker::scatter_writebehind(ScatterLock*)+0x465) [0x64a615]
 7: (Locker::simple_sync(SimpleLock*, bool*)+0x176) [0x64e506]
 8: (Locker::scatter_nudge(ScatterLock*, MDSInternalContextBase*, bool)+0x3dd) [0x652f6d]
 9: (Locker::scatter_tick()+0x1e4) [0x6535a4]
 10: (Locker::tick()+0x9) [0x6538b9]
 11: (MDSRankDispatcher::tick()+0x1e9) [0x4f00d9]
 12: (FunctionContext::finish(int)+0x2c) [0x4d52dc]
 13: (Context::complete(int)+0x9) [0x4d31d9]
 14: (SafeTimer::timer_thread()+0x18b) [0x7f61367b620b]
 15: (SafeTimerThread::entry()+0xd) [0x7f61367b786d]
 16: (()+0x76ba) [0x7f61360356ba]
 17: (clone()+0x6d) [0x7f613585e41d]

  -313> 2019-07-11 17:42:39.820 7f612b946700 -1 *** Caught signal (Aborted) **
 in thread 7f612b946700 thread_name:safe_timer

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0x11390) [0x7f613603f390]
 2: (gsignal()+0x38) [0x7f613578c428]
 3: (abort()+0x16a) [0x7f613578e02a]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f61367b9a86]
 5: (()+0x2fab07) [0x7f61367b9b07]
 6: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xd3f) [0x5f821f]
 7: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc0) [0x5f8450]
 8: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x4b1) [0x5f9141]
 9: (Locker::scatter_writebehind(ScatterLock*)+0x465) [0x64a615]
 10: (Locker::simple_sync(SimpleLock*, bool*)+0x176) [0x64e506]
 11: (Locker::scatter_nudge(ScatterLock*, MDSInternalContextBase*, bool)+0x3dd) [0x652f6d]
 12: (Locker::scatter_tick()+0x1e4) [0x6535a4]
 13: (Locker::tick()+0x9) [0x6538b9]
 14: (MDSRankDispatcher::tick()+0x1e9) [0x4f00d9]
 15: (FunctionContext::finish(int)+0x2c) [0x4d52dc]
 16: (Context::complete(int)+0x9) [0x4d31d9]
 17: (SafeTimer::timer_thread()+0x18b) [0x7f61367b620b]
 18: (SafeTimerThread::entry()+0xd) [0x7f61367b786d]
 19: (()+0x76ba) [0x7f61360356ba]
 20: (clone()+0x6d) [0x7f613585e41d]
_______________________________________________
Dev mailing list -- dev@xxxxxxx