MDS crash due to seemingly unrecoverable metadata error

Hi

I have a strange problem with my Ceph cluster.

Basic info:

 - 3-node cluster
 - CephFS runs on three pools:
    - cephfs_meta (replicated)
    - ec_basic (erasure coded)
    - ec_sensitive (erasure coded with higher redundancy)

My MDS keeps crashing with a bad backtrace error:
2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10002000f5d

So far, so good. To the best of my understanding, metadata errors like this should be fixable by following the disaster recovery procedure described here: https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/

However, the strange part is that the error remains unchanged. Even directly after the reset, i.e. before recreating the metadata objects, the MDS reports exactly the same error.

Is there something else that I need to reset?
I have already tried to delete the corrupt inode via rmomapkey; as a result, rados -p cephfs_meta listomapkeys 10002000f5d.00000000 returns an empty list.
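
For reference, the object name used above follows the pattern <inode number in hex>.<fragment as 8 hex digits>. The following is a minimal Python sketch that derives that name from the MDS error line and prints read-only rados commands for inspecting the object. The helper names (dirfrag_object_name, inspection_commands) are made up for illustration, and the pool name is the one from this post. It assumes that the directory's backtrace is stored in the "parent" xattr of the first fragment object, which should be checked against the documentation of the Ceph release in use. The script only prints the commands; it does not run them.

```python
import re
from typing import List

# Matches the MDS error line quoted above and captures the inode number.
BAD_BACKTRACE = re.compile(r"bad backtrace on directory inode (0x[0-9a-fA-F]+)")


def dirfrag_object_name(ino: int, frag: int = 0) -> str:
    """Object name of a directory fragment: <inode hex>.<fragment, 8 hex digits>."""
    return f"{ino:x}.{frag:08x}"


def inspection_commands(log_line: str, pool: str = "cephfs_meta") -> List[str]:
    """Build read-only rados commands for the inode named in log_line."""
    match = BAD_BACKTRACE.search(log_line)
    if match is None:
        raise ValueError("no 'bad backtrace' message in this line")
    obj = dirfrag_object_name(int(match.group(1), 16))
    return [
        f"rados -p {pool} stat {obj}",           # does the object still exist?
        f"rados -p {pool} listomapkeys {obj}",   # directory entries, if any
        f"rados -p {pool} listxattr {obj}",      # which xattrs are present
        # Assumption: the backtrace is kept in the "parent" xattr (binary data).
        f"rados -p {pool} getxattr {obj} parent",
    ]


line = ("2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster) "
        "log [ERR] : bad backtrace on directory inode 0x10002000f5d")
for command in inspection_commands(line):
    print(command)
```

None of these commands modify the pool, so they can help confirm what is actually left of the object before attempting any further repair step.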

Any suggestions on how to proceed? Any hints are appreciated!

MDS Log:

--------------------------
Feb 21 16:11:07 herta systemd[1]: Started Ceph metadata server daemon.
Feb 21 16:11:07 herta ceph-mds[128287]: starting mds.herta at
Feb 21 16:11:09 herta ceph-mds[128287]: 2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10002000f5d
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: In function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7fd2cd290700 time 2022-02-21T16:11:10.629363+0100
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: 785: FAILED ceph_assert(is_dir())
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7fd2d876e046]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: clone()
Feb 21 16:11:10 herta ceph-mds[128287]: *** Caught signal (Aborted) **
Feb 21 16:11:10 herta ceph-mds[128287]:  in thread 7fd2cd290700 thread_name:MR_Finisher
Feb 21 16:11:10 herta ceph-mds[128287]: 2022-02-21T16:11:10.625+0100 7fd2cd290700 -1 ./src/mds/CInode.cc: In function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7fd2cd290700 time 2022-02-21T16:11:10.629363+0100
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: 785: FAILED ceph_assert(is_dir())
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7fd2d876e046]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: clone()
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7fd2d84d5140]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: gsignal()
Feb 21 16:11:10 herta ceph-mds[128287]:  3: abort()
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16e) [0x7fd2d876e090]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  14: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  16: clone()
Feb 21 16:11:10 herta ceph-mds[128287]: 2022-02-21T16:11:10.629+0100 7fd2cd290700 -1 *** Caught signal (Aborted) **
Feb 21 16:11:10 herta ceph-mds[128287]:  in thread 7fd2cd290700 thread_name:MR_Finisher
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7fd2d84d5140]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: gsignal()
Feb 21 16:11:10 herta ceph-mds[128287]:  3: abort()
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16e) [0x7fd2d876e090]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  14: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  16: clone()
Feb 21 16:11:10 herta ceph-mds[128287]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Feb 21 16:11:10 herta ceph-mds[128287]:  -1430> 2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10002000f5d
Feb 21 16:11:10 herta ceph-mds[128287]:  -1429> 2022-02-21T16:11:10.625+0100 7fd2cd290700 -1 ./src/mds/CInode.cc: In function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7fd2cd290700 time 2022-02-21T16:11:10.629363+0100
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: 785: FAILED ceph_assert(is_dir())
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7fd2d876e046]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: clone()
Feb 21 16:11:10 herta ceph-mds[128287]:  -1428> 2022-02-21T16:11:10.629+0100 7fd2cd290700 -1 *** Caught signal (Aborted) **
Feb 21 16:11:10 herta ceph-mds[128287]:  in thread 7fd2cd290700 thread_name:MR_Finisher
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7fd2d84d5140]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: gsignal()
Feb 21 16:11:10 herta ceph-mds[128287]:  3: abort()
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16e) [0x7fd2d876e090]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  14: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  16: clone()
Feb 21 16:11:10 herta ceph-mds[128287]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--------------------------
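
The journal output above repeats the same trace several times. As a reading aid, here is a small Python sketch that pulls the failed assertion and the symbolized frames of the first trace out of such output. The function name summarize_crash and both regular expressions are illustrative and not part of any Ceph tool. They assume one journal entry per line, and the sample consists of a few lines copied from the log above.

```python
import re

# "<file>: <line>: FAILED ceph_assert(<expr>)" as printed by the MDS.
ASSERT_RE = re.compile(r"(\S+): (\d+): FAILED (ceph_assert\(.*\))\s*$", re.MULTILINE)
# Symbolized frames such as " 3: (CInode::...(...)+0x105) [0x557ec94be365]".
# Frames that only name a library and offset are skipped on purpose.
FRAME_RE = re.compile(r"\s(\d+): \((.+?)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]")

SAMPLE = """\
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: 785: FAILED ceph_assert(is_dir())
Feb 21 16:11:10 herta ceph-mds[128287]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7fd2d876e046]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
"""


def summarize_crash(log_text):
    """Return (assertion, frames) for the first trace found in log_text."""
    assertion = None
    match = ASSERT_RE.search(log_text)
    if match:
        assertion = {
            "file": match.group(1),
            "line": int(match.group(2)),
            "expr": match.group(3),
        }
    frames = []
    for number, symbol in FRAME_RE.findall(log_text):
        # Frame numbers restart when the next (repeated) trace begins.
        if frames and int(number) <= frames[-1][0]:
            break
        frames.append((int(number), symbol))
    return assertion, frames


assertion, frames = summarize_crash(SAMPLE)
print(assertion)
for number, symbol in frames:
    print(number, symbol)
```

In this log, the failed assertion is is_dir() in CInode::get_or_open_dirfrag, which is reached from OpenFileTable::_prefetch_dirfrags.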
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



