MDS crash in "inotablev == mds->inotable->get_version()"

Hey all!  I’ve run into an MDS crash on a cluster recently upgraded from Ceph 16.2.7 to 16.2.10.  I’m hitting an assert nearly identical to this one gathered by the telemetry module:
	https://tracker.ceph.com/issues/54747

I have a new build compiling to test whether https://github.com/ceph/ceph/pull/43184/ makes a difference when mds_inject_skip_replaying_inotable is set.

Relevant logs are below. Has anyone hit anything like this? Thanks in advance!


=== BEGIN LOG SNIPPET ===

    -2> 2023-01-18T20:16:29.789+0000 7f6190243700 -1 log_channel(cluster) log [ERR] : journal replay alloc 0x10000000010 not in free [0x10000000011~0x3dc,0x100000003fb~0x1e8,0x100000005e5~0x2,0x100000009d4~0x2,0x1000005cc6d~0x4,0x10001c6b44e~0x4,0x10001cb91f4~0x1f4,0x10001cb93f4~0x3dd,0x10007582c15~0x279,0x10007582e90~0x1f4,0x10007583094~0xfff8a7cf6c]
    -1> 2023-01-18T20:16:29.789+0000 7f6190243700 -1 /builds/66321/e7c73776/ceph/-build//WORKDIR/ceph-16.2.10/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)' thread 7f6190243700 time 2023-01-18T20:16:29.794189+0000
/WORKDIR/ceph-16.2.10/src/mds/journal.cc: 1577: FAILED ceph_assert(inotablev == mds->inotable->get_version())

 ceph version 16.2.10 (e7c73776b3136f6d18a35febeb38f5fdd41be364) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14c) [0x7f619d548645]
 2: /usr/lib/ceph/libceph-common.so.2(+0x27182f) [0x7f619d54882f]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x5815) [0x560bfd1c6935]
 4: (EUpdate::replay(MDSRank*)+0x3c) [0x560bfd1c7ecc]
 5: (MDLog::_replay_thread()+0xca9) [0x560bfd153de9]
 6: (MDLog::ReplayThread::entry()+0xd) [0x560bfce78fdd]
 7: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f619cf29fa3]
 8: clone()

     0> 2023-01-18T20:16:29.793+0000 7f6190243700 -1 *** Caught signal (Aborted) **
 in thread 7f6190243700 thread_name:md_log_replay

 ceph version 16.2.10 (e7c73776b3136f6d18a35febeb38f5fdd41be364) pacific (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f619cf34730]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19d) [0x7f619d548696]
 5: /usr/lib/ceph/libceph-common.so.2(+0x27182f) [0x7f619d54882f]
 6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x5815) [0x560bfd1c6935]
 7: (EUpdate::replay(MDSRank*)+0x3c) [0x560bfd1c7ecc]
 8: (MDLog::_replay_thread()+0xca9) [0x560bfd153de9]
 9: (MDLog::ReplayThread::entry()+0xd) [0x560bfce78fdd]
 10: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f619cf29fa3]
 11: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

=== END LOG SNIPPET ===
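
For anyone reading the [ERR] line above: as far as I can tell, the free list is printed as comma-separated start~length ranges in hex. Here is a minimal sketch, assuming that reading is right, that checks whether an inode number falls inside any free range. The function names are my own for illustration, not anything from Ceph, and the list is truncated to the first three ranges from the log.

```python
# Assumption: each entry is "start~length" in hex, covering [start, start + length).

def parse_interval_set(text):
    """Parse '[start~len,start~len,...]' into (start, end) pairs, end exclusive."""
    intervals = []
    for item in text.strip().strip("[]").split(","):
        start_s, length_s = item.split("~")
        start = int(start_s, 16)
        intervals.append((start, start + int(length_s, 16)))
    return intervals

def is_free(ino, intervals):
    """Return True if ino lies inside any of the free ranges."""
    return any(start <= ino < end for start, end in intervals)

# First three ranges copied from the log line above (the rest are omitted here).
FREE = "[0x10000000011~0x3dc,0x100000003fb~0x1e8,0x100000005e5~0x2]"
intervals = parse_interval_set(FREE)

ino = 0x10000000010  # the inode the journal replay tried to allocate
print(hex(ino), "free" if is_free(ino, intervals) else "not free")
```

Under that assumption, 0x10000000010 sits just below the first free range, which starts at 0x10000000011, so the inotable already treats that inode as allocated when replay tries to allocate it.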

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx