Hey all!

I’ve run into an MDS crash on a cluster recently upgraded from Ceph 16.2.7 to 16.2.10. I’m hitting an assert nearly identical to this one gathered by the telemetry module:
https://tracker.ceph.com/issues/54747

I have a new build compiling to test whether https://github.com/ceph/ceph/pull/43184/ makes a difference when mds_inject_skip_replaying_inotable is set.

Relevant logs are below. Has anyone hit anything like this?

Thanks in advance!

=== BEGIN LOG SNIPPET ===
    -2> 2023-01-18T20:16:29.789+0000 7f6190243700 -1 log_channel(cluster) log [ERR] : journal replay alloc 0x10000000010 not in free [0x10000000011~0x3dc,0x100000003fb~0x1e8,0x100000005e5~0x2,0x100000009d4~0x2,0x1000005cc6d~0x4,0x10001c6b44e~0x4,0x10001cb91f4~0x1f4,0x10001cb93f4~0x3dd,0x10007582c15~0x279,0x10007582e90~0x1f4,0x10007583094~0xfff8a7cf6c]
    -1> 2023-01-18T20:16:29.789+0000 7f6190243700 -1 /builds/66321/e7c73776/ceph/-build//WORKDIR/ceph-16.2.10/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)' thread 7f6190243700 time 2023-01-18T20:16:29.794189+0000
/WORKDIR/ceph-16.2.10/src/mds/journal.cc: 1577: FAILED ceph_assert(inotablev == mds->inotable->get_version())

 ceph version 16.2.10 (e7c73776b3136f6d18a35febeb38f5fdd41be364) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14c) [0x7f619d548645]
 2: /usr/lib/ceph/libceph-common.so.2(+0x27182f) [0x7f619d54882f]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x5815) [0x560bfd1c6935]
 4: (EUpdate::replay(MDSRank*)+0x3c) [0x560bfd1c7ecc]
 5: (MDLog::_replay_thread()+0xca9) [0x560bfd153de9]
 6: (MDLog::ReplayThread::entry()+0xd) [0x560bfce78fdd]
 7: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f619cf29fa3]
 8: clone()

     0> 2023-01-18T20:16:29.793+0000 7f6190243700 -1 *** Caught signal (Aborted) **
 in thread 7f6190243700 thread_name:md_log_replay

 ceph version 16.2.10 (e7c73776b3136f6d18a35febeb38f5fdd41be364) pacific (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f619cf34730]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19d) [0x7f619d548696]
 5: /usr/lib/ceph/libceph-common.so.2(+0x27182f) [0x7f619d54882f]
 6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x5815) [0x560bfd1c6935]
 7: (EUpdate::replay(MDSRank*)+0x3c) [0x560bfd1c7ecc]
 8: (MDLog::_replay_thread()+0xca9) [0x560bfd153de9]
 9: (MDLog::ReplayThread::entry()+0xd) [0x560bfce78fdd]
 10: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f619cf29fa3]
 11: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
=== END LOG SNIPPET ===
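
The [ERR] line in the snippet can be checked by hand. Here is a minimal sketch, assuming each entry in the "free" list is a hex start~length pair covering inode numbers start through start+length-1. That reading of the log format is an assumption, and the helper names parse_free and is_free are made up for illustration. It only tests whether the inode being replayed falls inside any free range.

```python
# Assumption: each "free" entry is hex "start~length", i.e. the range
# [start, start + length). The list is copied from the [ERR] log line.
FREE = (
    "0x10000000011~0x3dc,0x100000003fb~0x1e8,0x100000005e5~0x2,"
    "0x100000009d4~0x2,0x1000005cc6d~0x4,0x10001c6b44e~0x4,"
    "0x10001cb91f4~0x1f4,0x10001cb93f4~0x3dd,0x10007582c15~0x279,"
    "0x10007582e90~0x1f4,0x10007583094~0xfff8a7cf6c"
)


def parse_free(text):
    """Turn 'start~len,start~len,...' into a list of (start, length) ints."""
    ranges = []
    for entry in text.split(","):
        start, length = entry.split("~")
        ranges.append((int(start, 16), int(length, 16)))
    return ranges


def is_free(ino, ranges):
    """True if ino lies inside any [start, start + length) range."""
    return any(start <= ino < start + length for start, length in ranges)


ranges = parse_free(FREE)
ino = 0x10000000010  # the inode the journal replay tried to allocate

print(is_free(ino, ranges))       # False
print(hex(ranges[0][0] - ino))    # 0x1
```

Under that assumption, the replayed inode sits exactly one below the start of the first free range. That would be consistent with the inotable having already moved past it, though the log alone does not prove this.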