Hi all,

we have an octopus v15.2.17 cluster and observe that one of our MDS hosts showed up in the OSD blacklist:

# ceph osd blacklist ls
192.168.32.87:6801/3841823949 2023-03-22T10:08:02.589698+0100
192.168.32.87:6800/3841823949 2023-03-22T10:08:02.589698+0100

I see an MDS restart that might be related; see log snippets below. There are no clients running on this host, only OSDs and one MDS. What could be the reason for the blacklist entries?

Thanks!

Log snippets:

Mar 21 10:07:54 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: *** Caught signal (Aborted) **
Mar 21 10:07:54 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:54 ceph-23 journal: 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:54 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:54 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:54 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:54 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 7: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 9: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 10: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 11: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 12: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 13: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: 2023-03-21T10:07:54.982+0100 7f99e63d5700 -1 *** Caught signal (Aborted) **
Mar 21 10:07:54 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:54 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:54 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:54 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:54 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 7: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 9: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 10: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 11: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 12: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 13: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: -1> 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:55 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:55 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:55 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:55 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:55 ceph-23 journal: 4: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:55 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:55 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:55 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:55 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:55 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:55 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: 0> 2023-03-21T10:07:54.982+0100 7f99e63d5700 -1 *** Caught signal (Aborted) **
Mar 21 10:07:55 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:55 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:55 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:55 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:55 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:55 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:55 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:55 ceph-23 journal: 7: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:55 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:55 ceph-23 journal: 9: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:55 ceph-23 journal: 10: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:55 ceph-23 journal: 11: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:55 ceph-23 journal: 12: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:55 ceph-23 journal: 13: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:55 ceph-23 journal: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: -9999> 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:55 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:55 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:55 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:55 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:55 ceph-23 journal: 4: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:55 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:55 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:55 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:55 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:55 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:55 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: -9998> 2023-03-21T10:07:54.982+0100 7f99e63d5700 -1 *** Caught signal (Aborted) **
Mar 21 10:07:55 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:55 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:55 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:55 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:55 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:55 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:55 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:55 ceph-23 journal: 7: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:55 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:55 ceph-23 journal: 9: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:55 ceph-23 journal: 10: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:55 ceph-23 journal: 11: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:55 ceph-23 journal: 12: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:55 ceph-23 journal: 13: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:55 ceph-23 journal: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: reraise_fatal: default handler for signal 6 didn't terminate the process?
Mar 21 10:07:58 ceph-23 dockerd-current: time="2023-03-21T10:07:58.119559277+01:00" level=warning msg="040c1e98a0669204e0e98bdbcdde893f8acf63444f3827358e663a13a2869478 cleanup: failed to unmount secrets: invalid argument"
Mar 21 10:07:58 ceph-23 kernel: overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
Mar 21 10:07:58 ceph-23 kernel: overlayfs: workdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
Mar 21 10:07:58 ceph-23 journal: 118 get_config /opt/ceph-container/bin/config.static.sh
Mar 21 10:07:58 ceph-23 journal: 5 start_mds /opt/ceph-container/bin/start_mds.sh
Mar 21 10:07:58 ceph-23 journal: 120 main /opt/ceph-container/bin/entrypoint.sh
Mar 21 10:07:58 ceph-23 journal: 2023-03-21 10:07:58 /opt/ceph-container/bin/entrypoint.sh: static: does not generate config
Mar 21 10:07:58 ceph-23 journal: 58 start_mds /opt/ceph-container/bin/start_mds.sh
Mar 21 10:07:58 ceph-23 journal: 120 main /opt/ceph-container/bin/entrypoint.sh
Mar 21 10:07:58 ceph-23 journal: 2023-03-21 10:07:58 /opt/ceph-container/bin/entrypoint.sh: SUCCESS
Mar 21 10:07:58 ceph-23 journal: starting mds.ceph-23 at

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
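
P.S. One way to check whether the MDS restart might be related is to compare the expiry times in the blacklist listing with the time of the assert in the log. The following is only a minimal Python sketch of that comparison, not part of any Ceph tooling. The helper names parse_blacklist and gaps_to_crash are made up for illustration, and the input strings are copied from the output above.

```python
from datetime import datetime

# Output of "ceph osd blacklist ls" as quoted in this post.
BLACKLIST_LS = """\
192.168.32.87:6801/3841823949 2023-03-22T10:08:02.589698+0100
192.168.32.87:6800/3841823949 2023-03-22T10:08:02.589698+0100
"""

# Time of the failed ceph_assert, taken from the log snippet.
ASSERT_TIME = "2023-03-21T10:07:54.967936+0100"

# Timestamp layout used in both places (note: offset without a colon).
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_blacklist(text):
    """Return (address, expiry) pairs; lines without two fields are skipped."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        addr, expiry = parts
        entries.append((addr, datetime.strptime(expiry, FMT)))
    return entries


def gaps_to_crash(entries, crash):
    """Return (address, expiry - crash) for every blacklist entry."""
    return [(addr, expiry - crash) for addr, expiry in entries]


crash = datetime.strptime(ASSERT_TIME, FMT)
for addr, gap in gaps_to_crash(parse_blacklist(BLACKLIST_LS), crash):
    print(addr, gap)
```

With the values above, the gap is about 24 hours and 8 seconds for both entries. That would fit entries created at the time of the crash with a lifetime of one day, but whether this is what actually happened is an assumption on my part.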