Hi Frank,
This should be the same issue as
https://tracker.ceph.com/issues/49132, which has been fixed.
Thanks
- Xiubo
On 21/03/2023 23:32, Frank Schilder wrote:
Hi all,
We have an Octopus v15.2.17 cluster and observe that one of our MDS hosts showed up in the OSD blacklist:
# ceph osd blacklist ls
192.168.32.87:6801/3841823949 2023-03-22T10:08:02.589698+0100
192.168.32.87:6800/3841823949 2023-03-22T10:08:02.589698+0100
I see an MDS restart that might be related; see log snippets below. There are no clients running on this host, only OSDs and one MDS. What could be the reason for the blacklist entries?
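One way to check whether the blacklist entries line up with the MDS restart is to work backwards from the expiry timestamps. The sketch below parses the two entries shown above and subtracts a blacklist duration. The 24-hour duration is an assumption for illustration only, not a value confirmed anywhere in this thread, and the helper names are hypothetical; check the actual setting on your cluster before relying on the result.

```python
from datetime import datetime, timedelta

# ASSUMPTION: entries expire 24 hours after they are created.
# This value is illustrative; verify it against your cluster's configuration.
ASSUMED_BLACKLIST_DURATION = timedelta(hours=24)


def parse_blacklist_line(line):
    """Split one 'ceph osd blacklist ls' line into (address, expiry datetime)."""
    addr, expiry = line.split()
    return addr, datetime.strptime(expiry, "%Y-%m-%dT%H:%M:%S.%f%z")


def estimated_creation_time(expiry, duration=ASSUMED_BLACKLIST_DURATION):
    """Estimate when an entry was added, given its expiry and an assumed duration."""
    return expiry - duration


# The two entries from the 'ceph osd blacklist ls' output above.
entries = [
    "192.168.32.87:6801/3841823949 2023-03-22T10:08:02.589698+0100",
    "192.168.32.87:6800/3841823949 2023-03-22T10:08:02.589698+0100",
]

for line in entries:
    addr, expiry = parse_blacklist_line(line)
    created = estimated_creation_time(expiry)
    print(addr, "estimated blacklist time:", created.isoformat())
```

Under that assumption the entries would have been created at about 10:08:02 on 21 March, a few seconds after the assertion failure logged at 10:07:54, which would be consistent with the old MDS instance being blacklisted when it aborted and was restarted.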
Thanks!
Log snippets:
Mar 21 10:07:54 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: *** Caught signal (Aborted) **
Mar 21 10:07:54 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:54 ceph-23 journal: 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:54 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:54 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:54 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:54 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 7: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 9: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 10: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 11: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 12: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 13: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: 2023-03-21T10:07:54.982+0100 7f99e63d5700 -1 *** Caught signal (Aborted) **
Mar 21 10:07:54 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:54 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:54 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:54 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:54 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 7: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 9: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 10: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 11: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 12: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 13: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: -1> 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:55 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:55 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:55 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:55 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:55 ceph-23 journal: 4: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:55 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:55 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:55 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:55 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:55 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:55 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: 0> 2023-03-21T10:07:54.982+0100 7f99e63d5700 -1 *** Caught signal (Aborted) **
Mar 21 10:07:55 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:55 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:55 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:55 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:55 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:55 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:55 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:55 ceph-23 journal: 7: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:55 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:55 ceph-23 journal: 9: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:55 ceph-23 journal: 10: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:55 ceph-23 journal: 11: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:55 ceph-23 journal: 12: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:55 ceph-23 journal: 13: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:55 ceph-23 journal: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: -9999> 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:55 ceph-23 journal: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:55 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:55 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:55 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:55 ceph-23 journal: 4: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:55 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:55 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:55 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:55 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:55 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:55 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: -9998> 2023-03-21T10:07:54.982+0100 7f99e63d5700 -1 *** Caught signal (Aborted) **
Mar 21 10:07:55 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:55 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:55 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:55 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:55 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:55 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:55 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:55 ceph-23 journal: 7: (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:55 ceph-23 journal: 8: (MDSContext::complete(int)+0x56) [0x561bd6422656]
Mar 21 10:07:55 ceph-23 journal: 9: (MDSIOContextBase::complete(int)+0x39c) [0x561bd6422b5c]
Mar 21 10:07:55 ceph-23 journal: 10: (MDSLogContextBase::complete(int)+0x44) [0x561bd6422cb4]
Mar 21 10:07:55 ceph-23 journal: 11: (Finisher::finisher_thread_entry()+0x1a5) [0x7f99f4ab6a95]
Mar 21 10:07:55 ceph-23 journal: 12: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:55 ceph-23 journal: 13: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:55 ceph-23 journal: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mar 21 10:07:55 ceph-23 journal:
Mar 21 10:07:55 ceph-23 journal: reraise_fatal: default handler for signal 6 didn't terminate the process?
Mar 21 10:07:58 ceph-23 dockerd-current: time="2023-03-21T10:07:58.119559277+01:00" level=warning msg="040c1e98a0669204e0e98bdbcdde893f8acf63444f3827358e663a13a2869478 cleanup: failed to unmount secrets: invalid argument"
Mar 21 10:07:58 ceph-23 kernel: overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
Mar 21 10:07:58 ceph-23 kernel: overlayfs: workdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
Mar 21 10:07:58 ceph-23 journal: 118 get_config /opt/ceph-container/bin/config.static.sh
Mar 21 10:07:58 ceph-23 journal: 5 start_mds /opt/ceph-container/bin/start_mds.sh
Mar 21 10:07:58 ceph-23 journal: 120 main /opt/ceph-container/bin/entrypoint.sh
Mar 21 10:07:58 ceph-23 journal: 2023-03-21 10:07:58 /opt/ceph-container/bin/entrypoint.sh: static: does not generate config
Mar 21 10:07:58 ceph-23 journal: 58 start_mds /opt/ceph-container/bin/start_mds.sh
Mar 21 10:07:58 ceph-23 journal: 120 main /opt/ceph-container/bin/entrypoint.sh
Mar 21 10:07:58 ceph-23 journal: 2023-03-21 10:07:58 /opt/ceph-container/bin/entrypoint.sh: SUCCESS
Mar 21 10:07:58 ceph-23 journal: starting mds.ceph-23 at
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Best Regards,
Xiubo Li (李秀波)
Email: xiubli@xxxxxxxxxx/xiubli@xxxxxxx
Slack: @Xiubo Li