I have a cluster on which I increased the number of PGs because the autoscaler wasn't working as expected. It's recovering the misplaced objects, but an OSD just failed and refuses to come back up. The device is readable to the OS, and there are two other OSDs on the same node that are online. I looked online, but haven't found anything relevant. This is the end of the OSD log:

    -3> 2023-03-30T21:21:19.641+0000 7fcb026413c0  1 bluefs mount
    -2> 2023-03-30T21:21:19.641+0000 7fcb026413c0  1 bluefs _init_alloc shared, id 1, capacity 0x4affc00000, block size 0x10000
    -1> 2023-03-30T21:21:19.673+0000 7fcb026413c0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool, bool)' thread 7fcb026413c0 time 2023-03-30T21:21:19.665811+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/os/bluestore/BlueFS.cc: 1419: FAILED ceph_assert(r == q->second->file_map.end())

 ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x56525ddd8954]
 2: /usr/bin/ceph-osd(+0x5d8b75) [0x56525ddd8b75]
 3: (BlueFS::_replay(bool, bool)+0x599c) [0x56525e5590ec]
 4: (BlueFS::mount()+0x120) [0x56525e559530]
 5: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x56525e4160b4]
 6: (BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x6e1) [0x56525e417211]
 7: (BlueStore::_open_db(bool, bool, bool)+0x159) [0x56525e449f69]
 8: (BlueStore::_open_db_and_around(bool, bool)+0x2b4) [0x56525e493f14]
 9: (BlueStore::_mount()+0x1ae) [0x56525e4970fe]
 10: (OSD::init()+0x403) [0x56525df16f23]
 11: main()
 12: __libc_start_main()
 13: _start()

     0> 2023-03-30T21:21:19.681+0000 7fcb026413c0 -1 *** Caught signal (Aborted) **
 in thread 7fcb026413c0 thread_name:ceph-osd

 ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fcb00844ce0]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x56525ddd89b2]
 5: /usr/bin/ceph-osd(+0x5d8b75) [0x56525ddd8b75]
 6: (BlueFS::_replay(bool, bool)+0x599c) [0x56525e5590ec]
 7: (BlueFS::mount()+0x120) [0x56525e559530]
 8: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x56525e4160b4]
 9: (BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x6e1) [0x56525e417211]
 10: (BlueStore::_open_db(bool, bool, bool)+0x159) [0x56525e449f69]
 11: (BlueStore::_open_db_and_around(bool, bool)+0x2b4) [0x56525e493f14]
 12: (BlueStore::_mount()+0x1ae) [0x56525e4970fe]
 13: (OSD::init()+0x403) [0x56525df16f23]
 14: main()
 15: __libc_start_main()
 16: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_cleaner
   0/ 5 seastore_lba
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 alienstore
   1/ 5 mclock
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  --- pthread ID / name mapping for recent threads ---
  7fcafb03b700 / admin_socket
  7fcafb83c700 / msgr-worker-2
  7fcb026413c0 / ceph-osd
  max_recent 10000
  max_new 10000
  log_file /var/log/ceph/ceph-osd.5.log
--- end dump of recent events ---

I'd like to recover this OSD if possible. Does anyone have any suggestions?
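In case it's useful context for anyone replying, this is roughly what I was planning to try next, pieced together from the ceph-bluestore-tool man page and older list threads. I haven't run any of it yet; the data path assumes a default /var/lib/ceph/osd layout on the OSD host (it's osd.5 going by the log file name), and the bluefs_replay_recovery option at the end is something I've only seen referenced for BlueFS log corruption, so please treat this as a sketch rather than a known-good procedure:

  # keep the cluster from shuffling more data while osd.5 is down
  ceph osd set noout

  # make sure the daemon is stopped before touching the store
  # (or the cephadm/container equivalent)
  systemctl stop ceph-osd@5

  # consistency check of the BlueStore/BlueFS metadata
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-5

  # attempt a repair if fsck reports fixable errors
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-5

  # copy the BlueFS contents (the RocksDB files) somewhere safe
  # before trying anything more invasive
  ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-5 --out-dir /root/osd.5-bluefs

  # last resort I've seen mentioned for BlueFS log corruption; I'm not
  # sure it applies to this particular assert:
  # CEPH_ARGS="--bluefs_replay_recovery=true" ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-5

  # once the OSD is back (or rebuilt), clear the flag again
  ceph osd unset noout

If fsck comes back clean and the OSD still asserts in BlueFS::_replay on start, I assume the fallback is to destroy and re-create the OSD and let it backfill from the other replicas, but I'd rather avoid that while the PG increase is still rebalancing.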