Due to the ongoing South African energy crisis <https://en.wikipedia.org/wiki/South_African_energy_crisis>, our datacenter experienced a sudden power loss. We are running Ceph 17.2.5 deployed with cephadm. Two of our OSDs did not start correctly, with the error:

# ceph-bluestore-tool fsck --path /var/lib/ceph/ed7b2c16-b053-45e2-a1fe-bf3474f90508/osd.27/
2023-01-15T08:38:04.289+0200 7f2a2a03c540 -1 bluestore::NCB::__restore_allocator::No Valid allocation info on disk (empty file)
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)' thread 7f2a2a03c540 time 2023-01-15T08:39:31.304968+0200
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: 18968: FAILED ceph_assert(collection_ref)
2023-01-15T08:39:31.298+0200 7f2a2a03c540 -1 bluestore::NCB::read_allocation_from_onodes::stray object 2#55:ffffffff:::2000055f327.00002287:head# not owned by any collection
 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f2a2acc07c6]
 2: /usr/lib/ceph/libceph-common.so.2(+0x27c9d8) [0x7f2a2acc09d8]
 3: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0xa24) [0x560d6baf5754]
 4: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x5f) [0x560d6baf66ff]
 5: (BlueStore::read_allocation_from_drive_on_startup()+0x99) [0x560d6baf68b9]
 6: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0xaca) [0x560d6bb0c15a]
 7: (BlueStore::_open_db_and_around(bool, bool)+0x35c) [0x560d6bb380dc]
 8: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x250) [0x560d6bb3a8c0]
 9: main()
 10: __libc_start_main()
 11: _start()
*** Caught signal (Aborted) **
 in thread 7f2a2a03c540 thread_name:ceph-bluestore-
2023-01-15T08:39:31.306+0200 7f2a2a03c540 -1 /build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)' thread 7f2a2a03c540 time 2023-01-15T08:39:31.304968+0200
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: 18968: FAILED ceph_assert(collection_ref)

(complete log: https://gist.github.com/pvanheus/5c57455cacdc91afc9ce27fd489cae25)

Is there a way to recover from this? Or should I accept the OSDs as lost and rebuild them?

Thanks,
Peter