Corrupt bluestore after sudden reboot (17.2.5)

Due to the ongoing South African energy crisis
<https://en.wikipedia.org/wiki/South_African_energy_crisis>, our datacenter
experienced a sudden power loss. We are running Ceph 17.2.5, deployed with
cephadm. Two of our OSDs did not start correctly. Running fsck on one of
them fails with the following error:

# ceph-bluestore-tool fsck --path /var/lib/ceph/ed7b2c16-b053-45e2-a1fe-bf3474f90508/osd.27/
2023-01-15T08:38:04.289+0200 7f2a2a03c540 -1 bluestore::NCB::__restore_allocator::No Valid allocation info on disk (empty file)
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)' thread 7f2a2a03c540 time 2023-01-15T08:39:31.304968+0200
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: 18968: FAILED ceph_assert(collection_ref)
2023-01-15T08:39:31.298+0200 7f2a2a03c540 -1 bluestore::NCB::read_allocation_from_onodes::stray object 2#55:ffffffff:::2000055f327.00002287:head# not owned by any collection
 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f2a2acc07c6]
 2: /usr/lib/ceph/libceph-common.so.2(+0x27c9d8) [0x7f2a2acc09d8]
 3: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0xa24) [0x560d6baf5754]
 4: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x5f) [0x560d6baf66ff]
 5: (BlueStore::read_allocation_from_drive_on_startup()+0x99) [0x560d6baf68b9]
 6: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0xaca) [0x560d6bb0c15a]
 7: (BlueStore::_open_db_and_around(bool, bool)+0x35c) [0x560d6bb380dc]
 8: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x250) [0x560d6bb3a8c0]
 9: main()
 10: __libc_start_main()
 11: _start()
*** Caught signal (Aborted) **
 in thread 7f2a2a03c540 thread_name:ceph-bluestore-
2023-01-15T08:39:31.306+0200 7f2a2a03c540 -1 /build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)' thread 7f2a2a03c540 time 2023-01-15T08:39:31.304968+0200
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: 18968: FAILED ceph_assert(collection_ref)

(Complete log:
https://gist.github.com/pvanheus/5c57455cacdc91afc9ce27fd489cae25)

Is there a way to recover from this? Or should I accept the OSDs as lost
and rebuild them?

Thanks,
Peter