Hello all,

I had an OSD go offline due to UWE. When I restarted the OSD service, to at least let it drain cleanly of the data that wasn't damaged, the ceph-osd process crashed. I then attempted to repair it using ceph-bluestore-tool. I can run fsck and it completes without issue, but when I run repair it crashes in exactly the same way that ceph-osd crashes. I'll attach the tail end of the output here:

2023-12-17T20:24:53.320+1000 7fdb7bf17740 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: stored = 1106056583, computed = 657190205, type = 1 in db/020524.sst offset 21626321 size 4014 code = Rocksdb transaction:
PutCF( prefix = S key = 'per_pool_omap' value size = 1)
  -442> 2023-12-17T20:24:53.386+1000 7fdb7bf17740 -1 /usr/src/debug/ceph/ceph-18.2.0/src/os/bluestore/BlueStore.cc: In function 'unsigned int BlueStoreRepairer::apply(KeyValueDB*)' thread 7fdb7bf17740 time 2023-12-17T20:24:53.341999+1000
/usr/src/debug/ceph/ceph-18.2.0/src/os/bluestore/BlueStore.cc: 17982: FAILED ceph_assert(ok)

 ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x136) [0x7fdb7b6502c9]
 2: /usr/lib/ceph/libceph-common.so.2(+0x2504a4) [0x7fdb7b6504a4]
 3: (BlueStoreRepairer::apply(KeyValueDB*)+0x5af) [0x559afb98cc7f]
 4: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x45fc) [0x559afba2436c]
 5: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x204) [0x559afba31014]
 6: main()
 7: /usr/lib/libc.so.6(+0x27cd0) [0x7fdb7ae45cd0]
 8: __libc_start_main()
 9: _start()

  -441> 2023-12-17T20:24:53.390+1000 7fdb7bf17740 -1 *** Caught signal (Aborted) **
 in thread 7fdb7bf17740 thread_name:ceph-bluestore-

 ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
 1: /usr/lib/libc.so.6(+0x3e710) [0x7fdb7ae5c710]
 2: /usr/lib/libc.so.6(+0x8e83c) [0x7fdb7aeac83c]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x191) [0x7fdb7b650324]
 6: /usr/lib/ceph/libceph-common.so.2(+0x2504a4) [0x7fdb7b6504a4]
 7: (BlueStoreRepairer::apply(KeyValueDB*)+0x5af) [0x559afb98cc7f]
 8: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x45fc) [0x559afba2436c]
 9: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x204) [0x559afba31014]
 10: main()
 11: /usr/lib/libc.so.6(+0x27cd0) [0x7fdb7ae45cd0]
 12: __libc_start_main()
 13: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The reason I need to get this OSD functioning is that I had two other OSDs fail, which left a single PG in the down state. The weird thing is that I got one of those back up without issue (ceph-osd had crashed because root filled up and the alert was not sent), but the PG is still down. So I need to get this other one back up (or the data extracted) to bring that PG back from down.

Thanks in advance
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
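
For anyone triaging a similar crash, the location of the damaged block can be pulled out of the RocksDB error line in the output above. This is only a minimal sketch: it assumes the "block checksum mismatch" message has exactly the layout shown in that log, and the function name `parse_checksum_mismatch` is hypothetical, not part of any Ceph or RocksDB tooling.

```python
import re
from typing import Optional

# Assumed layout of the RocksDB message, copied from the log excerpt in this post.
CHECKSUM_RE = re.compile(
    r"block checksum mismatch: stored = (?P<stored>\d+), "
    r"computed = (?P<computed>\d+), type = (?P<type>\d+) "
    r"in (?P<sst>\S+) offset (?P<offset>\d+) size (?P<size>\d+)"
)


def parse_checksum_mismatch(line: str) -> Optional[dict]:
    """Return the fields of a checksum-mismatch message, or None if absent.

    Hypothetical helper for illustration only.
    """
    match = CHECKSUM_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Every field except the SST file name is numeric.
    for key in ("stored", "computed", "type", "offset", "size"):
        info[key] = int(info[key])
    return info


# The error line from the log in this post.
sample = (
    "2023-12-17T20:24:53.320+1000 7fdb7bf17740 -1 rocksdb: submit_common error: "
    "Corruption: block checksum mismatch: stored = 1106056583, "
    "computed = 657190205, type = 1 in db/020524.sst offset 21626321 size 4014"
)

if __name__ == "__main__":
    print(parse_checksum_mismatch(sample))
```

The extracted SST file name and offset only identify where RocksDB detected the mismatch. They do not say whether the block is recoverable.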