Failing OSD RocksDB Corrupt

Hello,

I had some faulty power cables on some OSDs in one server, which caused lots of IO issues and disks appearing/disappearing. This has now been corrected: 2 of the 10 OSDs are working, but 8 are failing to start due to what looks to be a corrupt DB.

When running a ceph-bluestore-tool fsck I get the following output:

    rocksdb: [db/db_impl_open.cc:516] db.wal/002221.log: dropping 1302 bytes; Corruption: missing start of fragmented record(2)
    2020-12-22T16:21:52.715+0100 7f7b6a1500c0 4 rocksdb: [db/db_impl.cc:389] Shutdown: canceling all background work
    2020-12-22T16:21:52.715+0100 7f7b6a1500c0 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
    2020-12-22T16:21:52.715+0100 7f7b6a1500c0 -1 rocksdb: Corruption: missing start of fragmented record(2)
    2020-12-22T16:21:52.715+0100 7f7b6a1500c0 -1 bluestore(/var/lib/ceph/b1db6b36-0c4c-4bce-9cda-18834be0632d/osd.28) opendb erroring opening db:

Trying to start the OSD leads to:

    ceph_abort_msg("Bad table magic number: expected 9863518390377041911, found 9372993859750765257 in db/002442.sst")

It looks like the last write to these OSDs never fully completed. Sadly, as I was adding this new node to move from OSD-level to host-level redundancy (EC pool), I currently have 20% of my PGs down. Is there anything I can do to remove the last entry in the DB, or otherwise clean up the RocksDB, so that I can get these OSDs at least started? I understand I may end up with some corrupted files. A sketch of what I mean is in the PS below.

Thanks
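PS: To make the question concrete, this is roughly the kind of cleanup I have in mind, as a sketch only. The osd.28 path is taken from the log above; the PG id in the last example is just a placeholder, and I have not confirmed that destructive-repair is the right tool for this failure. All of these assume the OSD daemon is stopped first.

    # Read-only consistency check of the failing OSD's BlueStore:
    ceph-bluestore-tool fsck \
        --path /var/lib/ceph/b1db6b36-0c4c-4bce-9cda-18834be0632d/osd.28

    # Repair attempt on the same store:
    ceph-bluestore-tool repair \
        --path /var/lib/ceph/b1db6b36-0c4c-4bce-9cda-18834be0632d/osd.28

    # Destructive RocksDB repair; may throw away the unreadable tail of the WAL:
    ceph-kvstore-tool bluestore-kv \
        /var/lib/ceph/b1db6b36-0c4c-4bce-9cda-18834be0632d/osd.28 \
        destructive-repair

    # Last resort: export a down PG for import into a healthy OSD
    # (the PG id 2.1as0 is a placeholder, not a real PG from my cluster):
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/b1db6b36-0c4c-4bce-9cda-18834be0632d/osd.28 \
        --pgid 2.1as0 --op export --file /root/2.1as0.export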
 
Sent via MXlogin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx