wal/db are on Intel S4610 960GB SSDs, with PLP, and write-back cache on.

huxiaoyu@xxxxxxxxxxxx

From: YiteGu
Date: 2021-11-26 11:32
To: huxiaoyu@xxxxxxxxxxxx; ceph-users
Subject: Re: Rocksdb: Corruption: missing start of fragmented record(1)

It looks like your wal/db device lost data. Please check whether your
wal/db device has a writeback cache: a power loss can then cause data
loss, and RocksDB fails to replay its log on restart.

YiteGu
ess_gyt@xxxxxx

------------------ Original ------------------
From: "huxiaoyu@xxxxxxxxxxxx" <huxiaoyu@xxxxxxxxxxxx>
Date: Fri, Nov 26, 2021 06:02 PM
To: "ceph-users" <ceph-users@xxxxxxx>
Subject: Rocksdb: Corruption: missing start of fragmented record(1)

Dear Cephers,

I just had one Ceph OSD node (Luminous 12.2.13) lose power unexpectedly,
and after restarting that node, two OSDs out of 10 cannot be started,
issuing the following errors; in particular, I see

    Rocksdb: Corruption: missing start of fragmented record(1)
    Bluestore(/var/lib/ceph/osd/osd-21) _open_db erroring opening db: ...
    **ERROR: OSD init failed: (5) Input/output error

I checked the db/wal SSDs, and they are working fine. So I am wondering
the following:

1) Is there a method to restore the OSDs?
2) What could be the potential causes of the corrupted db/wal? The db/wal
SSDs have PLP and were not damaged during the power loss.

Your help would be highly appreciated.

best regards,

samuel

huxiaoyu@xxxxxxxxxxxx
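
A minimal sketch of the cache check YiteGu suggests, assuming the wal/db
SSD appears as /dev/sdX (a placeholder device name, not from the thread):

    # Query the drive's volatile write cache state; either tool works
    # for SATA devices.
    smartctl -g wcache /dev/sdX
    hdparm -W /dev/sdX

    # Disable the volatile write cache so in-flight writes are not held
    # in volatile RAM across a power failure.
    hdparm -W 0 /dev/sdX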
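
For question 1), one possible recovery attempt is ceph-bluestore-tool,
which ships with Luminous; whether it can recover from a truncated RocksDB
log is not guaranteed. The OSD path below is the one from the error
message, and the OSD must not be running:

    # Make sure the failed OSD daemon is stopped (systemd deployments).
    systemctl stop ceph-osd@21

    # Read-only consistency check of the BlueStore metadata.
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/osd-21

    # Attempt to repair whatever fsck reported.
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/osd-21

If the repair fails, the usual fallback is to redeploy the OSD and let the
cluster backfill it from the surviving replicas.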