Hi Frank,

That's true from the performance perspective, but it is not unsafe to leave the cache enabled -- Ceph uses fsync appropriately to make the writes durable. This issue looks more likely to be related to a concurrent hardware failure.

Cheers, Dan

On Mon, Nov 29, 2021 at 9:21 AM Frank Schilder <frans@xxxxxx> wrote:
>
> This may sound counter-intuitive, but you need to disable the write cache to enable the PLP cache only. SSDs with PLP usually have two types of cache, volatile and non-volatile. The volatile cache will lose data on power loss. It is the volatile cache that gets disabled when issuing the hdparm/sdparm/smartctl command to switch it off. In many cases this can increase the non-volatile cache and also performance.
>
> It is the non-volatile cache you want your writes to go to directly.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx>
> Sent: 26 November 2021 22:41:10
> To: YiteGu; ceph-users
> Subject: Re: Rocksdb: Corruption: missing start of fragmented record(1)
>
> The wal/db are on Intel S4610 960GB SSDs, with PLP and write-back cache on.
>
> huxiaoyu@xxxxxxxxxxxx
>
> From: YiteGu
> Date: 2021-11-26 11:32
> To: huxiaoyu@xxxxxxxxxxxx; ceph-users
> Subject: Re: Rocksdb: Corruption: missing start of fragmented record(1)
>
> It looks like your wal/db device lost data. Please check whether your wal/db device has a write-back cache; if so, the power loss can cause data loss, and RocksDB then fails to replay its log when it restarts.
>
> YiteGu
> ess_gyt@xxxxxx
>
> ------------------ Original ------------------
> From: "huxiaoyu@xxxxxxxxxxxx" <huxiaoyu@xxxxxxxxxxxx>
> Date: Fri, Nov 26, 2021 06:02 PM
> To: "ceph-users" <ceph-users@xxxxxxx>
> Subject: Rocksdb: Corruption: missing start of fragmented record(1)
>
> Dear Cephers,
>
> One of my Ceph OSD nodes (Luminous 12.2.13) lost power unexpectedly, and after restarting the node, two OSDs out of 10 cannot be started, issuing the following errors; in particular, I see
>
> Rocksdb: Corruption: missing start of fragmented record(1)
> Bluestore(/var/lib/ceph/osd/osd-21) _open_db erroring opening db:
> ...
> **ERROR: OSD init failed: (5) Input/output error
>
> I checked the db/wal SSDs, and they are working fine. So I am wondering:
> 1) Is there a method to restore the OSDs?
> 2) What could be the potential causes of the corrupted db/wal? The db/wal SSDs have PLP and were not damaged during the power loss.
>
> Your help would be highly appreciated.
>
> Best regards,
>
> samuel
>
> huxiaoyu@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
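
[Editor's note] Dan's point that "Ceph uses fsync appropriately to make the writes durable" rests on a standard pattern: once fsync() returns, the kernel has flushed both its page cache and the drive's volatile write cache to stable storage, which is why the volatile cache can safely stay enabled on a healthy drive. A minimal sketch of that pattern (the file name and helper are illustrative, not Ceph code):

```python
import os

def durable_write(path, data):
    """Write data to path and return only once it is on stable storage.

    os.fsync() asks the kernel to flush the page cache and issue a
    cache-flush command to the drive, so the drive's volatile write
    cache is emptied before the call returns.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.write(fd, data)
        os.fsync(fd)  # data is durable (power-loss safe) once this returns
    finally:
        os.close(fd)

# Hypothetical usage: a journal-style record that must survive a crash.
durable_write("record.bin", b"journal record\n")
with open("record.bin", "rb") as f:
    print(f.read())  # b'journal record\n'
```

If the device acknowledges the flush but silently drops the data (a hardware fault), no amount of fsync helps -- which is the failure mode Dan suspects here.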
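
[Editor's note] Frank's advice to switch off the volatile cache via hdparm/sdparm can be sketched as follows. This is a sketch only: `/dev/sdX` is a placeholder for the actual wal/db SSD, the commands require root, and flags should be verified against your distribution's man pages before use.

```shell
# SATA drives: query and disable the volatile write cache with hdparm.
hdparm -W /dev/sdX     # show the current write-cache state
hdparm -W0 /dev/sdX    # disable the volatile write cache

# SAS/SCSI drives: clear the WCE (Write Cache Enable) mode-page bit.
sdparm --get=WCE /dev/sdX
sdparm --clear=WCE --save /dev/sdX   # --save persists across power cycles
```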