Hi all,

> I did a quick test with wcache off[1]. And have the impression the
> simple rados bench of 2 minutes performed a bit worse on my slow hdd's.

This probably depends on whether or not the drive actually has a non-volatile write cache. I noticed that from many vendors you can buy what looks like the exact same drive for a price difference of something like $20. My best guess is that the slightly more expensive ones have functioning power-loss-protection hardware that passed the quality test, while it is disabled in the cheaper drives (probably among other things). Going for the cheapest version all the time can have its price.

For the disks we are using, my impression is that disabling the volatile write cache actually adds the volatile cache capacity to the non-volatile write cache. The disks start consuming more power, but they also perform better with ceph. For our HDDs I have never seen a degradation, fortunately - or one could say that maybe they are so crappy that it couldn't get any worse :). In case our vendor reads this, that was a practical joke :)

The main question here is: do you want to risk data loss on power loss? Ceph is extremely sensitive to data that the firmware acknowledged as "on disk" disappearing after a power outage. This is different from journaled file systems like ext4, which manage to roll back to an earlier consistent version. One loses data, but the file system is not damaged. XFS still has problems with that, though. With ceph you can lose entire pools without a viable recovery option, as was described earlier in this thread.

> Couldn't we just set (uncomment)
> write_cache = off
> in /etc/hdparm.conf?

I was pondering that. The problem is that on CentOS systems it seems to be ignored, that it does not apply to SAS drives, for example, and that there is no working way of configuring which drives to exclude. While we have certain minimum requirements for ceph data disks, like functioning power loss protection, for an OS boot drive I really don't care. Power outages on cheap drives that lose writes have not been a problem since ext4. A few lost log entries or some contents of swap - who cares. Here, performance is more important than data security on power loss.

What would be needed is a configurable option that works in the same way for all types of protocols - SATA, SAS, NVMe, you name it. At the time of writing, I don't know of one; a rough sketch of what I have in mind is below.
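
Something along these lines is roughly what I mean - a single boot-time script instead of hdparm.conf. This is only a sketch and not tested; the exclude list, the device name patterns and the choice of tools (smartctl and nvme-cli here) are assumptions that would have to be adapted to the actual hardware:

#!/bin/sh
# Sketch only, not tested: disable the volatile write cache on data
# disks, skipping an exclude list of drives we don't care about.
EXCLUDE="sda"   # assumption: sda is the OS boot drive

for dev in /sys/block/sd? /sys/block/nvme?n?; do
    [ -e "$dev" ] || continue
    name=$(basename "$dev")
    case " $EXCLUDE " in *" $name "*) continue ;; esac
    case "$name" in
        sd*)
            # SATA and SAS: smartctl handles both; the setting is
            # volatile, so this has to run on every boot
            smartctl -s wcache,off "/dev/$name"
            ;;
        nvme*)
            # NVMe: volatile write cache is feature 0x06; not all
            # drives implement it
            nvme set-feature "/dev/$name" --feature-id=6 --value=0
            ;;
    esac
done

Hooked into a systemd unit or udev rule that runs before the OSDs start, the exclusions and the per-protocol handling would at least live in one place.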
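
For completeness, these are the commands I know of for just querying what a drive currently reports (the device names below are only examples):

hdparm -W /dev/sda                   # SATA: shows the current write-caching setting
smartctl -g wcache /dev/sda          # SATA or SAS via smartmontools
sdparm --get WCE /dev/sdb            # SAS/SCSI: WCE bit in the caching mode page
nvme get-feature /dev/nvme0 -f 6 -H  # NVMe: feature 0x06, volatile write cache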

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
Sent: 25 June 2020 00:01:51
To: paul.emmerich; vitalif
Cc: bknecht; ceph-users; s.priebe
Subject: Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

I did a quick test with wcache off[1]. And have the impression the
simple rados bench of 2 minutes performed a bit worse on my slow hdd's.

[1]
IFS=$'\n' && for line in `mount | grep 'osd/ceph'| awk '{print $1" "$3}'| sed -e 's/1 / /' -e 's#/var/lib/ceph/osd/ceph-##'`;do IFS=' ' arr=($line); service ceph-osd@${arr[1]} stop && smartctl -s wcache,off ${arr[0]} && service ceph-osd@${arr[1]} start ;done

-----Original Message-----
To: Paul Emmerich
Cc: Benoît Knecht; s.priebe@xxxxxxxxxxxx; ceph-users@xxxxxxx
Subject: Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

Hi,

https://yourcmc.ru/wiki/Ceph_performance author here %)

Disabling write cache is REALLY bad for SSDs without capacitors [consumer SSDs], and it is also bad for HDDs whose firmware doesn't have this bug-o-feature. The bug is really common, though. I have no idea where it comes from, but it's really common: when you "disable" the write cache you actually "enable" the non-volatile write cache on those drives.

Seagate EXOS drives also behave like that... It seems most EXOS drives have an SSD cache even though it's not mentioned in the specs, and it gets enabled when you do hdparm -W 0. In theory hdparm -W 0 may hurt linear write performance even on those HDDs, though.

> Well, what I was saying was "does it hurt to unconditionally run
> hdparm -W 0 on all disks?"
>
> Which disk would suffer from this? I haven't seen any disk where this
> would be a bad idea
>
> Paul

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx