Hi,

We have some docs about this in the Ceph hardware recommendations:
https://docs.ceph.com/en/latest/start/hardware-recommendations/#write-caches

I added some responses inline.

On Fri, Aug 5, 2022 at 7:23 PM Torbjörn Jansson <torbjorn@xxxxxxxxxxxx> wrote:
>
> Hello
>
> I've got a small 3-node Ceph cluster and I'm doing some benchmarking related
> to performance with drive write caching.
>
> The reason I started was that I wanted to test the SSDs I have for their
> performance as DB devices for the OSDs, and to make sure they are set up as
> well as I can get them.
>
> I read that turning off the write cache can be beneficial, even though that
> sounds backwards.

The "write cache" is a volatile cache -- so when it is enabled, Linux knows
that it is writing to a volatile area on the device and therefore it needs to
issue flushes to persist data. Linux considers these devices to be in "write
back" mode.

When the write cache is disabled, Linux knows it is writing to a persisted
area, and therefore doesn't bother sending flushes any more -- these devices
are in "write through" mode.

And by the way, new data centre class devices have firmware and special
hardware to accelerate those persisted writes when the volatile cache is
disabled. This is the so-called media cache.

> This seems to be true.
>
> I used mainly fio and "iostat -x" to test, with something like:
>
> fio --filename=/dev/ceph-db-0/bench --direct=1 --sync=1 --rw=write --bs=4k \
>     --numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting
>
> and then ran this with the write cache turned off and on to compare the
> results, also with and without sync in the fio command above.
>
> One thing I observed related to turning off the write cache on drives was
> that it appears a reboot is needed for it to have any effect.

This depends on the OS -- if you set the cache using the approach mentioned in
the docs above, then on all the distros we tested it keeps the WCE bit and the
kernel's "write back"/"write through" mode consistent with each other.

> And this is where it gets strange, and is the part I don't get.
>
> According to the drive manual, the disks I have (Seagate Nytro SAS3 SSDs)
> don't care what you set the WCE bit to and will do write caching internally
> regardless -- most likely because they are enterprise disks with built-in
> power loss protection.
>
> BUT it makes a big difference to the performance and to the flushes per
> second in iostat.
>
> So it appears that if you boot and the drive has its write cache disabled
> right from the start (dmesg contains stuff like: "sd 0:0:0:0: [sda] Write
> cache: disabled"), then Linux won't send any flushes and you get good
> performance. If you change the write caching on a drive at runtime (sdparm
> for SAS or hdparm for SATA), it won't change anything.

Check the cache_type, e.g. at /sys/class/scsi_disk/0:0:0:0/cache_type:

  "write back"    -> flushes are sent
  "write through" -> flushes are not sent

> Why is that? Why do I have to do a reboot?
> I mean, let's say you boot with the write cache disabled, so Linux decides
> never to send flushes, and then after boot you enable the cache. If there are
> no flushes, you risk your data in case of a power loss, no?

On all devices we have, if we have "write through" at boot and then set WCE=1
(with hdparm or sdparm) or echo "write back" into cache_type, then the
cache_type is automatically set correctly to "write back" and flushes are
sent.

There is another /sys/ entry to toggle flush behaviour:

  echo "write through" > /sys/block/sda/queue/write_cache

This is apparently a way to lie to the OS so that it stops sending flushes,
without manipulating the WCE mode of the underlying device.
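To make that concrete, here is a rough sketch of how to watch this behaviour
on a live system. The device name /dev/sda and the SCSI address 0:0:0:0 are
placeholders for your own hardware, and the flush columns in iostat (f/s,
f_await) only show up in reasonably recent sysstat versions:

  # What the kernel currently believes about the device cache
  # ("write back" -> flushes sent, "write through" -> no flushes):
  cat /sys/class/scsi_disk/0:0:0:0/cache_type

  # Toggle the volatile write cache on a SAS drive, then confirm that
  # the kernel's cache_type follows the WCE bit:
  sdparm -s WCE=1 /dev/sda
  cat /sys/class/scsi_disk/0:0:0:0/cache_type    # expect "write back"
  sdparm -s WCE=0 --save /dev/sda                # disable and persist
  cat /sys/class/scsi_disk/0:0:0:0/cache_type    # expect "write through"

  # While a sync-heavy workload (e.g. the fio command above) is running,
  # check whether flushes are actually reaching the device:
  iostat -x sda 1

For persistence across reboots, the docs linked at the top do this with a udev
rule, roughly like the following -- treat this as a sketch and take the exact
rule from the docs:

  # /etc/udev/rules.d/99-write-through.rules (example path)
  ACTION=="add", SUBSYSTEM=="scsi_disk", ATTR{cache_type}:="write through"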
Cheers, Dan

> This is not very obvious or good behaviour, I think (I hope I'm wrong and
> someone can enlighten me).
>
> For SAS drives, "sdparm -s WCE=0 --save /dev/sdX" appears to do the right
> thing, and it survives a reboot.
> But for SATA disks, "hdparm -W 0 -K 1 /dev/sdX" makes the change, yet as long
> as the drive is connected to the SAS controller it still gets its write cache
> enabled at boot -- so I bet the SAS controller also messes with the write
> cache setting on the drives.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx