Hi Tim,
I wanted to share our experience here, as we've been in a situation in
the past (on a Friday afternoon, of course...) where injecting a snap
trim priority of 40 into all OSDs in the cluster (to speed up snap
trimming) resulted in all OSD nodes crashing at the same time, in all 3
datacenters. My first thought at that particular moment was: call your
wife and tell her you'll be late home. :-D
This event was not related to a power outage.
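For the record, the priority was injected on the fly with the usual
injectargs mechanism, something along these lines (just a sketch, with
osd_snap_trim_priority being the option we raised to 40):

    ceph tell osd.* injectargs '--osd_snap_trim_priority 40'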
Fortunately I had spent some time (when building the cluster) thinking
about how each option should be set along the I/O path, for #1 data
consistency and #2 best possible performance. That setup was:
- single SATA disks as RAID0 virtual disks, with PERC writeback caching
enabled on each virtual disk
- write barriers kept enabled on XFS mounts (I had measured only a
1.5 % performance gap, so disabling barriers was not a good choice, and
it never really is)
- SATA data disks: write buffer disabled (volatile cache)
- SSD journal disks: write buffer enabled (persistent cache); see the
hdparm sketch just below
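For reference, those write buffer settings were applied with plain
hdparm, something along these lines (just a sketch, device names are
examples):

    hdparm -W /dev/sdX      # report the current write cache (buffer) state
    hdparm -W 0 /dev/sdX    # SATA data disk: disable the volatile write buffer
    hdparm -W 1 /dev/sdY    # SSD journal disk: keep the write buffer enabled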
We could hardly believe it, but when all nodes came back online, all
OSDs rejoined the cluster and service resumed as it was before. We
didn't face any XFS errors, nor did we see any further scrub or
deep-scrub errors.
My assumption was that the extra power demand from snap trimming may
have led to node power instability, or that we hit a SATA firmware bug
or maybe a kernel bug.
We also had SSDs set up as RAID0 with the PERC writeback cache ON, but
changed that to write-through as we could get more IOPS from them with
our workloads.
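If it helps anyone, switching a virtual disk to write-through is a
one-liner with perccli; the controller and VD ids below are only
illustrative, so adjust them to your setup:

    perccli /c0/v1 set wrcache=wt    # SSD virtual disk to write-through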
Thanks for sharing the information about Dell changing the default disk
buffer policy. What's odd is that all buffers were disabled after the
node rebooted, including on the SSDs!
I am now changing them back to enabled for SSDs only.
As said by others, you'd better keep the disk write buffers disabled
and rebuild the OSDs after setting the disks up as RAID0 with writeback
enabled.
Best,
Frédéric.
On 14/03/2018 at 20:42, Tim Bishop wrote:
I'm using Ceph on Ubuntu 16.04 on Dell R730xd servers. A recent [1]
update to the PERC firmware disabled the disk write cache by default,
which made a noticeable difference to the latency on my disks (spinning
disks, not SSD) - by as much as a factor of 10.
For reference their change list says:
"Changes default value of drive cache for 6 Gbps SATA drive to disabled.
This is to align with the industry for SATA drives. This may result in a
performance degradation especially in non-Raid mode. You must perform an
AC reboot to see existing configurations change."
It's fairly straightforward to re-enable the cache either in the PERC
BIOS, or by using hdparm, and doing so returns the latency back to what
it was before.
Checking the Ceph documentation I can see that older versions [2]
recommended disabling the write cache for older kernels. But given I'm
using a newer kernel, and there's no mention of this in the Luminous
docs, is it safe to assume it's ok to enable the disk write cache now?
If it makes a difference, I'm using a mixture of filestore and bluestore
OSDs - migration is still ongoing.
Thanks,
Tim.
[1] - https://www.dell.com/support/home/uk/en/ukdhs1/Drivers/DriversDetails?driverId=8WK8N
[2] - http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/