Re: Disabling HDD write cache neccessary (hdparm -W0)?

Michal Soltys <soltys@xxxxxxxx> · Tue, 29 Jan 2019 17:42:45 +0100

On 1/29/19 12:14 PM, Werner Fischer wrote:
Hello all,

I'd like to ask whether it is necessary to switch the write cache of HDDs and
SSDs (without power-loss-protection) to off when they are used for mdraid.

As discussed by Nik.Brt. and Song Liu last week, many storage devices
(HDDs/SSDs) "lie" when they indicate that they have written data. The data is
only in the drive's cache, but not on magnetic disc or flash. "The disk's
embedded microcontroller may signal the main computer that a disk write is
complete immediately after receiving the write data, before the data is
actually written to the platter." [1]

When used as a single disc, this can be handled with modern file systems, as
they use write barriers. [2][3]

But what I'm not sure about is how this is handled by mdraid in case of a
sudden power loss. In the past I've recommended disabling the drive's write
cache by using "hdparm -W0". This is also the default behavior of hardware
raid controllers. They switch off the drive cache of HDDs as they use their
internal (battery-backed) cache.

So my question is:
Is it safe to keep the cache of HDDs and SSDs (without power-loss-protection)
switched on when used with mdraid?

My 2 cents:

I did some quick tests with slightly better but still consumer-grade 
hardware (4x old WD Red/Purple drives and 4x new WD SSDs), on a SAS 
controller (non-RAID) and with the HDDs further behind an expander. 
0 issues with that perl script (did 3 tests with each array, simultaneously).

The blog entry is very old and also explicitly mentions that in ancient 
times fsync() didn't request a flush. Today that's of course not the case.

With supposedly so many problematic disks - wouldn't filesystem 
journaling completely fall apart if flushes were not working correctly 
(regardless of whether it's flush or FUA)? The same goes for a flush sent 
from within e.g. a VM, or for anything relying on fsync().
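
To make concrete what "relying on fsync()" means, here is a minimal Python sketch of the usual durable-write pattern. It is only an illustration: durable_write is a name made up for this mail, and the directory fsync assumes Linux. Whether the data really reaches the platter or flash still depends on the drive honouring the flush, which is exactly the question in this thread.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the kernel to push it to stable storage.

    Hypothetical helper for illustration. On a device with a volatile
    write cache, fsync() is what makes the kernel request a cache flush.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Flush file data and metadata for this file.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Assumption: Linux. Also fsync the parent directory so the directory
    # entry of a newly created file is durable as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If fsync() returns successfully and the data is still gone after pulling the plug, then something below the filesystem dropped the flush.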

Another thing to consider is how many of the supposed issues are caused 
by the hw raid controllers and whatever they are trying to do/assume, 
their time and available power constraints (IDK really) - while shifting 
the blame to disks.

In theory battery backup (or equivalent functionality on "enterprise" 
ssds) should let you get away without flush/fua (e.g. turning off 
filesystem barriers). Whether you really can or should is another thing.
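
Related to flush/fua: on reasonably recent Linux kernels the block layer exposes how it currently treats a device's cache in /sys/block/<dev>/queue/write_cache, as either "write back" or "write through". A rough Python sketch for reading it follows; the function names are made up for illustration, and the attribute shows the kernel's view, which is not guaranteed to match what the drive firmware really does.

```python
from pathlib import Path


def write_cache_mode(device: str, sysfs_root: str = "/sys/block") -> str:
    """Return the kernel's cache mode for a block device, e.g. 'sda'.

    Hypothetical helper. Reads <sysfs_root>/<device>/queue/write_cache,
    which contains 'write back' or 'write through'.
    """
    attr = Path(sysfs_root) / device / "queue" / "write_cache"
    return attr.read_text().strip()


def kernel_sends_flushes(mode: str) -> bool:
    """True if the kernel treats the cache as volatile ('write back').

    In 'write through' mode the kernel assumes there is no volatile
    cache to flush.
    """
    return mode == "write back"
```

For example, write_cache_mode("sda") on a typical SATA disk with its cache enabled should give "write back"; "sda" is of course just a placeholder device name.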

Test if in doubt (that diskchecker.pl is a nifty tool).