On 2019-03-30 08:31, Qu Wenruo wrote:
> Hi,
> I'm wondering whether it's possible that certain physical devices don't
> handle flush requests correctly.
> E.g. some vendor implements complex logic in their HDD controller to
> skip certain flush requests (but not all, obviously) to improve performance?
> Has anyone seen such reports?
Some OCZ SSDs had issues that could be explained by this type of
behavior (and the associated data-loss problems are part of why they
don't make SSDs any more).
Other than that, I know of no modern _physical_ hardware that does this
(I've got 5.25 inch full-height SCSI-2 disks that have this issue at
work, and am really glad we have no systems that use them anymore). It
is, however, pretty easy to configure _virtual_ disk drives to behave
like this.
> And if it proves to have happened before, how do we users detect such
> a problem?
There's unfortunately no good way to do so unless you can get the disk
to drop its write cache without writing out its contents. Assuming
you can do that, the trivial test is to write a block, issue a FLUSH,
force-drop the cache, and then read back the block that was written.
There were some old SCSI disks that actually let you do this by issuing
some extended SCSI commands, but I don't know of any ATA disks where
this was ever possible, and most modern SCSI disks won't let you do it
unless you flash custom firmware to allow for it.
Of course, you can always test with throw-away data by manually inducing
power failures, but that's tedious and hard on the hardware.
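For reference, the trivial write/FLUSH/drop/read-back test above can be
sketched roughly like this in Python. Note that `drop_cache` is a
hypothetical hook: there is no generic command to make a disk discard its
write cache without writing it out, so the caller would have to supply
whatever vendor-specific mechanism (if any) the disk exposes.

```python
import os


def flush_survives(path, drop_cache, block=b"\xa5" * 4096, offset=0):
    """Write a block, flush it, drop the device's write cache via the
    caller-supplied hook, then read the block back.

    drop_cache: a callable implementing some vendor-specific way to
    discard the disk's write cache without flushing it (hypothetical;
    no such generic command exists).

    Returns True if the data survived the cache drop, i.e. the FLUSH
    actually reached stable media.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, block, offset)
        os.fsync(fd)   # causes the kernel to issue a cache-flush command
        drop_cache()   # vendor-specific; placeholder hook
        return os.pread(fd, len(block), offset) == block
    finally:
        os.close(fd)
```

To run this for real you'd point `path` at the raw device (as root) and
wire `drop_cache` to whatever extended command the drive supports; with a
no-op hook it only exercises the write/flush/read-back plumbing.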
> Can we just compare the flush time against the time of the writes that
> precede the flush call?
> E.g. write X random blocks to the device, call fsync() on it, and check
> the execution time. Repeat Y times, and compare the avg/stddev.
> Then change X to 2X/4X/..., and repeat the above check.
> Thanks,
> Qu
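The fsync-timing idea above could be sketched like this (a heuristic
only: if doubling X barely moves the timings, the device may be
acknowledging flushes without draining its cache, but a fast
NVRAM-backed cache would look similar, so this can't prove anything on
its own):

```python
import os
import statistics
import time


def fsync_timing(path, x_blocks, repeats=10, blksz=4096):
    """Time X block writes followed by one fsync(), 'repeats' times.

    Returns (mean, stdev) of the per-iteration wall-clock time.  The
    idea is to rerun this with X, 2X, 4X, ... blocks and see whether
    the fsync-inclusive time scales with the amount of dirty data.
    """
    times = []
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(repeats):
            t0 = time.monotonic()
            for i in range(x_blocks):
                os.pwrite(fd, os.urandom(blksz), i * blksz)
            os.fsync(fd)  # write-back plus a cache-flush to the device
            times.append(time.monotonic() - t0)
    finally:
        os.close(fd)
    return statistics.mean(times), statistics.stdev(times)
```

Run against the raw device rather than a file on a filesystem, since
filesystem journaling would otherwise dominate the fsync() cost.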