On 2019-03-30 08:31, Qu Wenruo wrote:
> Hi,
> I'm wondering whether it's possible that certain physical devices don't
> handle flush requests correctly.
> E.g. some vendor implements complex logic in their HDD controller to
> skip certain flush requests (but not all, obviously) to improve performance?
> Has anyone seen such reports?
Some OCZ SSDs had issues that could be explained by this type of
behavior (and the associated data-loss problems are part of why they
don't make SSDs any more).
Other than that, I know of no modern _physical_ hardware that does this
(I've got 5.25 inch full-height SCSI-2 disks that have this issue at
work, and am really glad we have no systems that use them anymore). It
is, however, pretty easy to configure _virtual_ disk drives to behave
like this.
> And if it proves to have happened before, how do we users detect such
> a problem?
There's unfortunately no good way to do so unless you can get the disk
to drop its write cache without writing out its contents. Assuming
you can do that, the trivial test is to write a block, issue a FLUSH,
force-drop the cache, and then read back the block that was written.
There were some old SCSI disks that actually let you do this by issuing
some extended SCSI commands, but I don't know of any ATA disks where
this was ever possible, and most modern SCSI disks won't let you do it
unless you flash custom firmware to allow for it.
Of course, you can always test with throw-away data by manually inducing
power failures, but that's tedious and hard on the hardware.
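For reference, the trivial write/FLUSH/drop/read-back test above can be
sketched roughly like this in Python. Note that `drop_cache` is a
hypothetical hook: there is no generic command to make a disk discard its
write cache without writing it out, so the caller would have to supply
whatever vendor-specific mechanism (if any) the disk exposes.

```python
import os


def flush_survives(path, drop_cache, block=b"\xa5" * 4096, offset=0):
    """Write a block, flush it, drop the device's write cache via the
    caller-supplied hook, then read the block back.

    drop_cache: a callable implementing some vendor-specific way to
    discard the disk's write cache without flushing it (hypothetical;
    no such generic command exists).

    Returns True if the data survived the cache drop, i.e. the FLUSH
    actually reached stable media.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, block, offset)
        os.fsync(fd)   # causes the kernel to issue a cache-flush command
        drop_cache()   # vendor-specific; placeholder hook
        return os.pread(fd, len(block), offset) == block
    finally:
        os.close(fd)
```

To run this for real you'd point `path` at the raw device (as root) and
wire `drop_cache` to whatever extended command the drive supports; with a
no-op hook it only exercises the write/flush/read-back plumbing.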
> Can we just compare the flush time against the time of the writes that
> precede the flush call?
> E.g. write X random blocks to the device, call fsync() on it, and check
> the execution time. Repeat Y times, and compare the avg/stddev.
> Then change X to 2X/4X/..., and repeat the above check.
> Thanks,
> Qu
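The fsync-timing idea above could be sketched like this (a heuristic
only: if doubling X barely moves the timings, the device may be
acknowledging flushes without draining its cache, but a fast
NVRAM-backed cache would look similar, so this can't prove anything on
its own):

```python
import os
import statistics
import time


def fsync_timing(path, x_blocks, repeats=10, blksz=4096):
    """Time X block writes followed by one fsync(), 'repeats' times.

    Returns (mean, stdev) of the per-iteration wall-clock time.  The
    idea is to rerun this with X, 2X, 4X, ... blocks and see whether
    the fsync-inclusive time scales with the amount of dirty data.
    """
    times = []
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(repeats):
            t0 = time.monotonic()
            for i in range(x_blocks):
                os.pwrite(fd, os.urandom(blksz), i * blksz)
            os.fsync(fd)  # write-back plus a cache-flush to the device
            times.append(time.monotonic() - t0)
    finally:
        os.close(fd)
    return statistics.mean(times), statistics.stdev(times)
```

Run against the raw device rather than a file on a filesystem, since
filesystem journaling would otherwise dominate the fsync() cost.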