Re: [PATCH 13/14] block: Allow REQ_FUA|REQ_READ

"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> · Mon, 17 Mar 2025 13:57:53 -0400

Kent,

>> At least for SCSI, given how FUA is usually implemented, I consider
>> it quite unlikely that two read operations back to back would somehow
>> cause different data to be transferred. Regardless of which flags you
>> use.
>
> Based on what, exactly?

Based on the fact that many devices will either blindly flush on FUA or
they'll do the equivalent of a media verify operation. In neither case
will you get different data returned. The emphasis for FUA is on media
durability, not caching.

In most implementations, the cache isn't an optional memory buffer thingy
that can be sidestepped. It is the only access mechanism that exists
between the media and the host interface. Working memory, if you will. So
"bypassing the device cache" is not really a good way to think about FUA.

The purpose of FUA is to ensure durability for future reads; it is a
media management flag. As such, any effect FUA may have on the device
cache is incidental.

For SCSI there is a different flag to specify caching behavior. That
flag is orthogonal to FUA and did not get carried over to NVMe.
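
To make the "orthogonal flags" point concrete, here is a minimal sketch of
how the two bits sit side by side in byte 1 of a READ(10) CDB. It assumes
the caching flag meant above is DPO (Disable Page Out), which the text does
not name explicitly. build_read10() is a hypothetical helper for
illustration, not a kernel function.

```c
#include <stdint.h>
#include <string.h>

#define READ10_OPCODE 0x28
#define READ10_DPO    0x10  /* byte 1, bit 4: Disable Page Out (assumed to be the caching flag) */
#define READ10_FUA    0x08  /* byte 1, bit 3: Force Unit Access */

/* Hypothetical helper: fill in a 10-byte READ(10) CDB. */
static void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks,
                         int fua, int dpo)
{
    memset(cdb, 0, 10);
    cdb[0] = READ10_OPCODE;

    /* The two bits are set independently; neither implies the other. */
    if (dpo)
        cdb[1] |= READ10_DPO;
    if (fua)
        cdb[1] |= READ10_FUA;

    /* Logical block address, big-endian, bytes 2..5. */
    cdb[2] = (uint8_t)(lba >> 24);
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;

    /* Transfer length in blocks, big-endian, bytes 7..8. */
    cdb[7] = (uint8_t)(nblocks >> 8);
    cdb[8] = (uint8_t)nblocks;
}
```

Setting FUA without DPO gives 0x08 in byte 1 and setting both gives 0x18.
What a device does with either bit is still up to the implementation.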

> We _know_ devices are not perfect, and your claim that "it's quite
> unlikely that two reads back to back would return different data"
> amounts to claiming that there are no bugs in a good chunk of the IO
> path and all that is implemented perfectly.

I'm not saying that devices are perfect or that the standards make
sense. I'm just saying that your desired behavior does not match the
reality of how a large number of these devices are actually implemented.

The specs are largely written by device vendors and therefore
deliberately ambiguous. Many of the explicit cache management bits and
bobs have been removed from SCSI or are defined as hints because device
vendors don't want the OS to interfere with how they manage resources,
including caching. I get what your objective is. I just don't think FUA
offers sufficient guarantees in that department.

Also, given the amount of hardware checking done at the device level, my
experience tells me that you are far more likely to have undetected
corruption problems on the host side than inside the storage device. In
general, storage devices implement very extensive checking on both
control and data paths. And they will return an error if there is a
mismatch (as opposed to returning random data).

-- 
Martin K. Petersen	Oracle Linux Engineering