Re: [PATCH 13/14] block: Allow REQ_FUA|REQ_READ

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Mon, 17 Mar 2025 11:43:46 -0400

On Mon, Mar 17, 2025 at 11:30:13AM -0400, Martin K. Petersen wrote:
> 
> Keith,
> 
> > In my experience, a Read FUA is used to ensure what you're reading has
> > been persistently committed.
> 
> Exactly, that is the intended semantic.
> 
> Kent is trying to address a scenario in which data in the cache is
> imperfect and the data on media is reliable. SCSI did not consider such
> a scenario because the entire access model would be fundamentally broken
> if cache contents couldn't be trusted.
> 
> FUA was defined to handle the exact opposite scenario: Data in cache is
> reliable by definition and the media inherently unreliable.
> Consequently, what all the standards describe is a flush-then-read
> semantic, not an invalidate-cache-then-read ditto. The purpose of FUA is
> to validate endurance of future reads.
> 
> SCSI has a different per-command flag for cache management. However, it
> is only a hint and therefore does not offer the guarantee Kent is
> looking for.
> 
> At least for SCSI, given how FUA is usually implemented, I consider it
> quite unlikely that two read operations back to back would somehow cause
> different data to be transferred. Regardless of which flags you use.

Based on what, exactly?

We're already seeing bit corruption caught - and corrected with scrub,
when users aren't running with replicas=1 (which is fine for data that's
backed up elsewhere, of course).

Then it's just a question of where in the lower layers it comes from.

We _know_ devices are not perfect, and your claim that "it's quite
unlikely that two reads back to back would return different data"
amounts to claiming that there are no bugs in a good chunk of the IO
path and all that is implemented perfectly.

But that's just bullshit.

When we do a retry after observing bit corruption, we need to retry the
full read, not just from device cache, and the spec is quite specific on
that point.

Now, if you're claiming that a FUA read will behave differently, please
do state clearly what you think will happen with clean data in the
cache. But that'll amount to a clear spec violation...

> Also, and obviously this is also highly implementation-dependent, but I
> don't think one should underestimate the amount of checking done along
> the entire data path inside the device, including DRAM to the transport
> interface ASIC. Data is usually validated by the ASIC on the way out and
> from there on all modern transports have some form of checking. Having
> corrupted data placed in host memory without an associated error
> condition is not a common scenario. Bit flips in host memory are much
> more common.

Yeah, I'm not in the business of trusting the lower layers like this.
Trust, but verify :)