Re: [PATCH 13/14] block: Allow REQ_FUA|REQ_READ

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Mon, 17 Mar 2025 23:11:10 -0700

On Mon, Mar 17, 2025 at 02:21:29PM -0400, Kent Overstreet wrote:
> Beyond making sure that retries go to the physical media, there's "retry
> level" in the NVME spec which needs to be plumbed, and that one will be
> particularly useful in multi device scenarios. (Crank retry level up
> or down based on whether we can retry from different devices).

The read recovery level is to reduce the amount or intensity of read
retries, not to increase it so that workloads that have multiple sources
for data aren't stalled by the sometimes extremely long read recovery.
You won't really find much hardware that actually implements it.

As a little background: The read recovery level was added as part of the
predictive latency mode and originally tied to the (now heavily deprecated
and never implemented in scale) NVM sets.  Yours truly successfully
argued that they should not be tied to NVM sets and helped to make them
more generic, but the there was basically no uptake of the read recovery
level, with or without NVM sets.

> But we've got to start somewhere, and given that the spec says "bypass
> the cache"

But as people have been telling you multiple times that is not what any
of the specs says.

> - that looks like the place to start. If devices don't
> support the behaviour we want today, then nudging the drive
> manufacturers to support it is infinitely saner than getting a whole
> nother bit plumbed through the NVME standard, especially given that the
> letter of the spec does describe exactly what we want.

That assumes that anyone but you actually agrees with your definition
of sane behavior.  I don't think you've been overly successful on
selling anyone on that idea so far.  Selling the hardware people on it
will be even harder given that the "cache" really usually is not a
cache but an integral part of the data path.