Re: [PATCH 13/14] block: Allow REQ_FUA|REQ_READ

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Tue, 18 Mar 2025 17:33:37 -0400

On Mon, Mar 17, 2025 at 11:11:10PM -0700, Christoph Hellwig wrote:
> On Mon, Mar 17, 2025 at 02:21:29PM -0400, Kent Overstreet wrote:
> > Beyond making sure that retries go to the physical media, there's "retry
> > level" in the NVME spec which needs to be plumbed, and that one will be
> > particularly useful in multi device scenarios. (Crank retry level up
> > or down based on whether we can retry from different devices).
> 
> The read recovery level is to reduce the amount or intensity of read
> retries, not to increase it so that workloads that have multiple sources
> for data aren't stalled by the sometimes extremely long read recovery.
> You won't really find much hardware that actually implements it.
> 
> As a little background: The read recovery level was added as part of the
> predictive latency mode and originally tied to the (now heavily deprecated
> and never implemented in scale) NVM sets.  Yours truly successfully
> argued that they should not be tied to NVM sets and helped to make them
> more generic, but the there was basically no uptake of the read recovery
> level, with or without NVM sets.

Well, if it can't be set per IO, that makes it fairly useless.

If it _was_ per IO it'd be dead easy to slot into bcachefs, the tracking
of IO/checksum errors in a replicated/erasure coded extent is
sophisticated enough to easily accommodate things like this (mainly you
need to know when submitting - do we have additional retries? and then
when you get an error, you don't want count it if it was "fast fail").