On Mon, Mar 17, 2025 at 11:11:10PM -0700, Christoph Hellwig wrote:
> On Mon, Mar 17, 2025 at 02:21:29PM -0400, Kent Overstreet wrote:
> > Beyond making sure that retries go to the physical media, there's "retry
> > level" in the NVME spec which needs to be plumbed, and that one will be
> > particularly useful in multi device scenarios. (Crank retry level up
> > or down based on whether we can retry from different devices).
>
> The read recovery level is to reduce the amount or intensity of read
> retries, not to increase it, so that workloads that have multiple sources
> for data aren't stalled by the sometimes extremely long read recovery.
> You won't really find much hardware that actually implements it.
>
> As a little background: The read recovery level was added as part of the
> predictive latency mode and originally tied to the (now heavily deprecated
> and never implemented at scale) NVM sets. Yours truly successfully
> argued that they should not be tied to NVM sets and helped to make them
> more generic, but there was basically no uptake of the read recovery
> level, with or without NVM sets.

Well, if it can't be set per IO, that makes it fairly useless.

If it _was_ per IO it'd be dead easy to slot into bcachefs: the tracking
of IO/checksum errors in a replicated/erasure coded extent is
sophisticated enough to easily accommodate things like this. Mainly you
need to know, when submitting, whether we have additional retries
available, and then when you get an error, you don't want to count it if
it was "fast fail".
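
As a rough illustration of that last point, here is a standalone C sketch of
the submit/accounting logic. Everything in it is hypothetical: struct replica,
read_extent(), fake_read() and the per-IO fast_fail flag are made up for the
example, they are not bcachefs or NVMe interfaces, and as noted above the read
recovery level can't actually be set per IO today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-replica bookkeeping; not the real bcachefs structures. */
struct replica {
	unsigned	dev;		/* index of the device holding this copy */
	unsigned	io_errors;	/* errors charged against this copy */
	bool		full_tried;	/* already read with full recovery? */
};

enum read_result { READ_OK, READ_ERR };

/* Simulated devices, only so the sketch can be exercised. */
enum dev_health { DEV_GOOD, DEV_SLOW, DEV_DEAD };
static enum dev_health dev_state[4];

/* A DEV_SLOW device only returns data when given full read recovery. */
static enum read_result fake_read(unsigned dev, bool fast_fail)
{
	switch (dev_state[dev]) {
	case DEV_GOOD:
		return READ_OK;
	case DEV_SLOW:
		return fast_fail ? READ_ERR : READ_OK;
	default:
		return READ_ERR;
	}
}

typedef enum read_result (*read_fn)(unsigned dev, bool fast_fail);

/*
 * Pass 0: ask for fast fail whenever another replica is still left to try.
 * Pass 1: go back to replicas that only got a fast-fail attempt and read
 * them with full recovery.
 *
 * Returns the index of the replica that satisfied the read, or -1.
 */
static int read_extent(struct replica *r, unsigned nr, read_fn do_read)
{
	assert(nr > 0);

	for (unsigned pass = 0; pass < 2; pass++) {
		for (unsigned i = 0; i < nr; i++) {
			if (r[i].full_tried)
				continue;

			/* At submit time: do we have additional retries? */
			bool fast_fail = pass == 0 && i + 1 < nr;
			enum read_result res = do_read(r[i].dev, fast_fail);

			if (!fast_fail)
				r[i].full_tried = true;
			if (res == READ_OK)
				return (int) i;

			/* A fast-fail error was a shortened attempt: don't count it. */
			if (!fast_fail)
				r[i].io_errors++;
		}
	}
	return -1;
}
```

With two replicas where the first needs full recovery and the second is
healthy, the read is served from the second copy and no error is charged to
the first. The second pass is an assumption of this sketch, there so that a
copy which only failed a fast-fail attempt still gets a full-recovery read
before the extent is given up on.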