Re: [PATCH 13/14] block: Allow REQ_FUA|REQ_READ

On Mon, Mar 17, 2025 at 02:21:29PM -0400, Kent Overstreet wrote:
> On Mon, Mar 17, 2025 at 01:57:53PM -0400, Martin K. Petersen wrote:
> > I'm not saying that devices are perfect or that the standards make
> > sense. I'm just saying that your desired behavior does not match the
> > reality of how a large number of these devices are actually implemented.
> > 
> > The specs are largely written by device vendors and therefore
> > deliberately ambiguous. Many of the explicit cache management bits and
> > bobs have been removed from SCSI or are defined as hints because device
> > vendors don't want the OS to interfere with how they manage resources,
> > including caching. I get what your objective is. I just don't think FUA
> > offers sufficient guarantees in that department.
> 
> If you're saying this is going to be a work in progress to get the
> behaviour we need in this scenario - yes, absolutely.
> 
> Beyond making sure that retries go to the physical media, there's "retry
> level" in the NVME spec which needs to be plumbed, and that one will be
> particularly useful in multi device scenarios. (Crank retry level up
> or down based on whether we can retry from different devices).

I saw you mention the RRL mechanism in another patch, and it really
piqued my interest. How are you intending to use this? In NVMe, this is
controlled via an admin "Set Features" command, which is absolutely not
available to a block device, much less a file system. That command queue
is only accessible to the driver and to user space admin, and is
definitely not a per-io feature.
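
To make the scoping concrete, here is a minimal sketch of what such an
admin command carries. The struct and the build_set_rrl() helper are
hypothetical and exist only for illustration; they are not the kernel's
or nvme-cli's definitions. The dword placement (feature ID 12h in CDW10,
NVM Set Identifier in CDW11, recovery level in CDW12) is an assumption
based on a reading of the NVMe base spec and should be checked against
the revision you care about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of an NVMe admin command's dwords.
 * Not the kernel's or any library's real structure. */
struct admin_cmd_sketch {
    uint8_t  opcode;
    uint32_t nsid;
    uint32_t cdw10;
    uint32_t cdw11;
    uint32_t cdw12;
};

#define SKETCH_ADMIN_SET_FEATURES 0x09u /* admin opcode: Set Features */
#define SKETCH_FEAT_RRL           0x12u /* Read Recovery Level Config (assumed layout) */

/* Build a Set Features command that selects a read recovery level.
 * The target is an NVM set, not an individual I/O: nothing in the
 * command names an LBA range or a particular read. */
static struct admin_cmd_sketch build_set_rrl(uint16_t nvm_set_id, uint8_t level)
{
    struct admin_cmd_sketch c;

    assert(level <= 0xfu);          /* level is treated as a 4-bit field here */
    memset(&c, 0, sizeof(c));
    c.opcode = SKETCH_ADMIN_SET_FEATURES;
    c.cdw10  = SKETCH_FEAT_RRL;     /* feature identifier */
    c.cdw11  = nvm_set_id;          /* which NVM set the level applies to */
    c.cdw12  = level;               /* requested read recovery level */
    return c;
}
```

Note that the only selector in the sketch is the NVM set, and the command
would go down the admin queue, so a file system has no way to attach it
to one particular retry.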
 
> But we've got to start somewhere, and given that the spec says "bypass
> the cache" - that looks like the place to start. 

This is a bit dangerous to assume. I can't find text anywhere in any NVMe
specification (I also checked T10 SBC) saying anything similar to
"bypass" in relation to the cache for FUA reads. I am reasonably
confident some vendors, especially ones developing active-active
controllers, will fight you on the spec committee until they win if you
want to take this up in those forums.

> If devices don't support the behaviour we want today, then nudging the
> drive manufacturers to support it is infinitely saner than getting a
> whole nother bit plumbed through the NVME standard, especially given
> that the letter of the spec does describe exactly what we want.

In my experience, the storage standards committees are more aligned to
accommodate appliance vendors than anything Linux specific. Your desired
read behavior would almost certainly be a new TPAR in NVMe to get spec
defined behavior. It's not impossible, but I'll just say it is an uphill
battle and the end result may or may not look like what you have in
mind.

In summary, what the specs give us for READ FUA:

 Flush and Read

What (I think) you want:

 Invalidate and Read

It sounds like you are trying to say that your scenario doesn't care
about the "Flush" so you get to use the existing semantics as the
"Invalidate" case, and I really don't think you get that guarantee from
any spec.
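
The difference between the two can be sketched with a toy model of a
single logical block. This is purely illustrative and describes no real
device or spec text: the struct, both function names, and the choice to
let "Flush and Read" answer from a still-valid cached copy are all
assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one logical block. Not any real device's behaviour. */
struct toy_block {
    uint32_t media;  /* contents on the non-volatile medium */
    uint32_t cache;  /* contents held in the device cache */
    bool cached;     /* cache holds a copy of this block */
    bool dirty;      /* cached copy is newer than the medium */
};

/* "Flush and Read": write back a dirty copy, then return data.
 * ASSUMPTION: nothing forces eviction, so the model still answers
 * from the cached copy when one exists. */
static uint32_t flush_and_read(struct toy_block *b)
{
    if (b->cached && b->dirty) {
        b->media = b->cache;   /* commit newer data to the medium */
        b->dirty = false;
    }
    return b->cached ? b->cache : b->media;
}

/* "Invalidate and Read": drop any cached copy, then read the medium.
 * ASSUMPTION: the caller has no dirty data pending for this block,
 * otherwise dropping the cached copy would lose it. */
static uint32_t invalidate_and_read(struct toy_block *b)
{
    b->cached = false;
    b->dirty = false;
    return b->media;
}
```

With a clean cached copy that differs from the medium, flush_and_read()
in this model returns the cached contents and never touches the medium,
while invalidate_and_read() returns what is on the medium. A retry that
wants to rule out a bad cached copy only gets that from the second one.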



