On Mon, Mar 17, 2025 at 02:21:29PM -0400, Kent Overstreet wrote:
> On Mon, Mar 17, 2025 at 01:57:53PM -0400, Martin K. Petersen wrote:
> > I'm not saying that devices are perfect or that the standards make
> > sense. I'm just saying that your desired behavior does not match the
> > reality of how a large number of these devices are actually implemented.
> >
> > The specs are largely written by device vendors and therefore
> > deliberately ambiguous. Many of the explicit cache management bits and
> > bobs have been removed from SCSI or are defined as hints because device
> > vendors don't want the OS to interfere with how they manage resources,
> > including caching. I get what your objective is. I just don't think FUA
> > offers sufficient guarantees in that department.
>
> If you're saying this is going to be a work in progress to get the
> behaviour we need in this scenario - yes, absolutely.
>
> Beyond making sure that retries go to the physical media, there's "retry
> level" in the NVME spec which needs to be plumbed, and that one will be
> particularly useful in multi device scenarios. (Crank retry level up
> or down based on whether we can retry from different devices).

I saw you mention the RRL mechanism in another patch, and it really
piqued my interest. How are you intending to use this? In NVMe, this is
controlled via an admin "Set Features" command, which is absolutely not
available to a block device, much less a file system. That command queue
is only accessible to the driver and to user space admin, and is
definitely not a per-io feature.

> But we've got to start somewhere, and given that the spec says "bypass
> the cache" - that looks like the place to start.

This is a bit dangerous to assume. I don't find anywhere in any nvme
specifications (also checked T10 SBC) with text saying anything similar
to "bypass" in relation to the cache for FUA reads.
I am reasonably confident some vendors, especially ones developing
active-active controllers, will fight you until they win on the spec
committee for this if you want to take it up in those forums.

> If devices don't support the behaviour we want today, then nudging the
> drive manufacturers to support it is infinitely saner than getting a
> whole nother bit plumbed through the NVME standard, especially given
> that the letter of the spec does describe exactly what we want.

In my experience, the storage standards committees are more aligned to
accommodate appliance vendors than anything Linux specific. Your desired
read behavior would almost certainly be a new TPAR in NVMe to get spec
defined behavior. It's not impossible, but I'll just say it is an uphill
battle and the end result may or may not look like what you have in
mind.

In summary, what we have by the specs from READ FUA:

  Flush and Read

What (I think) you want:

  Invalidate and Read

It sounds like you are trying to say that your scenario doesn't care
about the "Flush" so you get to use the existing semantics as the
"Invalidate" case, and I really don't think you get that guarantee from
any spec.
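
For reference, the per-io versus admin distinction raised above for FUA
and RRL can be sketched roughly as below. This is only an illustration,
not kernel code: the struct and helper names (sketch_sqe,
sketch_read_fua, sketch_set_rrl) are hypothetical, and the opcodes and
field positions are taken from my reading of the NVMe base and NVM
command set specs, so check them against the spec revision you care
about. FUA is a bit in each Read command on an I/O queue, while RRL is a
Set Features payload on the admin queue that applies to an NVM Set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values below are assumptions based on the NVMe specifications. */
#define SKETCH_CMD_READ            0x02        /* I/O queue opcode */
#define SKETCH_ADMIN_SET_FEATURES  0x09        /* admin queue opcode */
#define SKETCH_FEAT_RRL            0x12        /* Read Recovery Level Config */
#define SKETCH_RW_FUA              (1u << 30)  /* Read CDW12, bit 30 */

/* Simplified 64-byte submission queue entry (hypothetical layout name). */
struct sketch_sqe {
	uint8_t  opcode;
	uint8_t  flags;
	uint16_t cid;
	uint32_t nsid;
	uint32_t rsvd[8];	/* cdw2-3, metadata and data pointers */
	uint32_t cdw10;
	uint32_t cdw11;
	uint32_t cdw12;
	uint32_t cdw13;
	uint32_t cdw14;
	uint32_t cdw15;
};

/* Per-io: FUA travels with each individual Read on an I/O queue. */
static inline struct sketch_sqe
sketch_read_fua(uint32_t nsid, uint64_t slba, uint16_t nblocks)
{
	struct sketch_sqe c;

	assert(nblocks >= 1);
	memset(&c, 0, sizeof(c));
	c.opcode = SKETCH_CMD_READ;
	c.nsid = nsid;
	c.cdw10 = (uint32_t)slba;		/* starting LBA, low 32 bits */
	c.cdw11 = (uint32_t)(slba >> 32);	/* starting LBA, high 32 bits */
	c.cdw12 = (uint32_t)(nblocks - 1) | SKETCH_RW_FUA; /* 0-based count */
	return c;
}

/* Not per-io: RRL is set for an NVM Set through the admin queue. */
static inline struct sketch_sqe
sketch_set_rrl(uint16_t nvmsetid, uint8_t rrl)
{
	struct sketch_sqe c;

	assert(rrl <= 0xF);			/* RRL is a 4-bit field */
	memset(&c, 0, sizeof(c));
	c.opcode = SKETCH_ADMIN_SET_FEATURES;
	c.cdw10 = SKETCH_FEAT_RRL;		/* feature identifier */
	c.cdw11 = rrl;				/* requested recovery level */
	c.cdw12 = nvmsetid;			/* NVM Set the level applies to */
	return c;
}
```

Nothing in the Read command carries a recovery level, so changing RRL
around a single retry would mean a separate admin command that also
affects every other reader of that NVM Set.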