On Tue, 31 May 2022, Keith Busch wrote:
> On Tue, May 31, 2022 at 04:04:12PM -0700, Eric Wheeler wrote:
> >
> >   * Write-through: write is done synchronously both to the cache and to
> >     the backing store.
> >
> >   * Write-back (also called write-behind): initially, writing is done only
> >     to the cache. The write to the backing store is postponed until the
> >     modified content is about to be replaced by another cache block.
> >
> > [ https://en.wikipedia.org/wiki/Cache_(computing)#Writing_policies ]
> >
> >
> > So the kernel's notion of "write through" meaning "Drop FLUSH/FUA" sounds
> > like the industry meaning of "write-back" as defined above; conversely,
> > the kernel's notion of "write back" sounds like the industry definition of
> > "write-through"
> >
> > Is there a well-meaning rationale for the kernel's concept of "write
> > through" to be different than what end users have been conditioned to
> > understand?
>
> I think we all agree what "write through" vs "write back" mean. I'm just not
> sure what's the source of the disconnect with the kernel's behavior.
>
> A "write through" device persists data before completing a write operation.
>
> Flush/FUA says to write data to persistence before completing the operation.
>
> You don't need both. Flush/FUA should be a no-op to a "write through" device
> because the data is synchronously committed to the backing store automatically.

OK, I think I'm starting to understand the rationale. Thank you for your
patience while I wrap my head around it.

So, using a RAID controller cache as an example:

1. A RAID controller with a _non-volatile_ "writeback" cache (from the
   controller's perspective, i.e., _with_ battery) is a "write through"
   device as far as the kernel is concerned, because the controller will
   return the write as complete as soon as it is in the persistent cache.

2. A RAID controller with a _volatile_ "writeback" cache (from the
   controller's perspective, i.e., _without_ battery) is a "write back"
   device as far as the kernel is concerned, because the controller will
   return the write as complete as soon as it is in the cache, but the
   cache is not persistent! So in that case flush/FUA is necessary.

I think it is rare that someone would configure a RAID controller as
writeback (in the controller) when the cache is volatile (i.e., without
battery), but it is an interesting way to dissect this to understand the
rationale behind the value choices for the `queue/write_cache` flag in
sysfs.

So please correct me here if I'm wrong: theoretically, a RAID controller
with a volatile writeback cache is "safe" in terms of any flush/FUA
behavior, assuming the controller respects those ops in writeback mode.
For example, ext4's journal is probably consistent after a crash, even if
2GB of cached data might be lost (assuming FUA and not FLUSH is being used
for metadata; I don't actually know ext4's implementation there).

I would guess that most end users are going to expect queue/write_cache
to match their RAID controller's naming convention. If they see "write
through" when they know their controller is in writeback w/battery, then
they might reasonably expect the flag to show "write back", too. If they
then force it to "write back", they lose the performance benefit.

Given that, and considering that end users who configure RAID controllers
do not commonly understand the flush/FUA intricacies and what really
constitutes "write back" vs "write through" from the kernel's perspective,
perhaps it would be a good idea to add more documentation around
write_cache here:

  https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt

What do you think?

--
Eric Wheeler
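
To make cases 1 and 2 above concrete, here is a minimal sketch in Python
(not kernel code). The function names and the two boolean parameters are
hypothetical, chosen only for illustration. The sysfs attribute
/sys/block/<dev>/queue/write_cache and its two values, "write back" and
"write through", are the only parts taken from the interface discussed in
this thread. The mapping assumes the controller honors flush/FUA whenever
its cache is volatile.

```python
from pathlib import Path


def expected_write_cache(cache_enabled: bool, cache_persistent: bool) -> str:
    """Hypothetical helper: the queue/write_cache value one would expect.

    The kernel only cares whether a completed write can still be lost on
    power failure, not what the controller calls its own cache mode.
    """
    if cache_enabled and not cache_persistent:
        # Case 2: volatile cache, so completed writes may be lost and
        # flush/FUA are required for durability.
        return "write back"
    # Case 1 (battery-backed cache) or no cache at all: a completed write
    # is already persistent, so flush/FUA can be treated as no-ops.
    return "write through"


def read_write_cache(dev: str, sysfs_root: str = "/sys/block") -> str:
    """Read the current value of <sysfs_root>/<dev>/queue/write_cache."""
    path = Path(sysfs_root) / dev / "queue" / "write_cache"
    return path.read_text().strip()
```

For example, expected_write_cache(True, True) describes a controller in
"writeback w/battery" mode, and the sketch returns "write through" for it,
which is the naming mismatch end users are likely to trip over.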