Re: [RFC] Add sysctl option to drop disk flushes in bcache? (was: Bcache in writes direct with fsync)

Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> · Tue, 31 May 2022 12:42:49 -0700 (PDT)

On Sat, 28 May 2022, Keith Busch wrote:
> On Sat, May 28, 2022 at 12:57:26PM +0000, Adriano Silva wrote:
> > Dear Christoph,
> > 
> > > Once you do that, the block layer ignores all flushes and FUA bits, so
> > > yes it is going to be a lot faster.  But also completely unsafe because
> > > it does not provide any data durability guarantees.
> > 
> > Sorry, but wouldn't it be the other way around? Or did I really not 
> > understand your answer?
> > 
> > Sorry, I don't know anything about kernel code, but wouldn't it be the 
> > other way around?
> > 
> > It's just that, I may not be understanding. And it's likely that I'm 
> > not, because you understand more about this, I'm new to this subject. 
> > I know very little about it, or almost nothing.
> > 
> > But it's just that I've read the opposite about it.
> > 
> >  Isn't "write through" to provide more secure writes?
> > 
> > I also see that "write back" would be meant to be faster. No?
> 
> The sysfs "write_cache" attribute just controls what the kernel does. It
> doesn't change any hardware settings.
> 
> In "write back" mode, a sync write will have FUA set, which will generally be
> slower than a write without FUA. In "write through" mode, the kernel doesn't
> set FUA so the data may not be durable after the completion if the controller
> is using a volatile write cache.

Something seems wrong here: typically, in a RAID controller's LUN 
configuration, "writeback" means that the non-volatile cache is active, so 
"write back caching" is enabled.

According to https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt:

	"When read, this file will display whether the device has write
	back caching enabled or not. It will return "write back" for the former
	case, and "write through" for the latter."

If my mail client could underline, I would underline this part of the 
documentation: "whether the device has write back caching enabled or not"

Is there a good explanation for why the kernel setting is exactly 
_opposite_ of the controller setting?
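
For anyone who wants to check what their own device reports, here is a 
minimal Python sketch that reads the attribute and interprets it the way 
Christoph and Keith describe it in this thread. The function names are 
made up for illustration, and the /sys/block/<dev>/queue/write_cache path 
is the one the documentation above refers to.

```python
from pathlib import Path


def read_write_cache(dev: str, sysfs_root: str = "/sys/block") -> str:
    """Return the raw queue/write_cache string for a block device.

    `dev` is a kernel device name such as "nvme0n1" (only an example).
    """
    path = Path(sysfs_root) / dev / "queue" / "write_cache"
    return path.read_text()


def kernel_issues_flushes(mode_text: str) -> bool:
    """Interpret the attribute as described in this thread.

    "write back"    -> the kernel assumes a volatile cache and sends
                       flushes and FUA to the device.
    "write through" -> the kernel sends neither.
    """
    mode = mode_text.strip()
    if mode == "write back":
        return True
    if mode == "write through":
        return False
    raise ValueError(f"unexpected write_cache value: {mode!r}")
```

Something like kernel_issues_flushes(read_write_cache("nvme0n1")) would 
then say whether sync writes to that device carry flush/FUA.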

> > But I understand that when I do a write with direct ioping (-D) and 
> > with forced sync (-Y), then an enterprise NVME device with PLP (Power 
> > Loss Protection) like mine here should perform very well because in 
> > theory, the messages are sent to the hardware by the OS with an 
> > instruction for the Hardware to ignore the cache (correct?), but the 
> > NVME device will still put it in its local cache and give an immediate 
> > response to the OS saying that the data has been written, because he 
> > knows his local cache is a safe place for this (in theory).
> 
> If the device's power-loss protected memory is considered non-volatile, then it
> shouldn't be reporting a volatile write cache, and it may complete commands
> once the write data reaches its non-volatile cache. It can treat flush and FUA
> as no-ops.
>  
> > On the other hand, answering why writing is slow when "write back" is 
> > activated is intriguing. Could it be the software logic stack involved 
> > to do the Write Back? I don't know.
> 
> Yeah, the software stack will issue flushes and FUA in "write back" 
> mode.

If this setting really is intended to be backwards from industry 
vernacular, then perhaps it is a documentation bug...
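
To make sure I am reading the replies correctly, here is a toy Python 
model of the behaviour as Christoph and Keith describe it. It is not 
kernel code; Queue, Issued and dispatch are hypothetical names. The branch 
that turns FUA into a trailing flush on devices without native FUA is my 
own understanding of the block layer, not something stated in this thread.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass(frozen=True)
class Queue:
    write_cache: str     # "write back" or "write through", as shown in sysfs
    fua_supported: bool  # whether the device accepts FUA writes natively


class Issued(NamedTuple):
    preflush: bool   # cache flush sent before the write
    fua: bool        # write sent with the FUA bit set
    postflush: bool  # cache flush sent after the write


def dispatch(queue: Queue, preflush: bool, fua: bool) -> Issued:
    """Return what reaches the device for a write that asked for
    `preflush` and/or `fua`."""
    if queue.write_cache == "write through":
        # The kernel assumes no volatile cache: flush and FUA are dropped.
        return Issued(False, False, False)
    if queue.write_cache != "write back":
        raise ValueError(f"unexpected mode: {queue.write_cache!r}")
    if fua and not queue.fua_supported:
        # Assumption (see above): FUA is emulated with a flush after the write.
        return Issued(preflush, False, True)
    return Issued(preflush, fua, False)
```

With this model, a sync write on a "write through" queue goes out with no 
flush and no FUA, which would explain why it is the faster setting.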

-Eric