On Thu, Jun 20, 2024 at 03:49:54PM +0100, Matthew Wilcox wrote: > On Thu, Jun 20, 2024 at 10:16:02AM -0400, Kent Overstreet wrote: > > That's really just descriptive, not prescriptive. > > > > The intent of O_DIRECT is "bypass the page cache", the alignment > > restrictions are just a side effect of that. Applications just care > > about is having predictable performance characteristics. > > But any application that has been written to use O_DIRECT already has the > alignment & size guarantees in place. What this patch is attempting to do > is make it "more friendly" to use, and I'm not sure that's a great idea. > Not without buy-in from a large cross-section of filesystem people. The thing is, we're currently conflating "don't use the page cache" and "don't bounce", and we really shouldn't be. Sometimes performance is your main criteria and you do want the error if you misaligned the IO, but not always. The the main reason being that mixing buffered and O_DIRECT is _really_ expensive, and you will quite often have codepaths that don't care about performance, but do need to use O_DIRECT so they don't interfere with your paths that do care about performance. So they're forced to deal with bouncing and alignment restrictions at the applicatoin level, even though they really don't want to be, and we (at least bcachefs) already have the code to do that bouncing when required - there's really no reason for them to be hand rolling that, and if writes are involved then it gets really nontrivial. This is one of the reasons I'll likely be implementing sub-blocksize writes through the low level write path in bcachefs. O_DIRECT gives you things that buffered IO doesn't - it's much cheaper, and simpler to guarantee write ordering of O_DIRECT writes (c.f. untorn writes, O_DIRECT is much easier than buffered there as well, and it's even easier in a COW filesystem) - and it's going to be worthwhile to be able to provide that behaviour without the alignment restrictions that currently come with O_DIRECT. To sum all that up: "don't bounce" should really another rwf flag we can pass to preadv2/pwritev2 - RWF_NOBOUNCE. And for bcachefs I would implement bouncing-when-required even without that flag being available, simply because that's current behaviour (due to checksums being extent granularity). That flag really only makes sense for bcachefs once we get opt-in block granular checksums (which is coming, as well).