Re: [PATCHSET v8 0/12] Uncached buffered IO

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Tue, 7 Jan 2025 19:35:32 -0800

On Fri, 20 Dec 2024 08:47:38 -0700 Jens Axboe <axboe@xxxxxxxxx> wrote:

> So here's a new approach to the same concept, but using the page cache
> as synchronization. Due to excessive bike shedding on the naming, this
> is now named RWF_DONTCACHE, and is less special in that it's just page
> cache IO, except it prunes the ranges once IO is completed.
> 
> Why do this, you may ask? The tldr is that device speeds are only
> getting faster, while reclaim is not. Doing normal buffered IO can be
> very unpredictable, and suck up a lot of resources on the reclaim side.
> This leads people to use O_DIRECT as a work-around, which has its own
> set of restrictions in terms of size, offset, and length of IO. It's
> also inherently synchronous, and now you need async IO as well. While
> the latter isn't necessarily a big problem as we have good options
> available there, it also should not be a requirement when all you want
> to do is read or write some data without caching.

Of course, we're doing something here which userspace could itself do:
drop the pagecache after reading it (with appropriate chunk sizing) and
for writes, sync the written area then invalidate it.  There are
possible added benefits from using separate threads for this.

I suggest that diligence requires that we at least justify an in-kernel
approach at this time, please.

And there's a possible middle-ground implementation where the kernel
itself kicks off threads to do the drop-behind just before the read or
write syscall returns, which will probably be simpler.  Can we please
describe why this also isn't acceptable?

Also, it seems wrong for a read(RWF_DONTCACHE) to drop cache if it was
already present.  Because it was presumably present for a reason.  Does
this implementation already take care of this?  To make an application
which does read(/etc/passwd, RWF_DONTCACHE) less annoying?

Also, consuming a new page flag isn't a minor thing.  It would be nice
to see some justification around this, and some description of how many
we have left.