On 1/13/25 5:46 PM, Andrew Morton wrote:
> On Mon, 13 Jan 2025 08:34:18 -0700 Jens Axboe <axboe@xxxxxxxxx> wrote:
>
>>>
>>
>> ...
>>
>>> Of course, we're doing something here which userspace could itself do:
>>> drop the pagecache after reading it (with appropriate chunk sizing) and
>>> for writes, sync the written area then invalidate it. Possible
>>> added benefits from using separate threads for this.
>>>
>>> I suggest that diligence requires that we at least justify an in-kernel
>>> approach at this time, please.
>>
>> Conceptually yes. But you'd end up doing extra work to do it. Some of
>> that not so expensive, like system calls, and others more so, like LRU
>> manipulation. Outside of that, I do think it makes sense to expose as a
>> generic thing, rather than require applications needing to kick
>> writeback manually, reclaim manually, etc.
>>
>>> And there's a possible middle-ground implementation where the kernel
>>> itself kicks off threads to do the drop-behind just before the read or
>>> write syscall returns, which will probably be simpler. Can we please
>>> describe why this also isn't acceptable?
>>
>> That's more of an implementation detail. I didn't test anything like
>> that, though we surely could. If it's better, there's no reason why it
>> can't just be changed to do that. My gut tells me you want the task/CPU
>> that just did the page cache additions to do the pruning to, that should
>> be more efficient than having a kworker or similar do it.
>
> Well, gut might be wrong ;)

A gut this big is rarely wrong ;-)

> There may be benefit in using different CPUs to perform the dropbehind,
> rather than making the read() caller do this synchronously.
>
> If I understand correctly, the write() dropbehind is performed at
> interrupt (write completion) time so that's already async.

It is, but we could actually get rid of that, at least when called via
io_uring. From the testing I've done, doing it inline is definitely
superior.
Though it will depend on whether you care about overall efficiency or
just the sheer speed/overhead of the read/write itself.

-- 
Jens Axboe
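For reference, the userspace alternative Andrew describes (drop the page cache after reading each chunk; for writes, sync the written range and then invalidate it) might look roughly like the sketch below. It assumes Linux, using `posix_fadvise(POSIX_FADV_DONTNEED)` for the drop and the Linux-specific `sync_file_range()` to write back just the range that was written. The helper names `read_dropbehind`/`write_dropbehind` and the 1 MiB chunk size are hypothetical choices for illustration, not anything from the patch series. Note that it costs extra system calls per chunk, which is part of the overhead Jens mentions above.

```c
#define _GNU_SOURCE /* needed for sync_file_range(), which is Linux-specific */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size; the thread leaves "appropriate chunk sizing" open. */
#define DROPBEHIND_CHUNK (1 << 20) /* 1 MiB */

/*
 * Read up to len bytes starting at file offset start, one chunk at a time,
 * asking the kernel to drop each chunk's page cache once it has been copied
 * out. Returns bytes read (short at EOF), or -1 on error.
 */
static ssize_t read_dropbehind(int fd, char *buf, size_t len, off_t start)
{
    size_t done = 0;
    while (done < len) {
        size_t n = len - done;
        if (n > DROPBEHIND_CHUNK)
            n = DROPBEHIND_CHUNK;
        off_t pos = start + (off_t)done;
        ssize_t r = pread(fd, buf + done, n, pos);
        if (r < 0)
            return -1;
        if (r == 0)
            break; /* EOF */
        /* Advisory only: clean pages in this range may now be evicted. */
        (void)posix_fadvise(fd, pos, r, POSIX_FADV_DONTNEED);
        done += (size_t)r;
    }
    return (ssize_t)done;
}

/*
 * Write len bytes at file offset start, one chunk at a time. After each
 * chunk, write back exactly that range and wait for it, so the pages are
 * clean and POSIX_FADV_DONTNEED can actually drop them. This does not flush
 * file metadata, so it is not a durability guarantee like fsync().
 * Returns bytes written, or -1 on error.
 */
static ssize_t write_dropbehind(int fd, const char *buf, size_t len, off_t start)
{
    size_t done = 0;
    while (done < len) {
        size_t n = len - done;
        if (n > DROPBEHIND_CHUNK)
            n = DROPBEHIND_CHUNK;
        off_t pos = start + (off_t)done;
        ssize_t w = pwrite(fd, buf + done, n, pos);
        if (w <= 0)
            return -1;
        if (sync_file_range(fd, pos, w,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            return -1;
        (void)posix_fadvise(fd, pos, w, POSIX_FADV_DONTNEED);
        done += (size_t)w;
    }
    return (ssize_t)done;
}
```

Doing this per chunk, rather than once at the end, keeps the amount of cached data bounded by roughly one chunk. The trade-off is that the write path blocks on writeback for every chunk.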