On 12/11/19 10:37 AM, Linus Torvalds wrote: > On Wed, Dec 11, 2019 at 7:29 AM Jens Axboe <axboe@xxxxxxxxx> wrote: >> >> Comments appreciated! This should work on any standard file system, >> using either the generic helpers or iomap. I have tested ext4 and xfs >> for the right read/write behavior, but no further validation has been >> done yet. Patches are against current git, and can also be found here: > > I don't think this is conceptually wrong, but the implementation > smells a bit. > > I commented on the trivial part (the horrendous argument list to > iomap_actor), but I wonder how much of the explicit invalidation is > actually needed? Agree on the other email on that part, if we continue on this path, then I'll clean that up and shove the arguments in an actor struct. > Because active invalidation really is a horrible horrible thing to do. > It immediately means that you can't use this interface for normal > everyday things that may actually cache perfectly fine. > > What happens if you simply never _activate_ the page? Then they should > get invalidated on their own, without impacting any other load - but > only when there is _some_ memory pressure. They'll just stay on the > inactive lru list, and get re-used quickly. > > Note that there are two ways to activate a page: the "accessed while > on the inactive list" will activate it, but these days we also have a > "pre-activate" path in the workingset code (see workingset_refault()). > > Even if you might also want an explicit invalidate path, I would like > to hear what it looks like if you instead of - or in addition to - > invalidating, have a "don't activate" flag. > > We don't have all _that_ many places where we activate pages, and they > should be easy to find (just grep for "SetPageActive()"), although the > call chain may make it a bit painful to add a "don't do it for this > access" kind of things. > > But I think most of the regular IO call chains come through > "mark_page_accessed()". So _that_ is the part you want to avoid (and > maybe the workingset code). And that should be fairly straightforward, > I think. Sure, I can give that a go and see how that behaves. > In fact, that you say that just a pure random read case causes lots of > kswapd activity makes me think that maybe we've screwed up page > activation in general, and never noticed (because if you have enough > memory, you don't really see it that often)? So this might not be an > io_ring issue, but an issue in general. This is very much not an io_uring issue, you can see exactly the same kind of behavior with normal buffered reads or mmap'ed IO. I do wonder if streamed reads are as bad in terms of making kswapd go crazy, I forget if I tested that explicitly as well. I'll run some streamed and random read testing on both and see how they behave, then report back. -- Jens Axboe