[ Adding Johannes Weiner to the cc, I think he's looked at the working
  set and the inactive/active LRU lists the most ]

On Wed, Dec 11, 2019 at 9:56 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> > In fact, that you say that just a pure random read case causes lots of
> > kswapd activity makes me think that maybe we've screwed up page
> > activation in general, and never noticed (because if you have enough
> > memory, you don't really see it that often)? So this might not be an
> > io_uring issue, but an issue in general.
>
> This is very much not an io_uring issue, you can see exactly the same
> kind of behavior with normal buffered reads or mmap'ed IO. I do wonder
> if streamed reads are as bad in terms of making kswapd go crazy, I
> forget if I tested that explicitly as well.

We definitely used to have people test things like "read the same
much-bigger-than-memory file over and over", and it wasn't supposed to
be all _that_ painful, because the pages never activated, and they got
moved out of the cache quickly and didn't disturb other activities
(other than the constant IO, of course, which can be a big deal in
itself).

But maybe that was just the streaming case. With read-around and random
accesses, maybe we end up activating too much (and maybe we always
did).

But I wouldn't be surprised if we've lost that as people went from
having 16-32MB to having that many GB instead - simply because a lot of
loads are basically entirely cached, and the few things that are not
tend to be explicitly uncached (ie O_DIRECT etc).

I think the workingset changes actually were maybe kind of related to
this - the inactive list can become too small to ever give people time
to do a good job of picking the _right_ thing to activate.

So this might either be the reverse situation - maybe we let the
inactive list grow too large, and then even a big random load will
activate pages that really shouldn't be activated?

Or it might be related to the workingset issue in that we've activated
pages too eagerly and not ever moved things back to the inactive list
(which then in some situations causes the inactive list to be very
small).

Who knows. But this is definitely an area that I suspect hasn't gotten
all that much attention simply because memory has become much more
plentiful, and a lot of regular loads basically have enough memory that
almost all IO is cached anyway, and the old "you needed to be more
clever about balancing swap/inactive/active even under normal loads"
thing may have gone away a bit.

These days, even if you do somewhat badly in that balancing act, a lot
of loads probably won't notice that much. Either there is still so much
memory for caching that the added IO isn't really ever dominant, or you
had such a big load to begin with that it was long since rewritten to
use O_DIRECT.

              Linus
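
As a rough illustration of the activation behavior being discussed (a
cached page sits on the inactive list and is only promoted to the
active list on a second reference, so a touch-once streaming read never
activates anything, while pages re-touched by read-around or random
re-reads do), here is a minimal user-space sketch. It is not the
kernel's implementation; struct cached_page, touch_page() and the tiny
simulation are invented purely for illustration.

/*
 * Minimal user-space sketch (not kernel code) of the two-list LRU
 * activation rule: first touch only marks a page referenced, second
 * touch promotes it from the inactive to the active list.
 */
#include <stdbool.h>
#include <stdio.h>

enum lru { LRU_INACTIVE, LRU_ACTIVE };

struct cached_page {
	bool referenced;   /* touched once already (PG_referenced-like) */
	enum lru lru;      /* which LRU list the page currently sits on */
};

/* Activate only on the second reference while still inactive. */
static void touch_page(struct cached_page *p)
{
	if (p->lru == LRU_INACTIVE && p->referenced) {
		p->lru = LRU_ACTIVE;     /* second touch: promote */
		p->referenced = false;
	} else {
		p->referenced = true;    /* first touch: just mark */
	}
}

int main(void)
{
	struct cached_page streaming[8] = { 0 };
	struct cached_page reread[8] = { 0 };
	int i, active;

	/* Streaming: every page touched exactly once -> stays inactive,
	 * so it is cheap to reclaim and never pushes out active pages. */
	for (i = 0; i < 8; i++)
		touch_page(&streaming[i]);

	/* Random/read-around: some pages get touched a second time ->
	 * those are activated and start competing with the real working
	 * set. */
	for (i = 0; i < 8; i++)
		touch_page(&reread[i]);
	for (i = 0; i < 8; i += 2)
		touch_page(&reread[i]);

	for (active = 0, i = 0; i < 8; i++)
		active += (streaming[i].lru == LRU_ACTIVE);
	printf("streaming:  %d of 8 pages activated\n", active);

	for (active = 0, i = 0; i < 8; i++)
		active += (reread[i].lru == LRU_ACTIVE);
	printf("re-touched: %d of 8 pages activated\n", active);
	return 0;
}

Under this rule the streaming pass reports zero activated pages while
the re-read pass activates exactly the pages it touched twice, which is
the distinction drawn above between the streaming case and the
read-around/random case.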