[ Adding Johannes Weiner to the cc, I think he's looked at the working
  set and the inactive/active LRU lists the most ]

On Wed, Dec 11, 2019 at 9:56 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> > In fact, that you say that just a pure random read case causes lots of
> > kswapd activity makes me think that maybe we've screwed up page
> > activation in general, and never noticed (because if you have enough
> > memory, you don't really see it that often)? So this might not be an
> > io_uring issue, but an issue in general.
>
> This is very much not an io_uring issue, you can see exactly the same
> kind of behavior with normal buffered reads or mmap'ed IO. I do wonder
> if streamed reads are as bad in terms of making kswapd go crazy, I
> forget if I tested that explicitly as well.

We definitely used to have people test things like "read the same
much-bigger-than-memory file over and over", and it wasn't supposed to
be all _that_ painful, because the pages never activated, and they got
moved out of the cache quickly and didn't disturb other activities
(other than the constant IO, of course, which can be a big deal in
itself).

But maybe that was just the streaming case. With read-around and random
accesses, maybe we end up activating too much (and maybe we always
did).

But I wouldn't be surprised if we've lost that as people went from
having 16-32MB to having that many GB instead - simply because a lot of
loads are basically entirely cached, and the few things that are not
tend to be explicitly uncached (ie O_DIRECT etc).

I think the workingset changes actually were maybe kind of related to
this - the inactive list can become too small to ever give people time
to do a good job of picking the _right_ thing to activate.

So this might either be the reverse situation - maybe we let the
inactive list grow too large, and then even a big random load will
activate pages that really shouldn't be activated?

Or it might be related to the workingset issue in that we've activated
pages too eagerly and not ever moved things back to the inactive list
(which then in some situations causes the inactive list to be very
small).

Who knows. But this is definitely an area that I suspect hasn't gotten
all that much attention simply because memory has become much more
plentiful, and a lot of regular loads basically have enough memory that
almost all IO is cached anyway, and the old "you needed to be more
clever about balancing swap/inactive/active even under normal loads"
thing may have gone away a bit.

These days, even if you do somewhat badly in that balancing act, a lot
of loads probably won't notice that much. Either there is still so much
memory for caching that the added IO isn't really ever dominant, or you
had such a big load to begin with that it was long since rewritten to
use O_DIRECT.

              Linus
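
As a rough illustration of the activation behavior being discussed (a
cached page sits on the inactive list and is only promoted to the
active list on a second reference, so a touch-once streaming read never
activates anything, while pages re-touched by read-around or random
re-reads do), here is a minimal user-space sketch. It is not the
kernel's implementation; struct cached_page, touch_page() and the tiny
simulation are invented purely for illustration.

/*
 * Minimal user-space sketch (not kernel code) of the two-list LRU
 * activation rule: first touch only marks a page referenced, second
 * touch promotes it from the inactive to the active list.
 */
#include <stdbool.h>
#include <stdio.h>

enum lru { LRU_INACTIVE, LRU_ACTIVE };

struct cached_page {
	bool referenced;   /* touched once already (PG_referenced-like) */
	enum lru lru;      /* which LRU list the page currently sits on */
};

/* Activate only on the second reference while still inactive. */
static void touch_page(struct cached_page *p)
{
	if (p->lru == LRU_INACTIVE && p->referenced) {
		p->lru = LRU_ACTIVE;     /* second touch: promote */
		p->referenced = false;
	} else {
		p->referenced = true;    /* first touch: just mark */
	}
}

int main(void)
{
	struct cached_page streaming[8] = { 0 };
	struct cached_page reread[8] = { 0 };
	int i, active;

	/* Streaming: every page touched exactly once -> stays inactive,
	 * so it is cheap to reclaim and never pushes out active pages. */
	for (i = 0; i < 8; i++)
		touch_page(&streaming[i]);

	/* Random/read-around: some pages get touched a second time ->
	 * those are activated and start competing with the real working
	 * set. */
	for (i = 0; i < 8; i++)
		touch_page(&reread[i]);
	for (i = 0; i < 8; i += 2)
		touch_page(&reread[i]);

	for (active = 0, i = 0; i < 8; i++)
		active += (streaming[i].lru == LRU_ACTIVE);
	printf("streaming:  %d of 8 pages activated\n", active);

	for (active = 0, i = 0; i < 8; i++)
		active += (reread[i].lru == LRU_ACTIVE);
	printf("re-touched: %d of 8 pages activated\n", active);
	return 0;
}

Under this rule the streaming pass reports zero activated pages while
the re-read pass activates exactly the pages it touched twice, which is
the distinction drawn above between the streaming case and the
read-around/random case.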