Re: [RFC 0/4] Introduce unbalance proactive reclaim

Michal Hocko <mhocko@xxxxxxxx> · Fri, 10 Nov 2023 13:24:21 +0100

On Fri 10-11-23 11:48:49, Huan Yang wrote:
[...]
> Also, When the application enters the foreground, the startup speed
> may be slower. Also trace show that here are a lot of block I/O.
> (usually 1000+ IO count and 200+ms IO Time) We usually observe very
> little block I/O caused by zram refault.(read: 1698.39MB/s, write:
> 995.109MB/s), usually, it is faster than random disk reads.(read:
> 48.1907MB/s write: 49.1654MB/s). This test by zram-perf and I change a
> little to test UFS.
> 
> Therefore, if the proactive reclamation encounters many file pages,
> the application may become slow when it is opened.

OK, this is interesting information. From the above it seems that
storage-based IO refaults are an order of magnitude more expensive than
swap (zram in this case). That means that memory reclaim should
_in general_ prefer reclaiming anonymous memory over page cache that
would be refaulted from storage, right? Or is there any reason why
"frozen" applications are any different in this case?
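
As a rough check of the "order of magnitude" reading, the throughput
figures quoted above can be compared directly. This is only arithmetic
on the reported numbers; throughput is used here as a proxy for refault
cost, which is an assumption and not a measurement of refault latency.

```python
# Throughput figures quoted in the report above (MB/s).
ZRAM = {"read": 1698.39, "write": 995.109}
UFS_RANDOM = {"read": 48.1907, "write": 49.1654}


def speedup(fast, slow):
    """Return how many times faster `fast` is than `slow`, per operation."""
    return {op: fast[op] / slow[op] for op in fast}


ratios = speedup(ZRAM, UFS_RANDOM)
for op, ratio in ratios.items():
    print(f"zram {op} is about {ratio:.1f}x faster than random UFS {op}")
```

That works out to roughly 35x for reads and 20x for writes.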

Our traditional interface to control the anon vs. file balance has been
swappiness. It is not the best interface and it has its flaws but
have you experimented with the global swappiness to express that
preference? What were your observations? Please note that the behavior
might be really different with different kernel versions so I would
really stress that testing with the current Linus (or akpm) tree is
necessary.
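
For reference, a minimal sketch of inspecting the global knob. It
assumes a kernel where vm.swappiness is documented as a 0..200 value
with 100 meaning equal assumed IO cost for swap and filesystem paging;
describe() is a hypothetical helper for illustration, not a kernel
interface.

```python
from pathlib import Path

# Global sysctl file; the path is a parameter so the helper can be
# pointed at a copy when experimenting.
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the swappiness value stored at `path` as an int."""
    return int(Path(path).read_text().strip())


def describe(value):
    """Hypothetical helper: say which IO the setting treats as cheaper."""
    if value > 100:
        return "swap IO assumed cheaper than filesystem IO"
    if value < 100:
        return "swap IO assumed more expensive than filesystem IO"
    return "swap and filesystem IO assumed equally expensive"


# Only touch the real sysctl file when it is present.
if Path(SWAPPINESS_PATH).exists():
    current = read_swappiness()
    print(f"vm.swappiness = {current}: {describe(current)}")
```

With zram as the swap device, a value above 100 would be the way to
express that swap refaults are the cheaper ones.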

Anyway, the more I think about it the more I am convinced that an
explicit anon/file extension for the memory.reclaim interface is just
the wrong way to address a more fundamental underlying problem. That is,
the default reclaim choice between anon and file should consider the
cost of the refaulting IO. This is more a property of the underlying
storage than a global characteristic. In other words, say you have
multiple storage devices, one that is network based with high latency
and another that is a fast local SSD. Reclaiming a page backed by the
slower storage is going to be more expensive to refault than one backed
by the fast storage. So even page cache pages are not really all the
same.
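
To illustrate the idea (and only the idea), here is a toy model of
picking reclaim candidates by refault cost. Everything in it is an
assumption made up for the example: the latency numbers, the Page type,
the "age" unit and the scoring formula. It is not how the kernel selects
pages.

```python
from dataclasses import dataclass

# Assumed per-page refault latencies in microseconds; illustrative only.
REFAULT_COST_US = {"zram": 5, "local_ssd": 100, "network_fs": 5000}


@dataclass
class Page:
    name: str
    backing: str  # key into REFAULT_COST_US
    age: int      # larger means colder (hypothetical unit)


def reclaim_score(page):
    """Higher score = better candidate: cold and cheap to refault."""
    return page.age / REFAULT_COST_US[page.backing]


def pick_victims(pages, count):
    """Return the `count` pages with the highest reclaim score."""
    return sorted(pages, key=reclaim_score, reverse=True)[:count]


pages = [
    Page("anon-a", "zram", age=10),
    Page("file-b", "local_ssd", age=40),
    Page("file-c", "network_fs", age=400),
]
victims = pick_victims(pages, 2)
print([p.name for p in victims])
```

Here the network-backed page is by far the coldest, yet it is kept
because bringing it back would cost the most.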

It is quite likely that an IO cost aspect is not really easy to
integrate into memory reclaim but it seems to me this is a better
direction to focus on for a long term solution. Our existing
refaulting infrastructure should help in that respect. Also MGLRU could
fit that purpose better than the traditional LRU based reclaim as the
higher generations could be used for more expensive pages.
-- 
Michal Hocko
SUSE Labs