Re: [RFC] Mechanism to induce memory reclaim

Johannes Weiner <hannes@xxxxxxxxxxx> · Tue, 8 Mar 2022 14:27:49 -0500

On Tue, Mar 08, 2022 at 09:49:20AM -0500, Dan Schatzberg wrote:
> On Mon, Mar 07, 2022 at 03:50:36PM -0500, Johannes Weiner wrote:
> > On Sun, Mar 06, 2022 at 03:11:23PM -0800, David Rientjes wrote:
> > >  - swappiness factor
> > 
> > This I'm not sure about.
> > 
> > Mostly because I'm not sure about swappiness in general. It balances
> > between anon and file, but both of them are aged according to the same
> > LRU rules. The only reason to prefer one over the other seems to be
> > when the cost of reloading one (refault vs swapin) isn't the same as
> > the other. That's usually a hardware property, which in a perfect
> > world we'd auto-tune inside the kernel based on observed IO
> > performance. Not sure why you'd want this per reclaim request.
> 
> I think this could be useful for budgeting write-endurance. You may
> want to tune down a workload's swappiness on a per-reclaim basis in
> order to control how much swap-out (and therefore disk writes) it's
> doing. Right now the only way to control this is by writing to
> vm.swappiness before doing the explicit reclaim, which can momentarily
> affect other reclaim behavior on the machine.

Yeah, the global swappiness setting is not ideal for tuning behavior of
individual workloads. On the other hand, flash life and write budget
are global resources shared by all workloads on the system. Does it
make sense longer term to take a workload-centric approach to that?

There are also filesystem writes to think about. If the swappable set
has already been swapped and cached, reclaiming it again doesn't
require IO. Reclaiming dirty cache OTOH requires IO, and upping
reclaim pressure on files will increase the writeback flush rates
(which reduces cache effectiveness and increases aggregate writes).

I wonder if it would make more sense to recognize the concept of write
endurance more broadly in MM code than just swap. You would specify a
rate limit (globally? with per-cgroup shares?), and then, yes, the VM
would back away from swap iff it writes too much. But it would also
throttle writeback and push back on file reclaim and dirtying processes
in accordance with that policy.
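
To make the idea a bit more concrete, here is a rough userspace sketch in
Python, not kernel code and not an existing interface. It assumes a single
global write rate limit split between groups by weight, with a token bucket
per group, and it assumes that swap-out and dirty-page writeback would both
charge the same bucket. All names (WriteBudget, try_write, shares,
burst_sec) are hypothetical and only there for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WriteBudget:
    """Hypothetical write-endurance budget: one global rate, split by shares."""

    rate_bytes_per_sec: float      # assumed global limit for all writes
    shares: Dict[str, int]         # hypothetical per-group weights
    burst_sec: float = 1.0         # seconds of unused budget a group may bank
    _tokens: Dict[str, float] = field(default_factory=dict)
    _last: float = 0.0

    def _group_rate(self, group: str) -> float:
        # Each group gets a slice of the global rate proportional to its share.
        return self.rate_bytes_per_sec * self.shares[group] / sum(self.shares.values())

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, capped at the per-group burst size.
        elapsed = max(0.0, now - self._last)
        self._last = max(self._last, now)
        for group in self.shares:
            rate = self._group_rate(group)
            cap = rate * self.burst_sec
            current = self._tokens.get(group, cap)  # buckets start full
            self._tokens[group] = min(cap, current + rate * elapsed)

    def try_write(self, group: str, nbytes: int, now: float) -> bool:
        """Charge nbytes of writes (swap-out or writeback) against the group.

        Returns False when the group is over budget, i.e. the caller should
        back off (skip swap, throttle the dirtier, and so on).
        """
        self._refill(now)
        if self._tokens[group] >= nbytes:
            self._tokens[group] -= nbytes
            return True
        return False


# Example: 100 bytes/s globally, group "a" weighted 3:1 over group "b".
budget = WriteBudget(rate_bytes_per_sec=100, shares={"a": 3, "b": 1})
first = budget.try_write("a", 50, now=0.0)    # within a's 75-byte bucket
second = budget.try_write("a", 50, now=0.0)   # only 25 left, so refused
later = budget.try_write("a", 50, now=1.0)    # refilled after one second
```

The point of the sketch is only that the budget is charged per byte written,
regardless of whether the write came from swap or from file writeback; how
the kernel would actually account and enforce this is left open.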