Re: [RFC] Mechanism to induce memory reclaim

On Tue, Mar 08, 2022 at 05:05:11PM +0100, Michal Hocko wrote:
> On Tue 08-03-22 09:44:35, Dan Schatzberg wrote:
> > On Tue, Mar 08, 2022 at 01:53:19PM +0100, Michal Hocko wrote:
> > > On Mon 07-03-22 15:26:18, Johannes Weiner wrote:
> [...]
> > > > A mechanism to request a fixed number of pages to reclaim turned out
> > > > to work much, much better in practice. We've been using a simple
> > > > per-cgroup knob (like here: https://lkml.org/lkml/2020/9/9/1094).
> > > 
> > > Could you share more details here please? How have you managed to find
> > > the reclaim target and how have you overcome challenges to react in time
> > > to have some head room for the actual reclaim?
> > 
> > We have a userspace agent that just repeatedly triggers proactive
> > reclaim and monitors PSI metrics to maintain some constant but low
> > pressure. In the complete absence of pressure we will reclaim some
> > configurable percentage of the workload's memory. This reclaim amount
> > tapers down to zero as PSI approaches the target threshold.
> > 
> > I don't follow your question regarding head-room. Could you elaborate?
> 
> One of the concerns expressed in the past is how effectively a
> proactive userspace reclaimer can act on memory demand transitions. It
> takes some time for refaults/PSI changes to show up, and then you have
> to act rather swiftly.

This was a concern with the fixed limit, but not so much with the
one-off requests for reclaim. There is nothing in the way that would
prevent the workload from quickly allocating all the memory it
needs. The goal of proactive reclaim isn't to punish or restrict the
workload, but rather to continuously probe it for cold pages, to
measure the minimum amount of memory it requires to run healthily.
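
To make the one-off request model concrete, here is a minimal
userspace sketch, assuming a per-cgroup knob along the lines of the
memory.reclaim file proposed in the patch linked above (the file name,
byte-count semantics, cgroup path and helper function are illustrative
assumptions, not a definitive interface):

/*
 * Sketch: ask the kernel to reclaim a fixed amount of memory from one
 * cgroup. Assumes a per-cgroup file like the proposed memory.reclaim
 * that accepts a byte count to reclaim.
 */
#include <stdio.h>

static int request_reclaim(const char *cgroup_path, unsigned long long bytes)
{
        char path[4096];
        FILE *f;

        snprintf(path, sizeof(path), "%s/memory.reclaim", cgroup_path);
        f = fopen(path, "w");
        if (!f)
                return -1;
        /* Write how many bytes we want reclaimed; the kernel reclaims
         * up to that amount from this cgroup and returns. */
        if (fprintf(f, "%llu", bytes) < 0) {
                fclose(f);
                return -1;
        }
        return fclose(f);
}

int main(void)
{
        /* Hypothetical cgroup path and request size, for illustration. */
        if (request_reclaim("/sys/fs/cgroup/workload.slice", 64ULL << 20))
                perror("request_reclaim");
        return 0;
}

The agent simply repeats such bounded requests; each one completes and
returns, so the workload is never capped the way a fixed limit would
cap it.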

> At least if you aim at a somewhat smooth transition. Tuning this to
> work reliably seems far from trivial. Not to mention that changes in
> the memory reclaim implementation could make the whole tuning rather
> fragile.

When reclaim becomes worse at finding the coldest memory, pressure
rises after fewer pages have been evicted and we back off earlier. So
a reclaim regression doesn't necessarily translate to less smooth
operation or increased workload impact, but rather to an increased
memory footprint. That may be measurable, but it isn't really an
operational emergency - unless reclaim gets 50% worse, which isn't
very likely, in which case we'd hold off on the kernel upgrade until
the bug is fixed ;)
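
To put a number on the backoff: one plausible shape for the taper
described above (our exact formula isn't spelled out here, so treat
the function, parameters and curve below as an illustration) scales
the per-iteration reclaim request linearly from a configurable
fraction of the workload's memory at zero pressure down to nothing at
the PSI target, with the pressure value coming from something like the
"some avg10" field of the cgroup's memory.pressure file:

#include <stdio.h>

/*
 * Illustrative taper: scale the reclaim request down linearly as
 * memory pressure approaches the target, and stop entirely at or
 * above the target. The linear curve and the parameters are
 * assumptions, not the agent's actual tuning.
 */
static unsigned long long reclaim_target(unsigned long long workload_bytes,
                                         double psi_avg10,
                                         double psi_target,
                                         double max_fraction)
{
        if (psi_avg10 >= psi_target)
                return 0;       /* at or above target pressure: back off */

        return (unsigned long long)(workload_bytes * max_fraction *
                                    (1.0 - psi_avg10 / psi_target));
}

int main(void)
{
        /* Example: 8 GiB workload, "some avg10" of 0.4 against a
         * target of 1.0, reclaiming at most 1% per iteration. */
        printf("%llu\n", reclaim_target(8ULL << 30, 0.4, 1.0, 0.01));
        return 0;
}

A reclaim regression shifts where this curve settles - the agent asks
for less per iteration and the steady-state footprint grows - but the
loop itself keeps operating smoothly.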

It's pretty robust. The tuning was done empirically, but the same
configuration has since held up across many different services: some
with swap, some with zswap, some with just cache; different types of
SSDs; different kernel versions; even drastic reclaim changes such as
Joonsoo's workingset-for-anon change.



