Re: [RFC] Mechanism to induce memory reclaim

Wei Xu <weixugc@xxxxxxxxxx> · Tue, 8 Mar 2022 09:21:44 -0800

On Tue, Mar 8, 2022 at 8:05 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 08-03-22 09:44:35, Dan Schatzberg wrote:
> > On Tue, Mar 08, 2022 at 01:53:19PM +0100, Michal Hocko wrote:
> > > On Mon 07-03-22 15:26:18, Johannes Weiner wrote:
> [...]
> > > > A mechanism to request a fixed number of pages to reclaim turned out
> > > > to work much, much better in practice. We've been using a simple
> > > > per-cgroup knob (like here: https://lkml.org/lkml/2020/9/9/1094).
> > >
> > > Could you share more details here please? How have you managed to find
> > > the reclaim target and how have you overcome challenges to react in time
> > > to have some head room for the actual reclaim?
> >
> > We have a userspace agent that just repeatedly triggers proactive
> > reclaim and monitors PSI metrics to maintain some constant but low
> > pressure. In the complete absence of pressure we will reclaim some
> > configurable percentage of the workload's memory. This reclaim amount
> > tapers down to zero as PSI approaches the target threshold.
> >
> > I don't follow your question regarding head-room. Could you elaborate?
>
> One of the concerns expressed in the past is how effectively a
> proactive userspace reclaimer can act on memory demand transitions.
> It takes some time to see refaults/PSI changes, and then you have to
> act rather swiftly, at least if you aim for a reasonably smooth
> transition. Tuning this to work reliably seems to be far from
> trivial. Not to mention that changes in the memory reclaim
> implementation could make the whole tuning rather fragile.

The userspace reclaimer is not a complete replacement for kernel
memory reclaim (kswapd or direct reclaim). At least in Google's use
cases, its job is to proactively identify memory-saving opportunities
and reclaim a policy-defined amount of cold pages, freeing memory for
more demanding jobs or for scheduling new jobs.  If a job (container)
has a rapid increase in memory demand, that just means less proactive
savings from this job.  With the proposed nr_bytes_to_reclaim
interface, the userspace reclaimer doesn't have to act much more
swiftly for such jobs.  If the userspace reclaim interface were
memory.high-based, then such jobs would indeed be a serious problem.
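
To make the policy concrete, here is a rough sketch of how a userspace
agent of the kind Dan describes might size each reclaim request. It is
illustrative only, not anyone's production agent: the function names,
the choice of the PSI "some avg10" value as the pressure signal, and the
linear taper are all assumptions. The thread only says that the amount
is a configurable percentage of the workload's memory and tapers to
zero as PSI approaches the target threshold.

```python
# Hypothetical sketch: the names and the linear taper are assumptions.


def parse_psi_some_avg10(pressure_text: str) -> float:
    """Return the 'some' avg10 value from PSI-formatted text.

    Expects lines shaped like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    for line in pressure_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "some":
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    raise ValueError("no 'some avg10' entry found")


def reclaim_target_bytes(pressure: float, target_pressure: float,
                         max_ratio: float, usage_bytes: int) -> int:
    """Size one proactive reclaim request, in bytes.

    With no pressure, request max_ratio of the current usage. Shrink the
    request linearly as pressure rises, reaching zero at (or above)
    target_pressure.
    """
    if target_pressure <= 0:
        raise ValueError("target_pressure must be positive")
    headroom = max(0.0, 1.0 - pressure / target_pressure)
    return int(usage_bytes * max_ratio * headroom)
```

The agent would call this periodically and pass the result to the
reclaim knob. A job whose demand suddenly rises shows up as higher
pressure, so the next request simply shrinks toward zero; nothing has
to be undone, which is the contrast with a memory.high-based scheme.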