On Wed, Mar 09, 2022 at 02:03:21PM -0800, David Rientjes wrote:
> On Tue, 8 Mar 2022, Michal Hocko wrote:
>
> > > Let me take a stab at this. The specific reasons why the high limit is
> > > not a good interface to implement proactive reclaim:
> > >
> > > 1) It can cause allocations from the target application to get
> > > throttled.
> > >
> > > 2) It leaves state (the high limit) in the kernel which needs to be
> > > reset by the userspace part of the proactive reclaimer.
> > >
> > > If I remember correctly, Facebook actually tried to use the high limit
> > > to implement proactive reclaim, but due to exactly these limitations [1]
> > > they went the route [2] aligned with this proposal.
> >
> > I do remember we have discussed this in the past. There were proposals
> > for an additional limit to trigger background reclaim [3] or to add a
> > pressure-based memcg knob [4]. For the nr_to_reclaim based interface
> > there were some challenges outlined in that email thread. I do
> > understand that practical experience could have confirmed or diminished
> > those concerns.
> >
> > I am definitely happy to restart those discussions, but it would be
> > really great to summarize the existing options and why they do not work
> > in practice. It would also be great to mention why the concerns about a
> > nr_to_reclaim based interface expressed in the past are not standing out
> > anymore wrt. other proposals.
>
> Johannes, since you had pointed out that the current approach used at Meta
> and described in the TMO paper works well in practice and is based on
> prior discussions of memory.reclaim [1], do you have any lingering
> concerns from that 2020 thread?

I'd be okay with merging the interface proposed in that thread as-is.

> My first email in this thread proposes something that can still do
> memcg-based reclaim but is also possible even without CONFIG_MEMCG
> enabled. That's particularly helpful for configs used by customers that
> don't use memcg, namely Chrome OS. I assume we're not losing any
> functionality that your use case depends on if we are to introduce a
> per-node sysfs mechanism for this as an alternative, since you can still
> specify a memcg id?

We'd lose the delegation functionality with this proposal. But per the
other thread, I wouldn't be opposed to adding a global per-node interface
in addition to the cgroupfs one.
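
For concreteness, a rough sketch of the two userspace-side approaches being
compared (cgroup paths and sizes are illustrative, and memory.reclaim here
stands for the nr_to_reclaim style interface proposed in that thread, not
something that exists in mainline at this point):

	# Proactive reclaim via memory.high: the limit throttles the
	# workload's own allocations while it is in place, and it leaves
	# state in the kernel that userspace has to reset afterwards.
	echo 4G  > /sys/fs/cgroup/workload/memory.high
	sleep 5
	echo max > /sys/fs/cgroup/workload/memory.high

	# nr_to_reclaim style request: a one-shot write asking the kernel to
	# reclaim a given amount from the memcg, leaving no limit behind.
	echo 512M > /sys/fs/cgroup/workload/memory.reclaim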