On Tue, 8 Mar 2022, Michal Hocko wrote:

> > Let me take a stab at this. The specific reasons why high limit is not a
> > good interface to implement proactive reclaim:
> >
> > 1) It can cause allocations from the target application to get
> >    throttled.
> >
> > 2) It leaves a state (high limit) in the kernel which needs to be reset
> >    by the userspace part of the proactive reclaimer.
> >
> > If I remember correctly, Facebook actually tried to use high limit to
> > implement the proactive reclaim but due to exactly these limitations [1]
> > they went the route [2] aligned with this proposal.
>
> I do remember we have discussed this in the past. There were proposals
> for an additional limit to trigger a background reclaim [3] or to add a
> pressure based memcg knob [4]. For the nr_to_reclaim based interface
> there were some challenges outlined in that email thread. I do
> understand that practical experience could have confirmed or diminished
> those concerns.
>
> I am definitely happy to restart those discussions, but it would be
> really great to summarize the existing options and why they do not work
> in practice. It would also be great to mention why the concerns about a
> nr_to_reclaim based interface expressed in the past no longer stand out
> wrt. other proposals.

Johannes, since you had pointed out that the current approach used at
Meta and described in the TMO paper works well in practice and is based
on prior discussions of memory.reclaim [1], do you have any lingering
concerns from that 2020 thread?

My first email in this thread proposes something that can still do
memcg-based reclaim but is also possible even without CONFIG_MEMCG
enabled. That's particularly helpful for configs used by customers that
don't use memcg, namely Chrome OS. I assume we're not losing any
functionality that your use case depends on if we are to introduce a
per-node sysfs mechanism for this as an alternative, since you can
still specify a memcg id?
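For readers following along, a rough sketch of the contrast being drawn above, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and a hypothetical "workload" cgroup (names are illustrative, and the one-shot interface shown is the memory.reclaim style under discussion, not a settled ABI):

```shell
#!/bin/sh
# Requires root and cgroup v2; illustrative only.
CG=/sys/fs/cgroup/workload    # hypothetical cgroup for the target app

# High-limit based proactive reclaim:
#   drawback 1: while the limit is lowered, the app's allocations above
#               it get throttled;
#   drawback 2: the lowered limit is state left in the kernel that the
#               userspace reclaimer must remember to reset.
echo 512M > "$CG/memory.high"   # force reclaim down toward 512M
sleep 1                         # let reclaim make progress
echo max  > "$CG/memory.high"   # reset the state we left behind

# nr_to_reclaim style interface: a one-shot request to reclaim a given
# amount; no throttling side effect on the app, no state to reset.
echo 512M > "$CG/memory.reclaim"
```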
[1] https://lkml.org/lkml/2020/9/9/1094