On Wed, Mar 09, 2022 at 02:03:21PM -0800, David Rientjes wrote:
> On Tue, 8 Mar 2022, Michal Hocko wrote:
>
> > > Let me take a stab at this. The specific reasons why the high limit is
> > > not a good interface to implement proactive reclaim:
> > >
> > > 1) It can cause allocations from the target application to get
> > > throttled.
> > >
> > > 2) It leaves state (the high limit) in the kernel which needs to be
> > > reset by the userspace part of the proactive reclaimer.
> > >
> > > If I remember correctly, Facebook actually tried to use the high limit
> > > to implement proactive reclaim, but due to exactly these limitations [1]
> > > they went the route [2] aligned with this proposal.
> >
> > I do remember we have discussed this in the past. There were proposals
> > for an additional limit to trigger background reclaim [3] or to add a
> > pressure-based memcg knob [4]. For the nr_to_reclaim based interface
> > there were some challenges outlined in that email thread. I do
> > understand that practical experience could have confirmed or diminished
> > those concerns.
> >
> > I am definitely happy to restart those discussions, but it would be
> > really great to summarize the existing options and why they do not work
> > in practice. It would also be great to mention why the concerns about a
> > nr_to_reclaim based interface expressed in the past are not standing out
> > anymore wrt. other proposals.
>
> Johannes, since you had pointed out that the current approach used at Meta
> and described in the TMO paper works well in practice and is based on
> prior discussions of memory.reclaim [1], do you have any lingering
> concerns from that 2020 thread?

I'd be okay with merging the interface proposed in that thread as-is.

> My first email in this thread proposes something that can still do
> memcg-based reclaim but is also possible even without CONFIG_MEMCG
> enabled. That's particularly helpful for configs used by customers that
> don't use memcg, namely Chrome OS. I assume we're not losing any
> functionality that your use case depends on if we are to introduce a
> per-node sysfs mechanism for this as an alternative, since you can still
> specify a memcg id?

We'd lose the delegation functionality with this proposal. But per the
other thread, I wouldn't be opposed to adding a global per-node interface
in addition to the cgroupfs one.
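
For concreteness, a rough sketch of the two userspace-side approaches being
compared (cgroup paths and sizes are illustrative, and memory.reclaim here
stands for the nr_to_reclaim style interface proposed in that thread, not
something that exists in mainline at this point):

	# Proactive reclaim via memory.high: the limit throttles the
	# workload's own allocations while it is in place, and it leaves
	# state in the kernel that userspace has to reset afterwards.
	echo 4G  > /sys/fs/cgroup/workload/memory.high
	sleep 5
	echo max > /sys/fs/cgroup/workload/memory.high

	# nr_to_reclaim style request: a one-shot write asking the kernel to
	# reclaim a given amount from the memcg, leaving no limit behind.
	echo 512M > /sys/fs/cgroup/workload/memory.reclaim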