On Tue, 8 Mar 2022, Michal Hocko wrote:

> > Let me take a stab at this. The specific reasons why high limit is not a
> > good interface to implement proactive reclaim:
> >
> > 1) It can cause allocations from the target application to get
> >    throttled.
> >
> > 2) It leaves a state (high limit) in the kernel which needs to be reset
> >    by the userspace part of the proactive reclaimer.
> >
> > If I remember correctly, Facebook actually tried to use high limit to
> > implement the proactive reclaim but due to exactly these limitations [1]
> > they went the route [2] aligned with this proposal.
>
> I do remember we have discussed this in the past. There were proposals
> for an additional limit to trigger a background reclaim [3] or to add a
> pressure based memcg knob [4]. For the nr_to_reclaim based interface
> there were some challenges outlined in that email thread. I do
> understand that practical experience could have confirmed or diminished
> those concerns.
>
> I am definitely happy to restart those discussions, but it would be
> really great to summarize the existing options and why they do not work
> in practice. It would also be great to mention why the concerns about a
> nr_to_reclaim based interface expressed in the past no longer stand out
> wrt. other proposals.

Johannes, since you had pointed out that the current approach used at
Meta and described in the TMO paper works well in practice and is based
on prior discussions of memory.reclaim [1], do you have any lingering
concerns from that 2020 thread?

My first email in this thread proposes something that can still do
memcg-based reclaim but is also possible even without CONFIG_MEMCG
enabled. That's particularly helpful for configs used by customers that
don't use memcg, namely Chrome OS. I assume we're not losing any
functionality that your use case depends on if we are to introduce a
per-node sysfs mechanism for this as an alternative, since you can
still specify a memcg id?
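For readers following along, a rough sketch of the contrast being drawn above, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and a hypothetical "workload" cgroup (names are illustrative, and the one-shot interface shown is the memory.reclaim style under discussion, not a settled ABI):

```shell
#!/bin/sh
# Requires root and cgroup v2; illustrative only.
CG=/sys/fs/cgroup/workload    # hypothetical cgroup for the target app

# High-limit based proactive reclaim:
#   drawback 1: while the limit is lowered, the app's allocations above
#               it get throttled;
#   drawback 2: the lowered limit is state left in the kernel that the
#               userspace reclaimer must remember to reset.
echo 512M > "$CG/memory.high"   # force reclaim down toward 512M
sleep 1                         # let reclaim make progress
echo max  > "$CG/memory.high"   # reset the state we left behind

# nr_to_reclaim style interface: a one-shot request to reclaim a given
# amount; no throttling side effect on the app, no state to reset.
echo 512M > "$CG/memory.reclaim"
```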
[1] https://lkml.org/lkml/2020/9/9/1094