On Thu, Mar 10, 2022 at 8:58 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Mar 09, 2022 at 02:03:21PM -0800, David Rientjes wrote:
> > On Tue, 8 Mar 2022, Michal Hocko wrote:
> >
> > > > Let me take a stab at this. The specific reasons why high limit is not a
> > > > good interface to implement proactive reclaim:
> > > >
> > > > 1) It can cause allocations from the target application to get
> > > > throttled.
> > > >
> > > > 2) It leaves a state (high limit) in the kernel which needs to be reset
> > > > by the userspace part of proactive reclaimer.
> > > >
> > > > If I remember correctly, Facebook actually tried to use high limit to
> > > > implement the proactive reclaim but due to exactly these limitations [1]
> > > > they went the route [2] aligned with this proposal.
> > >
> > > I do remember we have discussed this in the past. There were proposals
> > > for an additional limit to trigger a background reclaim [3] or to add a
> > > pressure based memcg knob [4]. For the nr_to_reclaim based interface
> > > there were some challenges outlined in that email thread. I do
> > > understand that practical experience could have confirmed or diminished
> > > those concerns.
> > >
> > > I am definitely happy to restart those discussion but it would be really
> > > great to summarize existing options and why they do not work in
> > > practice. It would be also great to mention why concerns about nr_to_reclaim
> > > based interface expressed in the past are not standing out anymore wrt.
> > > other proposals.
> > >
> >
> > Johannes, since you had pointed out that the current approach used at Meta
> > and described in the TMO paper works well in practice and is based on
> > prior discussions of memory.reclaim[1], do you have any lingering concerns
> > from that 2020 thread?
>
> I'd be okay with merging the interface proposed in that thread as-is.

We will need a nodemask argument for the memory tiering use case.
We can add it as an optional argument to memory.reclaim later. Or do you think we should add a different interface (e.g. memory.demote) for memory tiering instead?

> > My first email in this thread proposes something that can still do memcg
> > based reclaim but is also possible even without CONFIG_MEMCG enabled.
> > That's particularly helpful for configs used by customers that don't use
> > memcg, namely Chrome OS. I assume we're not losing any functionality that
> > your use case depends on if we are to introduce a per-node sysfs mechanism
> > for this as an alternative since you can still specify a memcg id?
>
> We'd lose the delegation functionality with this proposal.
>
> But per the other thread, I wouldn't be opposed to adding a global
> per-node interface in addition to the cgroupfs one.
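The thread contrasts proactive reclaim through the high limit with a dedicated, one-shot memory.reclaim file. A minimal sketch of both, seen from the userspace reclaimer's side, is given below in Python for brevity. Assumptions: cgroup v2, with the target cgroup's directory passed in; the memory.reclaim file name and its plain byte-count argument follow the proposal under discussion and were not a merged kernel interface at the time of this thread; the optional nodemask argument raised above is not modelled; both function names are hypothetical.

```python
import os


def reclaim_via_high(cgroup_dir: str, target_bytes: int) -> None:
    """Proactive reclaim by temporarily lowering memory.high (cgroup v2).

    Illustrates the two drawbacks listed in the thread: while the lowered
    limit is in place, allocations in the cgroup may be throttled, and the
    limit is kernel state that this userspace process must restore itself.
    """
    high_path = os.path.join(cgroup_dir, "memory.high")
    with open(high_path) as f:
        saved = f.read().strip()  # e.g. "max" or a byte count
    try:
        # Lowering the high limit makes the kernel try to reclaim toward it.
        with open(high_path, "w") as f:
            f.write(str(target_bytes))
    finally:
        # Userspace has to undo the change; if this process is killed
        # before this block runs, the lowered limit stays in the kernel.
        with open(high_path, "w") as f:
            f.write(saved)


def reclaim_via_memory_reclaim(cgroup_dir: str, nr_bytes: int) -> None:
    """Proactive reclaim through the proposed memory.reclaim file.

    ASSUMPTION: the file is named memory.reclaim and takes a plain byte
    count, as in the proposal; no nodemask argument is modelled. The
    request is one-shot, so no limit is left behind to reset.
    """
    with open(os.path.join(cgroup_dir, "memory.reclaim"), "w") as f:
        f.write(str(nr_bytes))
```

For example, `reclaim_via_memory_reclaim("/sys/fs/cgroup/workload", 256 * 1024 * 1024)` would request 256 MiB of reclaim from a hypothetical `workload` cgroup. The second function leaves nothing behind to undo (point 2 in the quoted list) and does not lower a limit that the workload's own allocations could be throttled against (point 1).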