On Tue, Apr 23, 2019 at 9:08 AM Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> On Tue, 2019-04-23 at 08:30 -0700, Shakeel Butt wrote:
>
> > Topic: Proactive Memory Reclaim
> >
> > Motivation/Problem: Memory overcommit is most commonly used technique
> > to reduce the cost of memory by large infrastructure owners. However
> > memory overcommit can adversely impact the performance of latency
> > sensitive applications by triggering direct memory reclaim. Direct
> > reclaim is unpredictable and disastrous for latency sensitive
> > applications.
>
> This sounds similar to a project Johannes has
> been working on, except he is not tracking which
> memory is idle at all, but only the pressure on
> each cgroup, through the PSI interface:
>
> https://facebookmicrosites.github.io/psi/docs/overview
>

I think the two techniques are orthogonal and can be used concurrently.
This technique proactively reclaims memory in the hope that we never
reach direct reclaim. In the worst case, if we do trigger direct
reclaim, we can use PSI to detect early when to give up on reclaim and
trigger an oom-kill.

Another thing I want to point out is our usage model: this proactive
memory reclaim is transparent to the jobs. The admin (infrastructure
owner) uses proactive reclaim to create more schedulable memory without
any involvement from the job owners.

> Discussing the pros and cons, and experiences with
> both approaches seems like a useful topic. I'll add
> it to the agenda.
>

Thanks a lot.

Shakeel