On Tue, Mar 8, 2022 at 8:05 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 08-03-22 09:44:35, Dan Schatzberg wrote:
> > On Tue, Mar 08, 2022 at 01:53:19PM +0100, Michal Hocko wrote:
> > > On Mon 07-03-22 15:26:18, Johannes Weiner wrote:
> [...]
> > > > A mechanism to request a fixed number of pages to reclaim turned out
> > > > to work much, much better in practice. We've been using a simple
> > > > per-cgroup knob (like here: https://lkml.org/lkml/2020/9/9/1094).
> > >
> > > Could you share more details here please? How have you managed to find
> > > the reclaim target and how have you overcome challenges to react in time
> > > to have some head room for the actual reclaim?
> >
> > We have a userspace agent that just repeatedly triggers proactive
> > reclaim and monitors PSI metrics to maintain some constant but low
> > pressure. In the complete absense of pressure we will reclaim some
> > configurable percentage of the workload's memory. This reclaim amount
> > tapers down to zero as PSI approaches the target threshold.
> >
> > I don't follow your question regarding head-room. Could you elaborate?
>
> One of the concern that was expressed in the past is how effectively
> can pro-active userspace reclaimer act on memory demand transitions. It
> takes some time to get refaults/PSI changes and then you should
> be acting rather swiftly. At least if you aim at somehow smooth
> transition. Tuning this up to work reliably seems to be far
> from trivial. Not to mention that changes in the memory reclaim
> implementation could make the whole tuning rather fragile.

The userspace reclaimer is not a complete replacement for the kernel memory reclaim (kswapd or direct reclaim). At least in Google's use cases, its purpose is to proactively identify memory savings opportunities and reclaim some amount of cold pages, set by the policy, to free up memory for more demanding jobs or for scheduling new jobs.
If a job (container) has a rapid increase in memory demand, it would just mean less proactive savings from this job. With the proposed nr_bytes_to_reclaim interface, the userspace reclaimer doesn't have to act much more swiftly for such jobs. If the userspace reclaim interface were memory.high-based, then such jobs would indeed be a serious problem.
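
To make the policy discussed in this thread a bit more concrete, here is a rough sketch of how a userspace agent might size each request. This is not the actual agent from either company. The function names, the linear shape of the taper, and the parameters max_fraction and psi_target are assumptions for illustration only; the thread says only that the amount "tapers down to zero as PSI approaches the target threshold". The parser assumes the standard PSI text format ("some avg10=... avg60=... avg300=... total=...").

```python
def parse_psi_some_avg10(pressure_text: str) -> float:
    """Return the 'some avg10' percentage from PSI-formatted text."""
    for line in pressure_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "some":
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    raise ValueError("no 'some avg10' field found")


def reclaim_target_bytes(usage_bytes: int, psi: float,
                         psi_target: float, max_fraction: float) -> int:
    """Size one proactive reclaim request (hypothetical linear taper).

    With no pressure, request max_fraction of current usage. Scale the
    request down linearly as psi rises, reaching zero at or above
    psi_target.
    """
    if psi_target <= 0:
        raise ValueError("psi_target must be positive")
    scale = max(0.0, 1.0 - psi / psi_target)
    return int(usage_bytes * max_fraction * scale)
```

A job whose memory demand rises quickly pushes PSI toward the target, so the computed request shrinks to zero on its own. The agent simply saves less from that job, and the kernel's own reclaim still handles the demand.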