Hi, Tejun!

On Sun, Apr 15, 2018 at 06:39:02PM -0700, Tejun Heo wrote:
> Hello, Roman.
>
> The reclaim behavior is a bit worrisome.
>
> * It disables an entire swap area while reclaim is in progress. Most
>   systems only have one swap area, so this would disable allocation
>   of new swap entries for everyone.
>
> * The reclaim seems very inefficient. IIUC, it has to read every swap
>   page to see whether the page belongs to the target memcg and then
>   process each matching page, which involves walking the page's mms
>   and page tables.
>
> An easy optimization would be walking swap_cgroup_ctrl so that it only
> reads swap entries which belong to the target cgroup and avoids
> disabling swap for others, but looking at the code, I wonder whether
> we need active reclaim at all.
>
> Swap already tries to aggressively reclaim swap entries when swap
> usage exceeds 50% of the limit, so simply reducing the limit already
> triggers aggressive reclaim. Given that it's swap, just waiting it
> out could be the better behavior anyway. How about something like
> the following?
>
> ------ 8< ------
> From: Tejun Heo <tj@xxxxxxxxxx>
> Subject: mm: memcg: allow lowering memory.swap.max below the current usage
>
> Currently an attempt to set swap.max to a value lower than the
> actual swap usage fails, which causes configuration problems as
> there's no way of lowering the configuration below the current usage
> short of turning off swap entirely. This makes swap.max difficult to
> use and allows delegatees to lock the delegator out of reducing swap
> allocation.
>
> This patch updates swap_max_write() so that the limit can be lowered
> below the current usage. It doesn't implement active reclaiming of
> swap entries for the following reasons.

This is definitely better than the existing state of things, and it's
also safe. I assume that active swap reclaim can be useful in some
cases, but we can return to this question later.
Acked-by: Roman Gushchin <guro@xxxxxx>

>
> * mem_cgroup_swap_full() already tells the swap machinery to
>   aggressively reclaim swap entries if the usage is above 50% of the
>   limit, so simply lowering the limit automatically triggers gradual
>   reclaim.
>
> * Forcing back swapped-out pages is likely to heavily impact the
>   workload and mess up the working set. Given that swap usually is a
>   lot less valuable and less scarce, letting the existing usage
>   dissipate over time through the above gradual reclaim and as pages
>   are faulted back in is likely the better behavior.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Roman Gushchin <guro@xxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Shaohua Li <shli@xxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: linux-mm@xxxxxxxxx
> Cc: cgroups@xxxxxxxxxxxxxxx
> ---
>  Documentation/cgroup-v2.txt |    5 +++++
>  mm/memcontrol.c             |    6 +-----
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1199,6 +1199,11 @@ PAGE_SIZE multiple when read back.
>  	Swap usage hard limit. If a cgroup's swap usage reaches this
>  	limit, anonymous memory of the cgroup will not be swapped out.
>  
> +	When reduced under the current usage, the existing swap
> +	entries are reclaimed gradually and the swap usage may stay
> +	higher than the limit for an extended period of time. This
> +	reduces the impact on the workload and memory management.

I would probably drop the last sentence: it reads like an excuse for
the defined semantics. But it's totally fine either way.

Thanks!