The patch titled
     Subject: memcg: schedule high reclaim for remote memcgs on high_work
has been added to the -mm tree.  Its filename is
     memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Shakeel Butt <shakeelb@xxxxxxxxxx>
Subject: memcg: schedule high reclaim for remote memcgs on high_work

If a memcg is over its high limit, memory reclaim is scheduled to run on
return to userland.  However, this assumes that the memcg is the current
process's memcg.  With remote memcg charging for kmem, or when swapping in
a page charged to a remote memcg, the current process can trigger reclaim
on a remote memcg.  Scheduling reclaim on return to userland for such
remote memcgs would skip the high reclaim altogether.  So, punt the high
reclaim of remote memcgs to high_work.
Link: http://lkml.kernel.org/r/20190103015638.205424-1-shakeelb@xxxxxxxxxx
Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memcontrol.c |   20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

--- a/mm/memcontrol.c~memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work
+++ a/mm/memcontrol.c
@@ -2318,19 +2318,23 @@ done_restock:
	 * reclaim on returning to userland.  We can perform reclaim here
	 * if __GFP_RECLAIM but let's always punt for simplicity and so that
	 * GFP_KERNEL can consistently be used during reclaim.  @memcg is
-	 * not recorded as it most likely matches current's and won't
-	 * change in the meantime.  As high limit is checked again before
-	 * reclaim, the cost of mismatch is negligible.
+	 * not recorded as the return-to-userland high reclaim will only reclaim
+	 * from current's memcg (or its ancestor). For other memcgs we punt them
+	 * to work queue.
	 */
	do {
		if (page_counter_read(&memcg->memory) > memcg->high) {
-			/* Don't bother a random interrupted task */
-			if (in_interrupt()) {
+			/*
+			 * Don't bother a random interrupted task or if the
+			 * memcg is not current's memcg's ancestor.
+			 */
+			if (in_interrupt() ||
+			    !mm_match_cgroup(current->mm, memcg)) {
				schedule_work(&memcg->high_work);
-				break;
+			} else {
+				current->memcg_nr_pages_over_high += batch;
+				set_notify_resume(current);
			}
-			current->memcg_nr_pages_over_high += batch;
-			set_notify_resume(current);
			break;
		}
	} while ((memcg = parent_mem_cgroup(memcg)));
_

Patches currently in -mm which might be from shakeelb@xxxxxxxxxx are

fork-memcg-fix-cached_stacks-case.patch
memcg-localize-memcg_kmem_enabled-check.patch
memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work.patch