The patch titled
     Subject: memcg: synchronously enforce memory.high for large overcharges
has been added to the -mm tree.  Its filename is
     memcg-synchronously-enforce-memoryhigh-for-large-overcharges.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memcg-synchronously-enforce-memoryhigh-for-large-overcharges.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memcg-synchronously-enforce-memoryhigh-for-large-overcharges.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Shakeel Butt <shakeelb@xxxxxxxxxx>
Subject: memcg: synchronously enforce memory.high for large overcharges

The high limit is used to throttle the workload without invoking the
oom-killer.  Recently we tried to use the high limit to right-size our
internal workloads.  More specifically, we dynamically adjust the limits
of the workload without letting the workload get oom-killed.  However,
due to a limitation in the implementation of high limit enforcement, we
observed that the mechanism fails for some real workloads.

The high limit is enforced on return-to-userspace, i.e. the kernel lets
the usage go over the limit and, when the execution returns to userspace,
the high reclaim is triggered and the process can get throttled as well.
However, this mechanism fails for workloads which do large allocations in
a single kernel entry, e.g. applications that mlock() a large chunk of
memory in a single syscall.  Such applications bypass the high limit and
can trigger the oom-killer.

To make high limit enforcement more robust, this patch makes the limit
enforcement synchronous only if the accumulated overcharge becomes larger
than MEMCG_CHARGE_BATCH.  So, most allocations would still be throttled
on the return-to-userspace path, and only the extreme allocations which
accumulate a large amount of overcharge without returning to userspace
will be throttled synchronously.  The value MEMCG_CHARGE_BATCH is a bit
arbitrary, but most other places in the memcg codebase use this constant,
so for now this patch uses the same one.

Link: https://lkml.kernel.org/r/20220211064917.2028469-5-shakeelb@xxxxxxxxxx
Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
Reviewed-by: Roman Gushchin <guro@xxxxxx>
Acked-by: Chris Down <chris@xxxxxxxxxxxxxx>
Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memcontrol.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/mm/memcontrol.c~memcg-synchronously-enforce-memoryhigh-for-large-overcharges
+++ a/mm/memcontrol.c
@@ -2704,6 +2704,11 @@ done_restock:
 		}
 	} while ((memcg = parent_mem_cgroup(memcg)));
 
+	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
+	    !(current->flags & PF_MEMALLOC) &&
+	    gfpflags_allow_blocking(gfp_mask)) {
+		mem_cgroup_handle_over_high();
+	}
 	return 0;
 }
 
_

Patches currently in -mm which might be from shakeelb@xxxxxxxxxx are

memcg-replace-in_interrupt-with-in_task.patch
memcg-refactor-mem_cgroup_oom.patch
memcg-unify-force-charging-conditions.patch
selftests-memcg-test-high-limit-for-single-entry-allocation.patch
memcg-synchronously-enforce-memoryhigh-for-large-overcharges.patch
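
For readers who want to reason about the new check outside the kernel, the
condition added by the hunk above can be approximated with a small userspace
model.  This is only a sketch under stated assumptions, not kernel code:
struct model_task, model_charge_over_high() and MODEL_CHARGE_BATCH are
hypothetical names, the value 32 is an illustrative stand-in for
MEMCG_CHARGE_BATCH, the can_block argument stands in for
gfpflags_allow_blocking(gfp_mask), and resetting the counter is a simplified
stand-in for what mem_cgroup_handle_over_high() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: illustrative stand-in for MEMCG_CHARGE_BATCH. */
#define MODEL_CHARGE_BATCH 32U

/* Hypothetical model of the per-task state the patch consults. */
struct model_task {
	unsigned int nr_pages_over_high; /* models current->memcg_nr_pages_over_high */
	bool pf_memalloc;                /* models current->flags & PF_MEMALLOC */
};

/*
 * Account nr_pages of overcharge within a single kernel entry and return
 * true when the model would throttle synchronously instead of waiting
 * for the return-to-userspace path.
 */
static bool model_charge_over_high(struct model_task *t,
				   unsigned int nr_pages, bool can_block)
{
	assert(t != NULL);

	t->nr_pages_over_high += nr_pages;

	if (t->nr_pages_over_high > MODEL_CHARGE_BATCH &&
	    !t->pf_memalloc &&
	    can_block) {
		/* Simplified stand-in for mem_cgroup_handle_over_high(). */
		t->nr_pages_over_high = 0;
		return true;
	}

	/* Otherwise the overcharge is left for return-to-userspace. */
	return false;
}
```

In this model, a task that keeps accumulating overcharge inside one kernel
entry is throttled as soon as the accumulated count exceeds the batch value,
while a task with PF_MEMALLOC set or in a non-blocking context is never
throttled synchronously, matching the three conditions in the hunk.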