On Wed 11-09-19 07:37:40, Andrew Morton wrote:
> On Wed, 11 Sep 2019 14:00:02 +0200 Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> > On Mon 09-09-19 13:22:45, Michal Hocko wrote:
> > > On Fri 06-09-19 11:24:55, Shakeel Butt wrote:
> > [...]
> > > > I wonder what has changed since
> > > > <http://lkml.kernel.org/r/20180525185501.82098-1-shakeelb@xxxxxxxxxx/>.
> > >
> > > I have completely forgot about that one. It seems that we have just
> > > repeated the same discussion again. This time we have a poor user who
> > > actually enabled the kmem limit.
> > >
> > > I guess there was no real objection to the change back then. The primary
> > > discussion revolved around the fact that the accounting will stay broken
> > > even when this particular part was fixed. Considering this leads to easy
> > > to trigger crash (with the limit enabled) then I guess we should just
> > > make it less broken and backport to stable trees and have a serious
> > > discussion about discontinuing of the limit. Start by simply failing to
> > > set any limit in the current upstream kernels.
> >
> > Any more concerns/objections to the patch? I can add a reference to your
> > earlier post Shakeel if you want or to credit you the way you prefer.
> >
> > Also are there any objections to start deprecating process of kmem
> > limit? I would see it in two stages
> > - 1st warn in the kernel log
> > 	pr_warn("kmem.limit_in_bytes is deprecated and will be removed.
> > 		"Please report your usecase to linux-mm@xxxxxxxxx if you "
> > 		"depend on this functionality."
>
> pr_warn_once() :)
>
> > - 2nd fail any write to kmem.limit_in_bytes
> > - 3rd remove the control file completely
>
> Sounds good to me.
Here we go

From 512822e551fe2960040c23b12c7b27a5fdab9013 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxxx>
Date: Wed, 11 Sep 2019 17:02:33 +0200
Subject: [PATCH] memcg, kmem: deprecate kmem.limit_in_bytes

Cgroup v1 memcg controller has exposed a dedicated kmem limit to users
which turned out to be a really bad idea because there are paths which
cannot shrink the kernel memory usage enough to get below the limit
(e.g. because the accounted memory is not reclaimable). There are cases
when the failure is not even allowed (e.g. __GFP_NOFAIL). This means
that the kmem limit is in excess of the hard limit without any way to
shrink and thus completely useless. The OOM killer cannot be invoked to
handle the situation because that would lead to premature oom killing.

As a result many places might see ENOMEM returned from kmalloc, which
results in unexpected errors. E.g. a global OOM kill when there is a
lot of free memory, because ENOMEM is translated into VM_FAULT_OOM in
the #PF path and therefore pagefault_out_of_memory would invoke the OOM
killer.

Please note that the kernel memory is still accounted to the overall
limit along with the user memory so removing the kmem specific limit
should still allow to contain kernel memory consumption. Unlike the kmem
one, though, it invokes memory reclaim and targeted memcg oom killing if
necessary.

Start the deprecation process by crying to the kernel log. Let's see
whether there are relevant usecases and simply return EINVAL in the
second stage if nobody complains in a few releases.

Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
---
 Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
 mm/memcontrol.c                                | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 41bdc038dad9..e53fc2f31549 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -87,6 +87,9 @@ Brief summary of control files.
                                      node
 
  memory.kmem.limit_in_bytes          set/show hard limit for kernel memory
+                                     This knob is deprecated it shouldn't be
+                                     used. It is planned to be removed in
+                                     a foreseeable future.
  memory.kmem.usage_in_bytes          show current kernel memory allocation
  memory.kmem.failcnt                 show the number of kernel memory usage
                                      hits limits
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e18108b2b786..113969bc57e8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3518,6 +3518,9 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 			ret = mem_cgroup_resize_max(memcg, nr_pages, true);
 			break;
 		case _KMEM:
+			pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+				     "Please report your usecase to linux-mm@xxxxxxxxx if you "
+				     "depend on this functionality.\n");
 			ret = memcg_update_kmem_max(memcg, nr_pages);
 			break;
 		case _TCP:
-- 
2.20.1

-- 
Michal Hocko
SUSE Labs
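
For readers wondering why pr_warn_once() rather than pr_warn() was
suggested: the point is that repeated writes to the knob log the
deprecation message only once instead of flooding the kernel log. What
follows is only a rough userspace sketch of that behaviour, not the
kernel macro itself. The names warn_once(), write_kmem_limit() and
warnings_emitted are made up for illustration, and the real code path
goes through mem_cgroup_write() as in the diff above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Demonstration-only counter of how many times the message was printed. */
static int warnings_emitted;

/*
 * Userspace stand-in for the idea behind pr_warn_once() (hypothetical
 * name, not the kernel implementation): every call site owns a static
 * flag, so the message is emitted on the first call only.
 */
#define warn_once(...)						\
	do {							\
		static bool warned;				\
		if (!warned) {					\
			warned = true;				\
			warnings_emitted++;			\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/*
 * Hypothetical model of a stage 1 write to kmem.limit_in_bytes: warn
 * once, then carry on as before. Applying the limit is left out here.
 */
static int write_kmem_limit(unsigned long nr_pages)
{
	warn_once("kmem.limit_in_bytes is deprecated and will be removed.\n");
	(void)nr_pages;
	return 0;
}
```

Calling write_kmem_limit() any number of times prints the message a
single time; a plain pr_warn() equivalent would print it on every write.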