On Tue 07-08-18 14:02:00, Georgi Nikolov wrote:
> On 08/06/2018 11:42 AM, Georgi Nikolov wrote:
> > On 08/02/2018 11:50 AM, Michal Hocko wrote:
> >> In other words, why don't we simply do the following? Note that this is
> >> not tested. I have also no idea what is the lifetime of this allocation.
> >> Is it bound to any specific process or is it namespace bound? If the
> >> latter then the memcg OOM killer might wipe the whole memcg down without
> >> making any progress. This would make the whole namespace unusable until
> >> somebody intervenes. Is this acceptable?
> >> ---
> >> From 4dec96eb64954a7e58264ed551afadf62ca4c5f7 Mon Sep 17 00:00:00 2001
> >> From: Michal Hocko <mhocko@xxxxxxxx>
> >> Date: Thu, 2 Aug 2018 10:38:57 +0200
> >> Subject: [PATCH] netfilter/x_tables: do not fail xt_alloc_table_info too
> >>  easilly
> >>
> >> eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc()
> >> in xt_alloc_table_info()") has unintentionally fortified
> >> xt_alloc_table_info allocation when __GFP_RETRY has been dropped from
> >> the vmalloc fallback. Later on there was a syzbot report that this
> >> can lead to OOM killer invocations when tables are too large and
> >> 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
> >> has been merged to restore the original behavior. Georgi Nikolov however
> >> noticed that he is not able to install his iptables anymore so this can
> >> be seen as a regression.
> >>
> >> The primary argument for 0537250fdc6c was that this allocation path
> >> shouldn't really trigger the OOM killer and kill innocent tasks. On the
> >> other hand the interface requires root and as such should allow what the
> >> admin asks for. Root inside a namespace makes this more complicated
> >> because those might not be trusted in general. If they are not then such
> >> namespaces should be restricted anyway. Therefore drop the __GFP_NORETRY
> >> and replace it by __GFP_ACCOUNT to enforce memcg constraints on it.
> >>
> >> Fixes: 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
> >> Reported-by: Georgi Nikolov <gnikolov@xxxxxxxxxxx>
> >> Suggested-by: Vlastimil Babka <vbabka@xxxxxxx>
> >> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> >> ---
> >>  net/netfilter/x_tables.c | 7 +------
> >>  1 file changed, 1 insertion(+), 6 deletions(-)
> >>
> >> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> >> index d0d8397c9588..b769408e04ab 100644
> >> --- a/net/netfilter/x_tables.c
> >> +++ b/net/netfilter/x_tables.c
> >> @@ -1178,12 +1178,7 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
> >>  	if (sz < sizeof(*info) || sz >= XT_MAX_TABLE_SIZE)
> >>  		return NULL;
> >>
> >> -	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
> >> -	 * work reasonably well if sz is too large and bail out rather
> >> -	 * than shoot all processes down before realizing there is nothing
> >> -	 * more to reclaim.
> >> -	 */
> >> -	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> >> +	info = kvmalloc(sz, GFP_KERNEL | __GFP_ACCOUNT);
> >>  	if (!info)
> >>  		return NULL;
> >>
> > I will check if this change fixes the problem.
> >
> > Regards,
> >
> > --
> > Georgi Nikolov
>
> I can't reproduce it anymore.
> If I understand correctly this way memory allocated will be
> accounted to kmem of this cgroup (if inside cgroup).

s@this@caller's@

Florian, is this patch acceptable?
-- 
Michal Hocko
SUSE Labs