Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 7 Aug 2018 13:09:51 +0200

On Tue 07-08-18 14:02:00, Georgi Nikolov wrote:
> On 08/06/2018 11:42 AM, Georgi Nikolov wrote:
> > On 08/02/2018 11:50 AM, Michal Hocko wrote:
> >> In other words, why don't we simply do the following? Note that this is
> >> not tested. I also have no idea what the lifetime of this allocation is.
> >> Is it bound to any specific process or is it namespace-bound? If the
> >> latter, then the memcg OOM killer might wipe out the whole memcg without
> >> making any progress. This would make the whole namespace unusable until
> >> somebody intervenes. Is this acceptable?
> >> ---
> >> From 4dec96eb64954a7e58264ed551afadf62ca4c5f7 Mon Sep 17 00:00:00 2001
> >> From: Michal Hocko <mhocko@xxxxxxxx>
> >> Date: Thu, 2 Aug 2018 10:38:57 +0200
> >> Subject: [PATCH] netfilter/x_tables: do not fail xt_alloc_table_info too
> >>  easily
> >>
> >> eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc()
> >> in xt_alloc_table_info()") has unintentionally fortified
> >> xt_alloc_table_info allocation when __GFP_NORETRY has been dropped from
> >> the vmalloc fallback. Later on there was a syzbot report that this
> >> can lead to OOM killer invocations when tables are too large and
> >> 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
> >> has been merged to restore the original behavior. Georgi Nikolov however
> >> noticed that he is not able to install his iptables anymore so this can
> >> be seen as a regression.
> >>
> >> The primary argument for 0537250fdc6c was that this allocation path
> >> shouldn't really trigger the OOM killer and kill innocent tasks. On the
> >> other hand the interface requires root and as such should allow what the
> >> admin asks for. Root inside a namespace makes this more complicated
> >> because such roots might not be trusted in general. If they are not,
> >> then such namespaces should be restricted anyway. Therefore drop the
> >> __GFP_NORETRY and replace it with __GFP_ACCOUNT to enforce memcg
> >> constraints on it.
> >>
> >> Fixes: 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
> >> Reported-by: Georgi Nikolov <gnikolov@xxxxxxxxxxx>
> >> Suggested-by: Vlastimil Babka <vbabka@xxxxxxx>
> >> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> >> ---
> >>  net/netfilter/x_tables.c | 7 +------
> >>  1 file changed, 1 insertion(+), 6 deletions(-)
> >>
> >> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> >> index d0d8397c9588..b769408e04ab 100644
> >> --- a/net/netfilter/x_tables.c
> >> +++ b/net/netfilter/x_tables.c
> >> @@ -1178,12 +1178,7 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
> >>  	if (sz < sizeof(*info) || sz >= XT_MAX_TABLE_SIZE)
> >>  		return NULL;
> >>  
> >> -	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
> >> -	 * work reasonably well if sz is too large and bail out rather
> >> -	 * than shoot all processes down before realizing there is nothing
> >> -	 * more to reclaim.
> >> -	 */
> >> -	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> >> +	info = kvmalloc(sz, GFP_KERNEL | __GFP_ACCOUNT);
> >>  	if (!info)
> >>  		return NULL;
> >>  
> > I will check if this change fixes the problem.
> >
> > Regards,
> >
> > --
> > Georgi Nikolov
> 
> I can't reproduce it anymore.
> If I understand correctly, this way the allocated memory will be
> accounted to the kmem of this cgroup (if inside a cgroup).

s@this@caller's@
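To illustrate whose cgroup pays for a __GFP_ACCOUNT allocation, here is a
toy userspace model. It is a sketch under simplifying assumptions, not
kernel code: toy_memcg, toy_task, toy_alloc, toy_alloc_accounted() and
toy_free() are all hypothetical names. It only captures two points: the
charge goes to the memcg of the task doing the allocation, and the object
remembers that memcg so the uncharge at free time goes back to it even if
the task has moved since. Where the model simply fails the charge, the
real kernel would first try reclaim and may invoke the memcg OOM killer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every type and function here is hypothetical. */
struct toy_memcg {
	size_t usage;	/* bytes currently charged to this cgroup */
	size_t limit;	/* hard limit; invariant: usage <= limit */
};

struct toy_task {
	struct toy_memcg *memcg;	/* cgroup the task runs in, NULL for none */
};

struct toy_alloc {
	struct toy_memcg *memcg;	/* cgroup charged at allocation time */
	size_t size;
};

/*
 * Model of an accounted allocation: charge @size bytes to the memcg of
 * the *calling* task. Returns false if the charge does not fit; the real
 * kernel would reclaim and possibly run the memcg OOM killer first.
 */
static bool toy_alloc_accounted(struct toy_task *caller, size_t size,
				struct toy_alloc *out)
{
	struct toy_memcg *cg = caller->memcg;

	if (cg) {
		if (size > cg->limit - cg->usage)
			return false;
		cg->usage += size;
	}
	out->memcg = cg;	/* remember who was charged */
	out->size = size;
	return true;
}

/* Uncharge the memcg recorded at allocation time, not the current one. */
static void toy_free(struct toy_alloc *a)
{
	if (a->memcg) {
		assert(a->memcg->usage >= a->size);
		a->memcg->usage -= a->size;
	}
	a->memcg = NULL;
	a->size = 0;
}
```

In this model a caller outside any memcg is never limited, which mirrors
why the unaccounted allocation could grow until the global OOM killer
stepped in, while an accounted one is bounded by the caller's cgroup.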

Florian, is this patch acceptable?

-- 
Michal Hocko
SUSE Labs