Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

Georgi Nikolov <gnikolov@xxxxxxxxxxx> · Tue, 7 Aug 2018 14:02:00 +0300

On 08/06/2018 11:42 AM, Georgi Nikolov wrote:
> On 08/02/2018 11:50 AM, Michal Hocko wrote:
>> In other words, why don't we simply do the following? Note that this is
>> not tested. I also have no idea what the lifetime of this allocation is.
>> Is it bound to a specific process, or is it namespace bound? If the
>> latter, then the memcg OOM killer might wipe out the whole memcg without
>> making any progress. This would make the whole namespace unusable until
>> somebody intervenes. Is this acceptable?
>> ---
>> From 4dec96eb64954a7e58264ed551afadf62ca4c5f7 Mon Sep 17 00:00:00 2001
>> From: Michal Hocko <mhocko@xxxxxxxx>
>> Date: Thu, 2 Aug 2018 10:38:57 +0200
>> Subject: [PATCH] netfilter/x_tables: do not fail xt_alloc_table_info too
>>  easily
>>
>> eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc()
>> in xt_alloc_table_info()") has unintentionally fortified
>> xt_alloc_table_info allocation when __GFP_NORETRY has been dropped from
>> the vmalloc fallback. Later on there was a syzbot report that this
>> can lead to OOM killer invocations when tables are too large and
>> 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
>> has been merged to restore the original behavior. Georgi Nikolov however
>> noticed that he is not able to install his iptables anymore so this can
>> be seen as a regression.
>>
>> The primary argument for 0537250fdc6c was that this allocation path
>> shouldn't really trigger the OOM killer and kill innocent tasks. On the
>> other hand the interface requires root and as such should allow what the
>> admin asks for. Root inside a namespace makes this more complicated
>> because those might not be trusted in general. If they are not then such
>> namespaces should be restricted anyway. Therefore drop the __GFP_NORETRY
>> and replace it by __GFP_ACCOUNT to enforce memcg constraints on it.
>>
>> Fixes: 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
>> Reported-by: Georgi Nikolov <gnikolov@xxxxxxxxxxx>
>> Suggested-by: Vlastimil Babka <vbabka@xxxxxxx>
>> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>> ---
>>  net/netfilter/x_tables.c | 7 +------
>>  1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
>> index d0d8397c9588..b769408e04ab 100644
>> --- a/net/netfilter/x_tables.c
>> +++ b/net/netfilter/x_tables.c
>> @@ -1178,12 +1178,7 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
>>  	if (sz < sizeof(*info) || sz >= XT_MAX_TABLE_SIZE)
>>  		return NULL;
>>  
>> -	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
>> -	 * work reasonably well if sz is too large and bail out rather
>> -	 * than shoot all processes down before realizing there is nothing
>> -	 * more to reclaim.
>> -	 */
>> -	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
>> +	info = kvmalloc(sz, GFP_KERNEL | __GFP_ACCOUNT);
>>  	if (!info)
>>  		return NULL;
>>  
> I will check if this change fixes the problem.
>
> Regards,
>
> --
> Georgi Nikolov

I can't reproduce it anymore with this patch applied.
If I understand correctly, with this change the allocated memory will be
accounted to the kmem of the cgroup (if the task runs inside a cgroup).
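
One rough way to observe that accounting from userspace is to read the
cgroup's kernel-memory counter before and after running iptables-restore
and compare the two values. The sketch below is only an illustration, not
part of the patch: it assumes cgroup v1 with the memory controller mounted
at /sys/fs/cgroup/memory, and the helper name read_cgroup_counter is
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read a single unsigned decimal counter from a cgroup control file,
 * e.g. (assumed cgroup v1 layout):
 *   /sys/fs/cgroup/memory/<group>/memory.kmem.usage_in_bytes
 * Returns 0 on success and stores the value in *out, -1 on failure.
 */
int read_cgroup_counter(const char *path, unsigned long long *out)
{
	char buf[64];
	char *end;
	FILE *f;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* The control file holds one number followed by a newline. */
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*out = strtoull(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;	/* overflow or no digits at all */

	return 0;
}
```

Calling it on the same file once before and once after loading the rules
should show the counter grow by roughly the size of the tables, provided
kmem accounting is enabled for that cgroup.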
