Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

Vlastimil Babka <vbabka@xxxxxxx> · Tue, 31 Jul 2018 08:38:00 +0200

On 07/30/2018 08:51 PM, Georgi Nikolov wrote:
> On 07/30/2018 09:38 PM, Michal Hocko wrote:
>> On Mon 30-07-18 18:54:24, Georgi Nikolov wrote:
>> [...]
>>> No, I was wrong. The regression actually starts with 0537250fdc6c8.
>>> - old code, which opencodes kvmalloc, is masking error but error is there
>>> - kvmalloc without GFP_NORETRY works fine, but probably can consume a
>>> lot of memory - commit: eacd86ca3b036
>>> - kvmalloc with GFP_NORETRY shows error - commit: 0537250fdc6c8
>> OK.
>>
>>>>> What is the correct way to fix it?
>>>>> - inside xt_alloc_table_info remove GFP_NORETRY from kvmalloc or add
>>>>> this flag only for sizes bigger than some threshold
>>>> This would reintroduce issue fixed by 0537250fdc6c8. Note that
>>>> kvmalloc(GFP_KERNEL | __GFP_NORETRY) is more or less equivalent to the
>>>> original code (well, except for __GFP_NOWARN).
>>> So probably we should pass GFP_NORETRY only for large requests (above
>>> some threshold).
>> What would be the threshold? This is not really my area so I just wanted
>> to keep the original code semantics.
>>  
>>>>> - inside kvmalloc_node remove GFP_NORETRY from
>>>>> __vmalloc_node_flags_caller (I don't know if it honors this flag, or
>>>>> the problem is elsewhere)
>>>> No, not really. This is basically equivalent to kvmalloc(GFP_KERNEL).
>>>>
>>>> I strongly suspect that this is not a regression in this code but rather
>>>> a side effect of larger memory fragmentation caused by something else.
>>>> In any case do you see this failure also without artificial test case
>>>> with a standard workload?
>>> Yes, I can see failures with a standard workload; in fact it was hard to
>>> reproduce it.
>>> Here is the error from production servers where allocation is smaller:
>>> iptables: vmalloc: allocation failure, allocated 131072 of 225280 bytes,
>>> mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)
>>>
>>> I didn't understand if vmalloc honors GFP_NORETRY.
>> The 0537250fdc6c8 changelog tries to explain. kvmalloc doesn't really
>> support the GFP_NORETRY semantics because that would imply the request
>> wouldn't trigger the oom killer but in rare cases this might happen
>> (e.g. when page tables are allocated because those are hardcoded GFP_KERNEL).
>>
>> That being said, I have no objection to use GFP_KERNEL if it helps real
>> workloads but we probably need some cap...
> 
> Probably Vlastimil Babka can propose some limit:

No, I think that's rather for the netfilter folks to decide. However, it
seems this has already been debated [1] and no suitable limit was found. The
conclusion was that __GFP_NORETRY worked fine before, so it should work
again after it's added back. But now we know that it doesn't...

[1] https://lore.kernel.org/lkml/20180130140104.GE21609@xxxxxxxxxxxxxx/T/#u
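
To make the "use __GFP_NORETRY only above some limit" idea concrete, here is
a rough userspace sketch of the flag selection only. It is not a patch: the
SKETCH_* flag values, the helper name sketch_table_gfp() and the 4MB cutoff
are all assumptions made up for illustration (the thread has not settled on
any number), and the real flags come from <linux/gfp.h>.

```c
#include <stddef.h>

/* Stand-in flag bits for illustration; NOT the real kernel GFP values. */
#define SKETCH_GFP_KERNEL   0x1u
#define SKETCH_GFP_NORETRY  0x2u

/* Hypothetical cutoff; picking the actual number is the open question. */
#define SKETCH_NORETRY_THRESHOLD ((size_t)4 * 1024 * 1024)

/*
 * Choose allocation flags for a table of the given size: small requests
 * get plain GFP_KERNEL behaviour, and only requests above the cutoff
 * additionally get the "fail early" NORETRY bit.
 */
static unsigned int sketch_table_gfp(size_t size)
{
	unsigned int gfp = SKETCH_GFP_KERNEL;

	if (size > SKETCH_NORETRY_THRESHOLD)
		gfp |= SKETCH_GFP_NORETRY;

	return gfp;
}
```

With this sketch, the 225280-byte request from the production report would
be allocated without the NORETRY bit, while a request of, say, 8MB would
still get it.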

> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
> ("netfilter: x_tables: make allocation less aggressive") was backported
> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
> issues. Less than 4MB is not that much though, maybe find some "sane"
> limit and use __GFP_NORETRY only above that?
> 
> 
> Regards,
> 
> --
> Georgi Nikolov
> 
>