Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

Georgi Nikolov <gnikolov@xxxxxxxxxxx> · Tue, 31 Jul 2018 16:55:26 +0300

On 07/31/2018 09:38 AM, Vlastimil Babka wrote:
> On 07/30/2018 08:51 PM, Georgi Nikolov wrote:
>> On 07/30/2018 09:38 PM, Michal Hocko wrote:
>>> On Mon 30-07-18 18:54:24, Georgi Nikolov wrote:
>>> [...]
>>>> No, I was wrong. The regression actually starts with 0537250fdc6c8.
>>>> - old code, which open-codes kvmalloc, masks the error, but the error
>>>> is still there
>>>> - kvmalloc without GFP_NORETRY works fine, but can probably consume a
>>>> lot of memory - commit: eacd86ca3b036
>>>> - kvmalloc with GFP_NORETRY shows the error - commit: 0537250fdc6c8
>>> OK.
>>>
>>>>>> What is the correct way to fix it?
>>>>>> - inside xt_alloc_table_info remove GFP_NORETRY from kvmalloc or add
>>>>>> this flag only for sizes bigger than some threshold
>>>>> This would reintroduce issue fixed by 0537250fdc6c8. Note that
>>>>> kvmalloc(GFP_KERNEL | __GFP_NORETRY) is more or less equivalent to the
>>>>> original code (well, except for __GFP_NOWARN).
>>>> So probably we should pass GFP_NORETRY only for large requests (above
>>>> some threshold).
>>> What would be the threshold? This is not really my area so I just wanted
>>> to keep the original code semantics.
>>>
>>>>>> - inside kvmalloc_node remove GFP_NORETRY from
>>>>>> __vmalloc_node_flags_caller (I don't know if it honors this flag, or
>>>>>> the problem is elsewhere)
>>>>> No, not really. This is basically equivalent to kvmalloc(GFP_KERNEL).
>>>>>
>>>>> I strongly suspect that this is not a regression in this code but rather
>>>>> a side effect of larger memory fragmentation caused by something else.
>>>>> In any case do you see this failure also without artificial test case
>>>>> with a standard workload?
>>>> Yes, I can see failures with a standard workload; in fact it was hard
>>>> to reproduce.
>>>> Here is the error from production servers where allocation is smaller:
>>>> iptables: vmalloc: allocation failure, allocated 131072 of 225280 bytes,
>>>> mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)
>>>>
>>>> I didn't understand if vmalloc honors GFP_NORETRY.
>>> 0537250fdc6c8 changelog tries to explain. kvmalloc doesn't really
>>> support the GFP_NORETRY semantic because that would imply the request
>>> wouldn't trigger the oom killer but in rare cases this might happen
>>> (e.g. when page tables are allocated because those are hardcoded GFP_KERNEL).
>>>
>>> That being said, I have no objection to using GFP_KERNEL if it helps real
>>> workloads but we probably need some cap...
>> Probably Vlastimil Babka can propose some limit:
> No, I think that's rather for the netfilter folks to decide. However, it
> seems this has already been debated [1] and no limit was found. The
> conclusion was that __GFP_NORETRY worked fine before, so it should work
> again after it's added back. But now we know that it doesn't...
>
> [1] https://lore.kernel.org/lkml/20180130140104.GE21609@xxxxxxxxxxxxxx/T/#u

Yes, I see. I will add Florian Westphal to the CC list. netfilter-devel is
already on it, so we probably have to wait for their opinion.
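
For reference, a quick check of the numbers in the production failure
quoted above. This is only a userspace sketch and it assumes 4 KiB pages,
which the log itself does not state. SKETCH_PAGE_SIZE and pages_for() are
names made up for this illustration, not kernel symbols.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. The log line does not state the page size. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of whole pages needed to back `bytes` bytes (rounds up). */
static unsigned long pages_for(unsigned long bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Under that assumption, pages_for(131072) is 32 and pages_for(225280) is 55,
so the allocation stopped after 32 of 55 pages.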

>> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
>> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
>> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
>> ("netfilter: x_tables: make allocation less aggressive") was backported
>> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
>> issues. Less than 4MB is not that much though, maybe find some "sane"
>> limit and use __GFP_NORETRY only above that?
>>
>>
>> Regards,
>>
>> --
>> Georgi Nikolov
>>
>>
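
To make the "use __GFP_NORETRY only above some limit" idea from the quoted
text concrete, here is a rough userspace sketch of the flag selection. It
is not a patch. The SKETCH_* bit values are placeholders and not the
kernel's real gfp flag values, sketch_table_gfp() is a made-up helper, and
the 4 MiB cut-off is only an assumption taken from the "less than 4MB"
remark; nobody in this thread has agreed on a value.

```c
#include <stddef.h>

/* Placeholder bits for illustration only; the real GFP_KERNEL and
 * __GFP_NORETRY values are defined in the kernel headers. */
#define SKETCH_GFP_KERNEL  0x1u
#define SKETCH_GFP_NORETRY 0x2u

/* Hypothetical cut-off: no limit has been agreed on in this thread. */
#define SKETCH_NORETRY_THRESHOLD (4UL * 1024 * 1024)

/* Pick allocation flags for a table of `size` bytes: plain GFP_KERNEL for
 * requests up to the threshold, GFP_KERNEL | NORETRY for larger ones. */
static unsigned int sketch_table_gfp(size_t size)
{
	unsigned int gfp = SKETCH_GFP_KERNEL;

	if (size > SKETCH_NORETRY_THRESHOLD)
		gfp |= SKETCH_GFP_NORETRY;
	return gfp;
}
```

With this cut-off, the 225280-byte request from the production log would
be allocated with plain GFP_KERNEL, while an 8 MiB request would still get
the NORETRY bit. Whether any fixed limit is acceptable is still for the
netfilter folks to decide.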

Regards,

--
Georgi Nikolov
