On 07/30/2018 08:51 PM, Georgi Nikolov wrote:
> On 07/30/2018 09:38 PM, Michal Hocko wrote:
>> On Mon 30-07-18 18:54:24, Georgi Nikolov wrote:
>> [...]
>>> No i was wrong. The regression starts actually with 0537250fdc6c8.
>>> - old code, which opencodes kvmalloc, is masking error but error is there
>>> - kvmalloc without GFP_NORETRY works fine, but probably can consume a
>>> lot of memory - commit: eacd86ca3b036
>>> - kvmalloc with GFP_NORETRY shows error - commit: 0537250fdc6c8
>> OK.
>>
>>>>> What is correct way to fix it.
>>>>> - inside xt_alloc_table_info remove GFP_NORETRY from kvmalloc or add
>>>>> this flag only for sizes bigger than some threshold
>>>> This would reintroduce issue fixed by 0537250fdc6c8. Note that
>>>> kvmalloc(GFP_KERNEL | __GFP_NORETRY) is more or less equivalent to the
>>>> original code (well, except for __GFP_NOWARN).
>>> So probably we should pass GFP_NORETRY only for large requests (above
>>> some threshold).
>> What would be the threshold? This is not really my area so I just wanted
>> to keep the original code semantic.
>>
>>>>> - inside kvmalloc_node remove GFP_NORETRY from
>>>>> __vmalloc_node_flags_caller (i don't know if it honors this flag, or
>>>>> the problem is elsewhere)
>>>> No, not really. This is basically equivalent to kvmalloc(GFP_KERNEL).
>>>>
>>>> I strongly suspect that this is not a regression in this code but rather
>>>> a side effect of larger memory fragmentation caused by something else.
>>>> In any case do you see this failure also without artificial test case
>>>> with a standard workload?
>>> Yes i can see failures with standard workload, in fact it was hard to
>>> reproduce it.
>>> Here is the error from production servers where allocation is smaller:
>>> iptables: vmalloc: allocation failure, allocated 131072 of 225280 bytes,
>>> mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)
>>>
>>> I didn't understand if vmalloc honors GFP_NORETRY.
>> 0537250fdc6c8 changelog tries to explain. kvmalloc doesn't really
>> support the GFP_NORETRY semantic because that would imply the request
>> wouldn't trigger the oom killer but in rare cases this might happen
>> (e.g. when page tables are allocated because those are hardcoded GFP_KERNEL).
>>
>> That being said, I have no objection to use GFP_KERNEL if it helps real
>> workloads but we probably need some cap...
>
> Probably Vlastimil Babka can propose some limit:

No, I think that's rather for the netfilter folks to decide. However, it
seems there has already been a debate [1] and no such limit was found. The
conclusion was that __GFP_NORETRY worked fine before, so it should work
again after it's added back. But now we know that it doesn't...

[1] https://lore.kernel.org/lkml/20180130140104.GE21609@xxxxxxxxxxxxxx/T/#u

> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
> ("netfilter: x_tables: make allocation less aggressive") was backported
> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
> issues. Less than 4MB is not that much though, maybe find some "sane"
> limit and use __GFP_NORETRY only above that?
>
>
> Regards,
>
> --
> Georgi Nikolov
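
For illustration only, the "use __GFP_NORETRY only above some sane limit"
idea quoted above could look roughly like the sketch below. It is a
userspace model, not a patch: the flag values are stand-ins rather than
the kernel's real bits, the helper name xt_table_gfp() is made up, and the
4 MiB cutoff is purely an assumption, since nobody in this thread has
settled on a number.

```c
#include <stddef.h>

/* Stand-in flag bits for a userspace model; NOT the kernel's real values. */
#define MODEL_GFP_KERNEL   0x1u
#define MODEL_GFP_NORETRY  0x2u

/* Hypothetical cutoff (4 MiB); the thread does not agree on any number. */
#define XT_NORETRY_THRESHOLD ((size_t)4 << 20)

/*
 * Hypothetical helper: choose allocation flags for a table of `size` bytes.
 * Requests at or below the cutoff get plain GFP_KERNEL behaviour; only
 * larger requests also carry the "give up early" NORETRY bit.
 */
static unsigned int xt_table_gfp(size_t size)
{
	unsigned int gfp = MODEL_GFP_KERNEL;

	if (size > XT_NORETRY_THRESHOLD)
		gfp |= MODEL_GFP_NORETRY;

	return gfp;
}
```

With that cutoff, the 225280-byte request from the production log would be
allocated without the NORETRY bit, while a multi-megabyte ruleset above the
limit would still fail early instead of pushing the system towards OOM.
Whether such a limit is acceptable at all is still for the netfilter folks
to decide.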