On 07/25/2018 09:52 PM, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 25 Jul 2018 11:42:57 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=200651
>>
>> Bug ID: 200651
>> Summary: cgroups iptables-restor: vmalloc: allocation failure
>
> Thanks.  Please do note the above request.
>
>> Product: Memory Management
>> Version: 2.5
>> Kernel Version: 4.14
>> Hardware: All
>> OS: Linux
>> Tree: Mainline
>> Status: NEW
>> Severity: normal
>> Priority: P1
>> Component: Other
>> Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
>> Reporter: gnikolov@xxxxxxxxxxx
>> Regression: No
>>
>> Created attachment 277505
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=277505&action=edit
>> iptables save
>>
>> After creating a large number of cgroups and under memory pressure, the
>> iptables command fails with the following error:
>>
>> "iptables-restor: vmalloc: allocation failure, allocated 3047424 of 3465216
>> bytes, mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)"

This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
4.17 it shouldn't use __GFP_NORETRY, but it looks like commit 0537250fdc6c
("netfilter: x_tables: make allocation less aggressive") was backported to
4.14. Removing __GFP_NORETRY might help here, but it could bring back other
issues. Less than 4MB is not that much, though; maybe we could find some
"sane" limit and use __GFP_NORETRY only above that?

> I'm not sure what the problem is here, apart from iptables being
> over-optimistic about vmalloc()'s abilities.
>
> Are cgroups having any impact on this, or is it simply vmalloc arena
> fragmentation, and the iptables code should use some data structure
> more sophisticated than a massive array?
>
> Maybe all that cgroup metadata is contributing to the arena
> fragmentation, but those allocations will be small and the two systems
> should be able to live alongside, by being realistic about vmalloc.
>
>> The system used to reproduce the bug has 2 vCPUs and 2GB of RAM, but
>> it also happens on more powerful systems.
>>
>> Steps to reproduce:
>>
>> mkdir /cgroup
>> mount cgroup -t cgroup -omemory,pids,blkio,cpuacct /cgroup
>> for a in `seq 1 1000`; do for b in `seq 1 4` ; do mkdir -p "/cgroup/user/$a/$b"; done; done
>>
>> Then, in separate consoles, run:
>>
>> cat /dev/vda > /dev/null
>> ./test
>> ./test
>> i=0;while sleep 0 ; do iptables-restore < iptables.save ; i=$(($i+1)); echo $i; done
>>
>> Here is the source of the "test" program; iptables.save is attached. It
>> also happens with a smaller iptables.save file.
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> /* Repeatedly allocate, touch and free buffers of increasing size to
>>  * generate memory pressure. Runs until killed. */
>> int main(void)
>> {
>>     /* Each buffer is arr[i] * 93 bytes: about 279 KiB up to 11.5 MiB. */
>>     static const size_t arr[6] = { 3072, 7168, 15360, 31744, 64512, 130048 };
>>
>>     while (1) {
>>         for (size_t i = 0; i < 6; i++) {
>>             size_t bytes = arr[i] * 93;
>>             int *ptr = malloc(bytes);
>>
>>             if (!ptr)
>>                 continue; /* skip this size if malloc() fails */
>>
>>             /* Write every element so the pages are actually faulted in. */
>>             for (size_t j = 0; j < bytes / sizeof(int); j++)
>>                 ptr[j] = (int)(j + 1);
>>
>>             free(ptr);
>>         }
>>     }
>> }