[Fixing linux-mm mailing list]

On Fri 12-08-16 09:43:40, Michal Hocko wrote:
> Hi,
> 
> On Fri 12-08-16 09:01:41, Arkadiusz Miskiewicz wrote:
> > 
> > Hello.
> > 
> > I have a system with 4x2TB SATA disks, split into a few partitions.
> > Celeron G530, 8GB of RAM, 20GB of swap. It's just a basic system
> > (so syslog, cron, udevd, irqbalance) + my cp tests and nothing more.
> > Kernel 4.7.0.
> > 
> > There is a software RAID 5 partition on sd[abcd]4 and ext4 created
> > with the -T news option.
> > 
> > Using the deadline I/O scheduler.
> > 
> > For testing I have 400GB of tiny files on it (about 6.4mln inodes) in
> > mydir. I ran "cp -al mydir copy{1,2,...,10}" 10x in parallel and that
> > ended up with 5 of the cp processes being killed by the OOM killer
> > while the other 5 finished.
> > 
> > Even two in parallel seem to be enough for the OOM killer to kick in:
> > rm -rf copy1; cp -al mydir copy1
> > rm -rf copy2; cp -al mydir copy2
> 
> Ouch
> 
> > I would expect 8GB of RAM to be enough for just rm/cp. Ideas?
> > 
> > Note that I first tested the same thing with xfs (hence you can see
> > "task xfsaild/md2:661 blocked for more than 120 seconds." and xfs
> > related stacktraces in dmesg) and 10x cp managed to finish without
> > OOM. Later I did the test with ext4, which caused OOMs. I guess it is
> > probably not some generic memory management problem, but that's only
> > my guess.
> 
> I suspect the compaction is not able to migrate FS buffers to form
> higher order pages.
> 
> [...]
> > [87259.568301] bash invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> 
> This is a kernel stack allocation (so an order-2 request).
> 
> [...]
> > [87259.568369] active_anon:439065 inactive_anon:146385 isolated_anon:0
> >  active_file:201920 inactive_file:122369 isolated_file:0
> 
> This is around 3.5G of memory for file/anonymous pages, which is ~43% of
> RAM. Considering that the free memory is quite low, this means that the
> majority of the memory is consumed by somebody else.
> 
> >  unevictable:0 dirty:26675 writeback:0 unstable:0
> >  slab_reclaimable:966564 slab_unreclaimable:79528
> 
> OK, so the slab objects eat 50% of memory. I would check /proc/slabinfo
> to see who has eaten that memory. A large portion of the slab is
> reclaimable, but I suspect that it can easily prevent memory compaction
> from succeeding.
> 
> >  mapped:2236 shmem:1 pagetables:1759 bounce:0
> >  free:30651 free_pcp:0 free_cma:0
> [...]
> > [87259.568395] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
> > [87259.568403] Node 0 DMA32: 11467*4kB (UME) 1525*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 58068kB
> > [87259.568411] Node 0 Normal: 9927*4kB (UMEH) 1119*8kB (UMH) 19*16kB (H) 8*32kB (H) 2*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49348kB
> 
> As you can see, there are barely any high order pages available. There
> are a few in the atomic reserves, which is a bit surprising because I
> would expect them to get released under heavy memory pressure. I will
> double check that part.
> 
> Anyway, I suspect the primary reason is that the compaction cannot make
> forward progress. Before 4.7 the OOM detection didn't take the
> compaction feedback into account and just blindly retried as long as
> there was reclaim progress. This was basically unbounded in time and
> without any guarantee of success... /proc/vmstat snapshots from before
> you start your load and after the OOM killer might tell us more.
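
For reference, here is a minimal sketch of the kind of /proc/slabinfo check
suggested above (not part of the original report): it ranks caches by the
approximate footprint num_objs * objsize, which ignores per-slab overhead,
and reading /proc/slabinfo typically requires root.

#!/usr/bin/env python3
# Rough ranking of slab caches by memory footprint from /proc/slabinfo.
# Approximation only: num_objs * objsize ignores per-slab overhead and
# unused space inside partially filled slabs.

def slab_usage(path="/proc/slabinfo"):
    usage = []
    with open(path) as f:
        for line in f:
            # Skip the version line and the "# name <active_objs> ..." legend.
            if line.startswith(("slabinfo", "#")):
                continue
            fields = line.split()
            name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
            usage.append((num_objs * objsize, num_objs, name))
    return sorted(usage, reverse=True)

if __name__ == "__main__":
    for size, num_objs, name in slab_usage()[:15]:
        print(f"{name:30s} {num_objs:>10d} objs  {size / 2**20:8.1f} MiB")
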
> 
> Anyway, filling up memory with so many slab objects sounds suspicious on
> its own. I guess that the fact you have a huge number of files plays an
> important role. This is something for the ext4 people to answer.
> 
> [...]
> > [99888.398968] kthreadd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> > [99888.399036] Mem-Info:
> > [99888.399040] active_anon:195818 inactive_anon:195891 isolated_anon:0
> >  active_file:294335 inactive_file:23747 isolated_file:0
> 
> LRU pages got down to 34%...
> 
> >  unevictable:0 dirty:38741 writeback:2 unstable:0
> >  slab_reclaimable:1079860 slab_unreclaimable:157162
> 
> ... while slab memory increased to 59%.
> 
> [...]
> 
> > [99888.399066] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
> > [99888.399075] Node 0 DMA32: 14370*4kB (UME) 1809*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71952kB
> > [99888.399082] Node 0 Normal: 12172*4kB (UMEH) 165*8kB (UMEH) 23*16kB (H) 9*32kB (H) 2*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 50792kB
> 
> The high order atomic reserves still hold back some order-2+ blocks.
> 
> [...]
> 
> > [103315.505488] kthreadd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> > [103315.505554] Mem-Info:
> > [103315.505559] active_anon:154510 inactive_anon:154514 isolated_anon:0
> >  active_file:317774 inactive_file:43364 isolated_file:0
> 
> ... and the LRU pages drop even further, to 32% ...
> 
> >  unevictable:0 dirty:11801 writeback:5212 unstable:0
> >  slab_reclaimable:1112194 slab_unreclaimable:166028
> 
> ... while slab grows above 60%.
> 
> [...]
> > [104400.507680] Mem-Info:
> > [104400.507684] active_anon:129371 inactive_anon:129450 isolated_anon:0
> >  active_file:316704 inactive_file:55666 isolated_file:0
> 
> LRU 30%
> 
> >  unevictable:0 dirty:29991 writeback:0 unstable:0
> >  slab_reclaimable:1145618 slab_unreclaimable:171545
> 
> slab 63%
> 
> [...]
> 
> > [114824.060378] Mem-Info:
> > [114824.060403] active_anon:170168 inactive_anon:170168 isolated_anon:0
> >  active_file:192892 inactive_file:133384 isolated_file:0
> 
> LRU 32%
> 
> >  unevictable:0 dirty:37109 writeback:1 unstable:0
> >  slab_reclaimable:1176088 slab_unreclaimable:109598
> 
> slab 61%
> 
> [...]
> 
> That being said, it is really unusual to see such a large kernel memory
> footprint. The slab memory consumption grows, but it doesn't seem to be
> a memory leak at first glance. Anyway, such a large in-kernel consumption
> can severely hamper forming higher order memory blocks. I believe we can
> do slightly better wrt the high atomic reserves, but that doesn't sound
> like the core problem here. I believe the ext4 people should look at what
> is going on there as well.
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs
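
As a quick cross-check of the LRU vs. slab ratios quoted above, a small
sketch follows (assuming 4 kB pages and taking the reported 8GB as the
total, so the percentages are approximate; the real MemTotal will be a
bit lower).

#!/usr/bin/env python3
# Recompute the LRU and slab shares from the Mem-Info counters in the
# first two OOM reports.  Counters are in 4 kB pages; 8GB is assumed as
# the total, so the ratios are approximate.

PAGE_KB = 4
TOTAL_KB = 8 * 1024 * 1024  # the 8GB of RAM mentioned in the report

reports = {
    "87259": {"lru":  439065 + 146385 + 201920 + 122369,
              "slab": 966564 + 79528},
    "99888": {"lru":  195818 + 195891 + 294335 + 23747,
              "slab": 1079860 + 157162},
}

for ts, counters in reports.items():
    for what, pages in counters.items():
        kb = pages * PAGE_KB
        print(f"[{ts}] {what:4s}: {kb / 2**20:.2f} GB "
              f"(~{100 * kb / TOTAL_KB:.0f}% of RAM)")

This reproduces the figures quoted inline: ~43% LRU / ~50% slab for the
first report and ~34% LRU / ~59% slab for the second.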