On Sun, Nov 16, 2014 at 8:26 PM, <mro2@xxxxxxx> wrote:
>
> When I manually drop caches, it looks like this:
>
> # grep -i dirty /proc/meminfo ; free ; sync ; sync ; sync ; echo 3 > /proc/sys/vm/drop_caches ; free ; grep -i dirty /proc/meminfo
> Dirty:              2224 kB
>              total       used       free     shared    buffers     cached
> Mem:       3926016    3690288     235728          0      85628    1376996
> -/+ buffers/cache:    2227664    1698352
> Swap:      5244924     323872    4921052
>              total       used       free     shared    buffers     cached
> Mem:       3926016    2604568    1321448          0        132     407696
> -/+ buffers/cache:    2196740    1729276
> Swap:      5244924     323580    4921344
> Dirty:                 8 kB
>
> However, during backup times (usually when the OOM happens) the page
> cache is not (automatically) emptied before OOMing.
>
> Right now I tried finding out using fincore what exactly is in the
> page cache, only to get allocation failures:

You may try the tool from the kernel sources: tools/vm/page-types.c.
Since 3.15 (or so) it can dump the file page cache (key -f),
recursively for a whole tree.
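(For reference: fincore-style tools generally just mmap the file and ask
mincore(2) which pages are resident, so you can do the same check with a
few lines of C if the tool itself keeps failing. A minimal sketch, my
own code, error handling mostly omitted:)

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	struct stat st;
	int fd;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;
	fstat(fd, &st);
	if (st.st_size == 0)
		return 0;

	/* PROT_NONE is enough: we never touch the mapping, we only ask
	 * the kernel about residency. */
	void *map = mmap(NULL, st.st_size, PROT_NONE, MAP_SHARED, fd, 0);
	long psz = sysconf(_SC_PAGESIZE);
	size_t pages = (st.st_size + psz - 1) / psz;
	unsigned char *vec = malloc(pages);

	/* mincore() fills one byte per page; bit 0 set = resident in
	 * the page cache. */
	if (map != MAP_FAILED && vec && mincore(map, st.st_size, vec) == 0) {
		size_t resident = 0;
		for (size_t i = 0; i < pages; i++)
			resident += vec[i] & 1;
		printf("%s: %zu of %zu pages resident\n",
		       argv[1], resident, pages);
	}
	return 0;
}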
> Nov 16 18:04:31 fs kernel: [554283.989323] fincore: page allocation failure: order:4, mode:0xd0
> Nov 16 18:04:31 fs kernel: [554283.989331] Pid: 17921, comm: fincore Tainted: G E X 3.0.101-0.35-default #1
> Nov 16 18:04:31 fs kernel: [554283.989333] Call Trace:
> Nov 16 18:04:31 fs kernel: [554283.989345] [<ffffffff81004935>] dump_trace+0x75/0x310
> Nov 16 18:04:31 fs kernel: [554283.989352] [<ffffffff8145f2f3>] dump_stack+0x69/0x6f
> Nov 16 18:04:31 fs kernel: [554283.989357] [<ffffffff81100a46>] warn_alloc_failed+0xc6/0x170
> Nov 16 18:04:31 fs kernel: [554283.989361] [<ffffffff81102631>] __alloc_pages_slowpath+0x541/0x7d0
> Nov 16 18:04:31 fs kernel: [554283.989364] [<ffffffff81102aa9>] __alloc_pages_nodemask+0x1e9/0x200
> Nov 16 18:04:31 fs kernel: [554283.989368] [<ffffffff811439c3>] kmem_getpages+0x53/0x180
> Nov 16 18:04:31 fs kernel: [554283.989372] [<ffffffff811447c6>] fallback_alloc+0x196/0x270
> Nov 16 18:04:31 fs kernel: [554283.989375] [<ffffffff81145117>] kmem_cache_alloc_trace+0x207/0x2a0
> Nov 16 18:04:31 fs kernel: [554283.989380] [<ffffffff810dc466>] __tracing_open+0x66/0x330
> Nov 16 18:04:31 fs kernel: [554283.989384] [<ffffffff810dc783>] tracing_open+0x53/0xb0
> Nov 16 18:04:31 fs kernel: [554283.989388] [<ffffffff81158f68>] __dentry_open+0x198/0x310
> Nov 16 18:04:31 fs kernel: [554283.989393] [<ffffffff81168572>] do_last+0x1f2/0x800
> Nov 16 18:04:31 fs kernel: [554283.989397] [<ffffffff811697e9>] path_openat+0xd9/0x420
> Nov 16 18:04:31 fs kernel: [554283.989400] [<ffffffff81169c6c>] do_filp_open+0x4c/0xc0
> Nov 16 18:04:31 fs kernel: [554283.989403] [<ffffffff8115a90f>] do_sys_open+0x17f/0x250
> Nov 16 18:04:31 fs kernel: [554283.989409] [<ffffffff8146a012>] system_call_fastpath+0x16/0x1b
> Nov 16 18:04:31 fs kernel: [554283.989453] [<00007ff2d2b05fd0>] 0x7ff2d2b05fcf
> Nov 16 18:04:31 fs kernel: [554283.989454] Mem-Info:
> Nov 16 18:04:31 fs kernel: [554283.989455] Node 0 DMA per-cpu:
> Nov 16 18:04:31 fs kernel: [554283.989457] CPU 0: hi: 0, btch: 1 usd: 0
> Nov 16 18:04:31 fs kernel: [554283.989459] CPU 1: hi: 0, btch: 1 usd: 0
> Nov 16 18:04:31 fs kernel: [554283.989460] Node 0 DMA32 per-cpu:
> Nov 16 18:04:31 fs kernel: [554283.989461] CPU 0: hi: 186, btch: 31 usd: 0
> Nov 16 18:04:31 fs kernel: [554283.989463] CPU 1: hi: 186, btch: 31 usd: 185
> Nov 16 18:04:31 fs kernel: [554283.989464] Node 0 Normal per-cpu:
> Nov 16 18:04:31 fs kernel: [554283.989465] CPU 0: hi: 186, btch: 31 usd: 0
> Nov 16 18:04:31 fs kernel: [554283.989466] CPU 1: hi: 186, btch: 31 usd: 57
> Nov 16 18:04:31 fs kernel: [554283.989469] active_anon:50701 inactive_anon:27212 isolated_anon:0
> Nov 16 18:04:31 fs kernel: [554283.989470] active_file:82852 inactive_file:268841 isolated_file:0
> Nov 16 18:04:31 fs kernel: [554283.989471] unevictable:7301 dirty:63 writeback:0 unstable:0
> Nov 16 18:04:31 fs kernel: [554283.989471] free:42190 slab_reclaimable:84053 slab_unreclaimable:239021
> Nov 16 18:04:31 fs kernel: [554283.989472] mapped:7031 shmem:29 pagetables:2934 bounce:0
> Nov 16 18:04:31 fs kernel: [554283.989474] Node 0 DMA free:15880kB min:256kB low:320kB high:384kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15688kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> Nov 16 18:04:31 fs kernel: [554283.989480] lowmem_reserve[]: 0 3000 4010 4010
> Nov 16 18:04:31 fs kernel: [554283.989482] Node 0 DMA32 free:131040kB min:50368kB low:62960kB high:75552kB active_anon:172872kB inactive_anon:57708kB active_file:301912kB inactive_file:900636kB unevictable:22688kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:22688kB dirty:12kB writeback:0kB mapped:19944kB shmem:56kB slab_reclaimable:259512kB slab_unreclaimable:763412kB kernel_stack:1280kB pagetables:3560kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> Nov 16 18:04:31 fs kernel: [554283.989489] lowmem_reserve[]: 0 0 1010 1010
> Nov 16 18:04:31 fs kernel: [554283.989492] Node 0 Normal free:21840kB min:16956kB low:21192kB high:25432kB active_anon:29932kB inactive_anon:51140kB active_file:29496kB inactive_file:174728kB unevictable:6516kB isolated(anon):0kB isolated(file):0kB present:1034240kB mlocked:6516kB dirty:240kB writeback:0kB mapped:8180kB shmem:60kB slab_reclaimable:76700kB slab_unreclaimable:192672kB kernel_stack:3072kB pagetables:8176kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:21 all_unreclaimable? no
> Nov 16 18:04:31 fs kernel: [554283.989499] lowmem_reserve[]: 0 0 0 0
> Nov 16 18:04:31 fs kernel: [554283.989501] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15880kB
> Nov 16 18:04:31 fs kernel: [554283.989506] Node 0 DMA32: 1896*4kB 14924*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 131040kB
> Nov 16 18:04:31 fs kernel: [554283.989512] Node 0 Normal: 4826*4kB 61*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 21840kB
> Nov 16 18:04:31 fs kernel: [554283.989518] 197543 total pagecache pages
> Nov 16 18:04:31 fs kernel: [554283.989519] 8325 pages in swap cache
> Nov 16 18:04:31 fs kernel: [554283.989520] Swap cache stats: add 227787, delete 219462, find 1899386/1918583
> Nov 16 18:04:31 fs kernel: [554283.989521] Free swap = 4921412kB
> Nov 16 18:04:31 fs kernel: [554283.989522] Total swap = 5244924kB
> Nov 16 18:04:31 fs kernel: [554283.989523] 1030522 pages RAM
>
> Note that tmpfs is empty.
>
> BTW this is a Novell fileserver with their NSS filesystem. They just
> say "give it more RAM" (duh)

Heh, you have an out-of-tree filesystem kernel module? That's much more
suspicious than cifs.
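By the way, the per-order lists in the Mem-Info dump above (the
"1896*4kB 14924*8kB 0*16kB ..." lines) are the buddy allocator state,
and they show what fincore's order:4 failure was up against: an order-4
allocation needs a contiguous 64kB (2^4 pages) block, while nearly all
of the free memory here sits in order-0/1 fragments. The same counters
can be watched live via /proc/buddyinfo; a rough parser sketch, assuming
the usual 11-order (MAX_ORDER=11) column layout:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* Each /proc/buddyinfo line reads:
	 * "Node N, zone NAME c0 c1 ... c10"
	 * where c_k is the count of free order-k blocks. */
	FILE *f = fopen("/proc/buddyinfo", "r");
	char line[512];

	if (!f) {
		perror("/proc/buddyinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		char zone[32];
		long c[11] = {0};
		char *p = strstr(line, "zone");

		if (!p || sscanf(p,
		    "zone %31s %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld",
		    zone, &c[0], &c[1], &c[2], &c[3], &c[4], &c[5],
		    &c[6], &c[7], &c[8], &c[9], &c[10]) < 12)
			continue;

		/* Sum the blocks big enough to satisfy an order-4
		 * request (64kB with 4k pages). */
		long order4_plus = 0;
		for (int o = 4; o <= 10; o++)
			order4_plus += c[o];
		printf("%-8s free blocks of order>=4: %ld\n",
		       zone, order4_plus);
	}
	fclose(f);
	return 0;
}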
>
>> Swap can be used only for anon pages or for tmpfs. You have a lot of
>> file page cache.
>> I guess this is a leak of page reference counts in some filesystem,
>> more likely in cifs.
>>
>> Try to isolate which part of the workload causes this leak, for
>> example by switching to another filesystem.
>>
>> On Sun, Nov 16, 2014 at 5:11 PM, Marki <mro2@xxxxxxx> wrote:
>> >
>> > Hey there,
>> >
>> > I wouldn't know where to turn anymore; maybe you guys can help me
>> > debug this OOM.
>> >
>> > Questions aside from "why in the end is this happening":
>> > - The GFP mask's lower byte 0xa indicates a request for a free page
>> >   in highmem. This is a 64-bit system and therefore has no highmem
>> >   zone. So what's going on?
>> > - Swap is almost unused: why not use it before OOMing?
>> > - Pagecache is high: why not empty it before OOMing? (There are
>> >   almost no dirty pages.)
>> >
>> > Oh and it's a machine with 4G of RAM on kernel 3.0.101 (SLES11 SP3).
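
Regarding the first question (GFP mask low byte 0xa): for reference, the
low GFP flag bits in kernels of that era decode as below. Values taken
from include/linux/gfp.h around 3.0; worth double-checking against the
exact SLES 3.0.101 source.

/* Zone modifiers and first action modifiers, ~3.0 era gfp.h. */
#define ___GFP_DMA	0x01u
#define ___GFP_HIGHMEM	0x02u
#define ___GFP_DMA32	0x04u
#define ___GFP_MOVABLE	0x08u
#define ___GFP_WAIT	0x10u
#define ___GFP_HIGH	0x20u
#define ___GFP_IO	0x40u
#define ___GFP_FS	0x80u

/* So low byte 0xa = __GFP_HIGHMEM | __GFP_MOVABLE: ordinary user-page
 * allocations (GFP_HIGHUSER_MOVABLE) carry these bits even on x86-64,
 * where the HIGHMEM zone is simply empty and the request falls back to
 * ZONE_NORMAL. The flag by itself is not a sign of trouble.
 *
 * The fincore failure above used mode:0xd0 =
 * __GFP_WAIT | __GFP_IO | __GFP_FS, i.e. a plain GFP_KERNEL
 * allocation that could not find an order-4 contiguous block. */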