Shantanu, You are correct in the way that there is a correlation between those messages and fscache hangs. The correlation is that there are a number of bugs in fscache code when things fail (like when allocation fails). I have 3 different patches that I'm going to submit later on for review for these issues against 3.15.5. I'll post them late on today when I get a free minute. Also, sorry for not CCing the list in the first place. On Tue, Jul 22, 2014 at 8:56 AM, Shantanu Goel <sgoel01@xxxxxxxxx> wrote: > Hi Milosz, > > We thought they were harmless too initially, however we have seen quite a few hangs at this point on hosts where we enabled fscache + cachefiles and think there is some correlation with these messages. It could be some generic VM issue which is exacerbated by the use of fscache. We posted our initial message to get some ideas about where the problem might be since we are not certain of where the problem lies. > > Thanks, > Shantanu > > > > >> On Monday, July 21, 2014 9:31 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote: >> > Shantanu, >> >> This message is harmless. I see this a lot a well as we have fscache >> on a SSD disks that do 1.1GB/s transfer and a demanding application >> and a lot of pages waiting for write out. The kernel cannot force a >> write out because fscache has t o allocate pages with the GPF_NOFS >> flag (to prevent a recursive hang). >> >> On our production machines we changed the vm.min_free_kbytes kernel >> tunable to 256Mb and (the machine has a lot of RAM) and this happens a >> lot less often. >> >> - Milosz >> >> On Wed, Jul 9, 2014 at 12:35 PM, Shantanu Goel <sgoel01@xxxxxxxxx> wrote: >>> Hi, >>> >>> We are running Linux 3.10 + fscache-20130702 + commit >> bae6b235905ab9dc6659395f7802c1d36fb63f15 from dhowells' git tree and suspect >> there might be a page reference leak somewhere as evidenced by the low reclaim >> ratio. We also see the kernel complaining about memory allocation failures in >> the readahead code as seen below. >>> The machine has plenty of filecache and there isn't any heavy write >> activity which could also pin the pages. >>> We wrote a script to monitor /proc/vmstat and see the following output with >> a one second interval. >>> >>> >>> Refill Scan Free FrRatio Slab Runs Stalls >>> 0 154663 6855 4 1024 2 0 >>> 0 7152 7152 100 0 4 0 >>> 0 5960 5960 100 0 4 0 >>> 0 7152 7152 100 1024 3 0 >>> 0 152407 21698 14 1024 2 0 >>> >>> >>> The fields are: >>> >>> Refill - # active pages scanned, i.e. deactivated >>> Scan - # inactive pages scanned >>> Free - # pages freed >>> FrRatio - free / scan ratio as percentage >>> Runs - # times kswapd ran >>> Stalls - # times direct reclaim ran >>> >>> >>> The free memory looks like this (free -m): >>> >>> total used free shared buffers cached >>> Mem: 24175 23986 188 0 16 10453 >>> -/+ buffers/cache: 13517 10657 >>> Swap: 8191 85 8106 >>> >>> >>> Are you aware of any such issue with fscache / cachefiles? If not, could >> you suggest what other information we could gather to debug it further? >>> >>> >>> Thanks, >>> Shantanu >>> SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) >> memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) >> show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) >> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) >> show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z) >>> python: page allocation failure: order:0, mode:0x11110 >>> CPU: 5 PID: 18997 Comm: python Tainted: G W O >> 3.10.36-el5.ia32e.lime.0 #1 >>> Hardware name: Supermicro X8STi/X8STi, BIOS 2.0 06/03/2010 >>> 0000000000000000 ffff880104d39578 ffffffff81426c43 ffff880104d39608 >>> ffffffff810fdcfc ffff88061fffd468 0000004000000000 ffff880104d395a8 >>> ffffffff810fd506 0000000000000000 ffffffff8109120d 0001111000000000 >>> Call Trace: >>> [<ffffffff81426c43>] dump_stack+0x19/0x1e >>> [<ffffffff810fdcfc>] warn_alloc_failed+0xfc/0x140 >>> [<ffffffff810fd506>] ? drain_local_pages+0x16/0x20 >>> [<ffffffff8109120d>] ? on_each_cpu_mask+0x4d/0x70 >>> [<ffffffff810fef1c>] __alloc_pages_nodemask+0x6cc/0x910 >>> [<ffffffff81139a9a>] alloc_pages_current+0xba/0x160 >>> [<ffffffff810f6617>] __page_cache_alloc+0xa7/0xc0 >>> [<ffffffffa050a4ea>] cachefiles_read_backing_file+0x2ba/0x7b0 >> [cachefiles] >>> [<ffffffffa050ac87>] cachefiles_read_or_alloc_pages+0x2a7/0x3d0 >> [cachefiles] >>> [<ffffffff81062c4f>] ? wake_up_bit+0x2f/0x40 >>> [<ffffffffa04da24a>] ? fscache_run_op+0x5a/0xa0 [fscache] >>> [<ffffffffa04daacb>] ? fscache_submit_op+0x1db/0x4c0 [fscache] >>> [<ffffffffa04dbd65>] __fscache_read_or_alloc_pages+0x1f5/0x2c0 >> [fscache] >>> [<ffffffffa0531e9e>] __nfs_readpages_from_fscache+0x7e/0x1b0 [nfs] >>> [<ffffffffa052bc6a>] nfs_readpages+0xca/0x1f0 [nfs] >>> [<ffffffff81139a9a>] ? alloc_pages_current+0xba/0x160 >>> [<ffffffff81101ae2>] __do_page_cache_readahead+0x1b2/0x260 >>> [<ffffffff81101bb1>] ra_submit+0x21/0x30 >>> [<ffffffff81101e85>] ondemand_readahead+0x115/0x240 >>> [<ffffffff81102038>] page_cache_async_readahead+0x88/0xb0 >>> [<ffffffff810f5f5e>] ? find_get_page+0x1e/0xa0 >>> [<ffffffff810f81fc>] generic_file_aio_read+0x4dc/0x720 >>> [<ffffffffa05224b9>] nfs_file_read+0x89/0x100 [nfs] >>> [<ffffffff81158a4f>] do_sync_read+0x7f/0xb0 >>> [<ffffffff8115a3a5>] vfs_read+0xc5/0x190 >>> [<ffffffff8115a57f>] SyS_read+0x5f/0xa0 >>> [<ffffffff814327bb>] tracesys+0xdd/0xe2 >>> Mem-Info: >>> Node 0 DMA32 per-cpu: >>> CPU 0: hi: 186, btch: 31 usd: 2 >>> CPU 1: hi: 186, btch: 31 usd: 159 >>> CPU 2: hi: 186, btch: 31 usd: 82 >>> CPU 3: hi: 186, btch: 31 usd: 182 >>> CPU 4: hi: 186, btch: 31 usd: 57 >>> CPU 5: hi: 186, btch: 31 usd: 0 >>> Node 0 Normal per-cpu: >>> CPU 0: hi: 186, btch: 31 usd: 22 >>> CPU 1: hi: 186, btch: 31 usd: 117 >>> CPU 2: hi: 186, btch: 31 usd: 16 >>> CPU 3: hi: 186, btch: 31 usd: 79 >>> CPU 4: hi: 186, btch: 31 usd: 59 >>> CPU 5: hi: 186, btch: 31 usd: 0 >>> active_anon:3461555 inactive_anon:451550 isolated_anon:0 >>> active_file:236104 inactive_file:1899298 isolated_file:0 >>> unevictable:467 dirty:109 writeback:0 unstable:0 >>> free:33860 slab_reclaimable:19654 slab_unreclaimable:5110 >>> mapped:15284 shmem:2 pagetables:10231 bounce:0 >>> free_cma:0 >>> Node 0 DMA32 free:91764kB min:4824kB low:6028kB high:7236kB >> active_anon:2185528kB inactive_anon:566976kB active_file:108808kB >> inactive_file:670996kB unevictable:0kB isolated(anon):0kB isolated(file):0kB >> present:3660960kB managed:3646160kB mlocked:0kB dirty:152kB writeback:0kB >> mapped:1344kB shmem:0kB slab_reclaimable:17300kB slab_unreclaimable:580kB >> kernel_stack:80kB pagetables:6448kB unstable:0kB bounce:0kB free_cma:0kB >> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no >>> lowmem_reserve[]: 0 20607 20607 >>> Node 0 Normal free:39164kB min:27940kB low:34924kB high:41908kB >> active_anon:11663332kB inactive_anon:1239224kB active_file:835608kB >> inactive_file:6926636kB unevictable:1868kB isolated(anon):0kB isolated(file):0kB >> present:21495808kB managed:21101668kB mlocked:1868kB dirty:612kB writeback:0kB >> mapped:60120kB shmem:8kB slab_reclaimable:61536kB slab_unreclaimable:19860kB >> kernel_stack:2088kB pagetables:34804kB unstable:0kB bounce:0kB free_cma:0kB >> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no >>> lowmem_reserve[]: 0 0 0 >>> Node 0 DMA32: 506*4kB (UEM) 545*8kB (UEM) 569*16kB (UEM) 897*32kB (UEM) >> 541*64kB (UEM) 87*128kB (UM) 4*256kB (UM) 2*512kB (M) 0*1024kB 0*2048kB 0*4096kB >> = 92000kB >>> Node 0 Normal: 703*4kB (UE) 285*8kB (UEM) 176*16kB (UEM) 836*32kB (UEM) >> 7*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35108kB >>> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 >> hugepages_size=2048kB >>> 2137331 total pagecache pages >>> 1215 pages in swap cache >>> Swap cache stats: add 7056879, delete 7055664, find 633567/703482 >>> Free swap = 8301172kB >>> Total swap = 8388604kB >>> 6291455 pages RAM >>> 102580 pages reserved >>> 4391935 pages shared >>> 4746779 pages non-shared >>> >>> -- >>> Linux-cachefs mailing list >>> Linux-cachefs@xxxxxxxxxx >>> https://www.redhat.com/mailman/listinfo/linux-cachefs >> >> >> >> -- >> Milosz Tanski >> CTO >> 16 East 34th Street, 15th floor >> New York, NY 10016 >> >> p: 646-253-9055 >> >> e: milosz@xxxxxxxxx >> -- Milosz Tanski CTO 16 East 34th Street, 15th floor New York, NY 10016 p: 646-253-9055 e: milosz@xxxxxxxxx -- Linux-cachefs mailing list Linux-cachefs@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cachefs