Re: Page reference leak in fscache or cachefiles?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Shantanu,

You are correct in the way that there is a correlation between those
messages and fscache hangs. The correlation is that there are a number
of bugs in fscache code when things fail (like when allocation fails).
I have 3 different patches that I'm going to submit later on for
review for these issues against 3.15.5. I'll post them late on today
when I get a free minute.

Also, sorry for not CCing the list in the first place.

On Tue, Jul 22, 2014 at 8:56 AM, Shantanu Goel <sgoel01@xxxxxxxxx> wrote:
> Hi Milosz,
>
> We thought they were harmless too initially, however we have seen quite a few hangs at this point on hosts where we enabled fscache + cachefiles and think there is some correlation with these messages.  It could be some generic VM issue which is exacerbated by the use of fscache.  We posted our initial message to get some ideas about where the problem might be since we are not certain of where the problem lies.
>
> Thanks,
> Shantanu
>
>
>
>
>> On Monday, July 21, 2014 9:31 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote:
>> > Shantanu,
>>
>> This message is harmless. I see this a lot a well as we have fscache
>> on a SSD disks that do 1.1GB/s transfer and a demanding application
>> and a lot of pages waiting for write out. The kernel cannot force a
>> write out because fscache has t o allocate pages with the GPF_NOFS
>> flag (to prevent a recursive hang).
>>
>> On our production machines we changed the vm.min_free_kbytes kernel
>> tunable to 256Mb and (the machine has a lot of RAM) and this happens a
>> lot less often.
>>
>> - Milosz
>>
>> On Wed, Jul 9, 2014 at 12:35 PM, Shantanu Goel <sgoel01@xxxxxxxxx> wrote:
>>>  Hi,
>>>
>>>  We are running Linux 3.10 + fscache-20130702 + commit
>> bae6b235905ab9dc6659395f7802c1d36fb63f15 from dhowells' git tree and suspect
>> there might be a page reference leak somewhere as evidenced by the low reclaim
>> ratio.  We also see the kernel complaining about memory allocation failures in
>> the readahead code as seen below.
>>>  The machine has plenty of filecache and there isn't any heavy write
>> activity which could also pin the pages.
>>>  We wrote a script to monitor /proc/vmstat and see the following output with
>> a one second interval.
>>>
>>>
>>>     Refill      Scan      Free   FrRatio      Slab      Runs    Stalls
>>>           0    154663      6855         4      1024         2         0
>>>           0      7152      7152       100         0         4         0
>>>           0      5960      5960       100         0         4         0
>>>           0      7152      7152       100      1024         3         0
>>>           0    152407     21698        14      1024         2         0
>>>
>>>
>>>  The fields are:
>>>
>>>  Refill  - # active pages scanned, i.e. deactivated
>>>  Scan    - # inactive pages scanned
>>>  Free    - # pages freed
>>>  FrRatio - free / scan ratio as percentage
>>>  Runs    - # times kswapd ran
>>>  Stalls  - # times direct reclaim ran
>>>
>>>
>>>  The free memory looks like this (free -m):
>>>
>>>               total       used       free     shared    buffers     cached
>>>  Mem:         24175      23986        188          0         16      10453
>>>  -/+ buffers/cache:      13517      10657
>>>  Swap:         8191         85       8106
>>>
>>>
>>>  Are you aware of any such issue with fscache / cachefiles?  If not, could
>> you suggest what other information we could gather to debug it further?
>>>
>>>
>>>  Thanks,
>>>  Shantanu
>>>  SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e)
>> memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k)
>> show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
>> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
>> show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)
>>>  python: page allocation failure: order:0, mode:0x11110
>>>  CPU: 5 PID: 18997 Comm: python Tainted: G        W  O
>> 3.10.36-el5.ia32e.lime.0 #1
>>>  Hardware name: Supermicro X8STi/X8STi, BIOS 2.0     06/03/2010
>>>   0000000000000000 ffff880104d39578 ffffffff81426c43 ffff880104d39608
>>>   ffffffff810fdcfc ffff88061fffd468 0000004000000000 ffff880104d395a8
>>>   ffffffff810fd506 0000000000000000 ffffffff8109120d 0001111000000000
>>>  Call Trace:
>>>   [<ffffffff81426c43>] dump_stack+0x19/0x1e
>>>   [<ffffffff810fdcfc>] warn_alloc_failed+0xfc/0x140
>>>   [<ffffffff810fd506>] ? drain_local_pages+0x16/0x20
>>>   [<ffffffff8109120d>] ? on_each_cpu_mask+0x4d/0x70
>>>   [<ffffffff810fef1c>] __alloc_pages_nodemask+0x6cc/0x910
>>>   [<ffffffff81139a9a>] alloc_pages_current+0xba/0x160
>>>   [<ffffffff810f6617>] __page_cache_alloc+0xa7/0xc0
>>>   [<ffffffffa050a4ea>] cachefiles_read_backing_file+0x2ba/0x7b0
>> [cachefiles]
>>>   [<ffffffffa050ac87>] cachefiles_read_or_alloc_pages+0x2a7/0x3d0
>> [cachefiles]
>>>   [<ffffffff81062c4f>] ? wake_up_bit+0x2f/0x40
>>>   [<ffffffffa04da24a>] ? fscache_run_op+0x5a/0xa0 [fscache]
>>>   [<ffffffffa04daacb>] ? fscache_submit_op+0x1db/0x4c0 [fscache]
>>>   [<ffffffffa04dbd65>] __fscache_read_or_alloc_pages+0x1f5/0x2c0
>> [fscache]
>>>   [<ffffffffa0531e9e>] __nfs_readpages_from_fscache+0x7e/0x1b0 [nfs]
>>>   [<ffffffffa052bc6a>] nfs_readpages+0xca/0x1f0 [nfs]
>>>   [<ffffffff81139a9a>] ? alloc_pages_current+0xba/0x160
>>>   [<ffffffff81101ae2>] __do_page_cache_readahead+0x1b2/0x260
>>>   [<ffffffff81101bb1>] ra_submit+0x21/0x30
>>>   [<ffffffff81101e85>] ondemand_readahead+0x115/0x240
>>>   [<ffffffff81102038>] page_cache_async_readahead+0x88/0xb0
>>>   [<ffffffff810f5f5e>] ? find_get_page+0x1e/0xa0
>>>   [<ffffffff810f81fc>] generic_file_aio_read+0x4dc/0x720
>>>   [<ffffffffa05224b9>] nfs_file_read+0x89/0x100 [nfs]
>>>   [<ffffffff81158a4f>] do_sync_read+0x7f/0xb0
>>>   [<ffffffff8115a3a5>] vfs_read+0xc5/0x190
>>>   [<ffffffff8115a57f>] SyS_read+0x5f/0xa0
>>>   [<ffffffff814327bb>] tracesys+0xdd/0xe2
>>>  Mem-Info:
>>>  Node 0 DMA32 per-cpu:
>>>  CPU    0: hi:  186, btch:  31 usd:   2
>>>  CPU    1: hi:  186, btch:  31 usd: 159
>>>  CPU    2: hi:  186, btch:  31 usd:  82
>>>  CPU    3: hi:  186, btch:  31 usd: 182
>>>  CPU    4: hi:  186, btch:  31 usd:  57
>>>  CPU    5: hi:  186, btch:  31 usd:   0
>>>  Node 0 Normal per-cpu:
>>>  CPU    0: hi:  186, btch:  31 usd:  22
>>>  CPU    1: hi:  186, btch:  31 usd: 117
>>>  CPU    2: hi:  186, btch:  31 usd:  16
>>>  CPU    3: hi:  186, btch:  31 usd:  79
>>>  CPU    4: hi:  186, btch:  31 usd:  59
>>>  CPU    5: hi:  186, btch:  31 usd:   0
>>>  active_anon:3461555 inactive_anon:451550 isolated_anon:0
>>>   active_file:236104 inactive_file:1899298 isolated_file:0
>>>   unevictable:467 dirty:109 writeback:0 unstable:0
>>>   free:33860 slab_reclaimable:19654 slab_unreclaimable:5110
>>>   mapped:15284 shmem:2 pagetables:10231 bounce:0
>>>   free_cma:0
>>>  Node 0 DMA32 free:91764kB min:4824kB low:6028kB high:7236kB
>> active_anon:2185528kB inactive_anon:566976kB active_file:108808kB
>> inactive_file:670996kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>> present:3660960kB managed:3646160kB mlocked:0kB dirty:152kB writeback:0kB
>> mapped:1344kB shmem:0kB slab_reclaimable:17300kB slab_unreclaimable:580kB
>> kernel_stack:80kB pagetables:6448kB unstable:0kB bounce:0kB free_cma:0kB
>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>  lowmem_reserve[]: 0 20607 20607
>>>  Node 0 Normal free:39164kB min:27940kB low:34924kB high:41908kB
>> active_anon:11663332kB inactive_anon:1239224kB active_file:835608kB
>> inactive_file:6926636kB unevictable:1868kB isolated(anon):0kB isolated(file):0kB
>> present:21495808kB managed:21101668kB mlocked:1868kB dirty:612kB writeback:0kB
>> mapped:60120kB shmem:8kB slab_reclaimable:61536kB slab_unreclaimable:19860kB
>> kernel_stack:2088kB pagetables:34804kB unstable:0kB bounce:0kB free_cma:0kB
>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>  lowmem_reserve[]: 0 0 0
>>>  Node 0 DMA32: 506*4kB (UEM) 545*8kB (UEM) 569*16kB (UEM) 897*32kB (UEM)
>> 541*64kB (UEM) 87*128kB (UM) 4*256kB (UM) 2*512kB (M) 0*1024kB 0*2048kB 0*4096kB
>> = 92000kB
>>>  Node 0 Normal: 703*4kB (UE) 285*8kB (UEM) 176*16kB (UEM) 836*32kB (UEM)
>> 7*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35108kB
>>>  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
>> hugepages_size=2048kB
>>>  2137331 total pagecache pages
>>>  1215 pages in swap cache
>>>  Swap cache stats: add 7056879, delete 7055664, find 633567/703482
>>>  Free swap  = 8301172kB
>>>  Total swap = 8388604kB
>>>  6291455 pages RAM
>>>  102580 pages reserved
>>>  4391935 pages shared
>>>  4746779 pages non-shared
>>>
>>>  --
>>>  Linux-cachefs mailing list
>>>  Linux-cachefs@xxxxxxxxxx
>>>  https://www.redhat.com/mailman/listinfo/linux-cachefs
>>
>>
>>
>> --
>> Milosz Tanski
>> CTO
>> 16 East 34th Street, 15th floor
>> New York, NY 10016
>>
>> p: 646-253-9055
>>
>> e: milosz@xxxxxxxxx
>>



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@xxxxxxxxx

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs




[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]
  Powered by Linux