Hi,

I have a system with fscache enabled on an NFS mount, and it is leaking pages; the symptoms are below. The system is running an Ubuntu Xenial kernel:

# uname -a
Linux XX-XXXX-XXX 4.4.0-124-generic #148-Ubuntu SMP Wed May 2 13:00:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# uptime
 16:28:46 up 42 days, 20:20, 13 users,  load average: 3.85, 4.58, 4.73

Page allocation in cachefiles_read_backing_file() is failing with a huge number of OOMs, ending in the following failure:

~$ dmesg
[2434369.019106]  [<ffffffff810c69a0>] ? __wake_up_bit+0x50/0x70
[2434369.019109]  [<ffffffff810c6a85>] ? wake_up_bit+0x25/0x30
[2434369.019118]  [<ffffffffc054d27c>] ? fscache_run_op.isra.7+0x4c/0x80 [fscache]
[2434369.019123]  [<ffffffffc054f4cf>] __fscache_read_or_alloc_pages+0x1af/0x2e0 [fscache]
[2434369.019142]  [<ffffffffc0826710>] __nfs_readpages_from_fscache+0x130/0x1a0 [nfs]
[2434369.019152]  [<ffffffffc081d111>] nfs_readpages+0xc1/0x200 [nfs]
[2434369.019155]  [<ffffffff811e699c>] ? alloc_pages_current+0x8c/0x110
[2434369.019159]  [<ffffffff811a1a39>] __do_page_cache_readahead+0x199/0x240
[2434369.019162]  [<ffffffff810c6678>] ? __wake_up_common+0x58/0x90
[2434369.019165]  [<ffffffff811a1c1d>] ondemand_readahead+0x13d/0x250
[2434369.019168]  [<ffffffff811a1d9b>] page_cache_async_readahead+0x6b/0x70
[2434369.019171]  [<ffffffff81194524>] generic_file_read_iter+0x464/0x690
[2434369.019180]  [<ffffffffc08144a8>] ? __nfs_revalidate_mapping+0xc8/0x290 [nfs]
[2434369.019188]  [<ffffffffc080fdf2>] nfs_file_read+0x52/0xa0 [nfs]
[2434369.019192]  [<ffffffff8121353e>] new_sync_read+0x9e/0xe0
[2434369.019195]  [<ffffffff812135a9>] __vfs_read+0x29/0x40
[2434369.019197]  [<ffffffff81213b76>] vfs_read+0x86/0x130
[2434369.019200]  [<ffffffff81214a85>] SyS_pread64+0x95/0xb0
[2434369.019206]  [<ffffffff8184f788>] entry_SYSCALL_64_fastpath+0x1c/0xbb
[2434369.019223] Mem-Info:
[2434369.019235] active_anon:9626413 inactive_anon:761627 isolated_anon:0
 active_file:905105 inactive_file:116090946 isolated_file:424    >>> ~116M inactive file pages stuck on the LRU
 unevictable:0 dirty:0 writeback:0 unstable:2
 slab_reclaimable:2701894 slab_unreclaimable:185099
 mapped:592946 shmem:1416519 pagetables:88826 bounce:0
 free:283509 free_pcp:0 free_cma:0

=== Results after leaving the system running for a few more days; the leak has grown to ~222 GB ===

Looking at the stats for retrievals and OOMs confirms that the system ran into memory pressure:

# echo 3 > /proc/sys/vm/drop_caches ; echo 3 > /proc/sys/vm/drop_caches ; ./test//vm/page-types -r -b lru,~unevictable,~active,~locked,~referenced
             flags      page-count       MB  symbolic-flags                              long-symbolic-flags
0x0000000800000020             652        2  _____l_______________________P____________  lru,private
0x0000000400000028        15341560    59927  ___U_l______________________d_____________  uptodate,lru,mappedtodisk
0x0000000000000028        42646984   166589  ___U_l____________________________________  uptodate,lru    >>> the leaked pages
0x0001000400000028             291        1  ___U_l______________________d______I______  uptodate,lru,mappedtodisk,readahead
0x0001000000000028               2        0  ___U_l_____________________________I______  uptodate,lru,readahead
0x0000001000000028              80        0  ___U_l________________________p___________  uptodate,lru,private_2
0x0001001000000028               1        0  ___U_l________________________p____I______  uptodate,lru,private_2,readahead
0x0000000c00000028             152        0  ___U_l______________________dP____________  uptodate,lru,mappedtodisk,private
0x0000000800000028               1        0  
___U_l_______________________P____________  uptodate,lru,private
0x0001000000004030          183390      716  ____Dl________b____________________I______  dirty,lru,swapbacked,readahead
0x0001000000004038          927079     3621  ___UDl________b____________________I______  uptodate,dirty,lru,swapbacked,readahead
0x0000000000004038               2        0  ___UDl________b___________________________  uptodate,dirty,lru,swapbacked
0x0000000c00000038               6        0  ___UDl______________________dP____________  uptodate,dirty,lru,mappedtodisk,private
0x0000000400000828            2981       11  ___U_l_____M________________d_____________  uptodate,lru,mmap,mappedtodisk
0x0000000000004828               1        0  ___U_l_____M__b___________________________  uptodate,lru,mmap,swapbacked
0x0000000000004838            8140       31  ___UDl_____M__b___________________________  uptodate,dirty,lru,mmap,swapbacked
             total        59111322   230903

# cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=286 dat=94575250 spc=0
Objects: alc=93658872 nal=37 avl=93658871 ded=93573062
ChkAux : non=0 ok=80656494 upd=0 obs=145
Pages  : mrk=238684375647 unc=238677703085
Acquire: n=94575536 nul=0 noc=0 ok=94575536 nbf=0 oom=0
Lookups: n=93658872 neg=13002215 pos=80656657 crt=13002215 tmo=0
Invals : n=23210 run=23210
Updates: n=0 nul=0 run=23210
Relinqs: n=94489599 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=1055039450 ok=1017317776 wt=1966427 nod=15094761 nbf=2742833 int=0 oom=19884080    >>> huge number of OOMs
Retrvls: ops=1052296617 owt=1250626 abt=0
Stores : n=1808983598 ok=1808983598 agn=0 nbf=0 oom=0
Stores : ops=23287547 run=1832257814 pgs=1808970267 rxd=1808983598 olm=0 ipp=0
VmScan : nos=2045450030 gon=9 bsy=22 can=13331 wt=2013
Ops    : pend=1250891 run=1075607374 enq=729855483 can=0 rej=0
Ops    : ini=2861303425 dfr=418493 rel=2861303422 gc=418493
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
CacheEv: nsp=145 stl=0 rtr=0 cul=0

Looking at the page flags with page-types, the bug appears to be in the ext4 pages used as the fscache backing store. I have root-caused it to a leak of the ext4 backing-cache page when reading a page from fscache while that page is still on the NFS page LRU list. I simulated the situation by not freeing the backing pages, and the resulting pattern matches the page flags above.

Searching the old archives turned up a known fix that was never committed upstream:

https://www.redhat.com/archives/linux-cachefs/2014-August/msg00017.html

This patch never made it into mainline kernels, but it looks like an obvious fix for the problem:

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index ad74a6a..ead4981 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -523,7 +523,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					     netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				page_cache_release(backpage);
+				backpage = NULL;
 				page_cache_release(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}
@@ -596,7 +599,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					     netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				page_cache_release(backpage);
+				backpage = NULL;
 				page_cache_release(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}

David, if applicable, can you please help pull this change into an upstream release?

Thanks,
Kiran

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs