Re: [PATCH] Revert "NFS: readdirplus optimization by cache mechanism" (memleak)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2019-07-12 at 17:28 +0200, Greg KH wrote:
> On Fri, Jul 12, 2019 at 04:18:06PM +0200, Max Kellermann wrote:
> > This reverts commit be4c2d4723a4a637f0d1b4f7c66447141a4b3564.
> > 
> > That commit caused a severe memory leak in nfs_readdir_make_qstr().
> > 
> > When listing a directory with more than 100 files (this is how many
> > struct nfs_cache_array_entry elements fit in one 4kB page), all
> > allocated file name strings past those 100 leak.
> > 
> > The root of the leakage is that those string pointers are managed
> > in
> > pages which are never linked into the page cache.
> > 
> > fs/nfs/dir.c puts pages into the page cache by calling
> > read_cache_page(); the callback function nfs_readdir_filler() will
> > then fill the given page struct which was passed to it, which is
> > already linked in the page cache (by do_read_cache_page() calling
> > add_to_page_cache_lru()).
> > 
> > Commit be4c2d4723a4 added another (local) array of allocated pages,
> > to
> > be filled with more data, instead of discarding excess items
> > received
> > from the NFS server.  Those additional pages can be used by the
> > next
> > nfs_readdir_filler() call (from within the same nfs_readdir()
> > call).
> > 
> > The leak happens when some of those additional pages are never used
> > (copied to the page cache using copy_highpage()).  The pages will
> > be
> > freed by nfs_readdir_free_pages(), but their contents will
> > not.  The
> > commit did not invoke nfs_readdir_clear_array() (and doing so would
> > have been dangerous, because it did not track which of those pages
> > were already copied to the page cache, risking double free bugs).
> > 
> > How to reproduce the leak:
> > 
> > - Use a kernel with CONFIG_SLUB_DEBUG_ON.
> > 
> > - Create a directory on a NFS mount with more than 100 files with
> >   names long enough to use the "kmalloc-32" slab (so we can easily
> >   look up the allocation counts):
> > 
> >   for i in `seq 110`; do touch ${i}_0123456789abcdef; done
> > 
> > - Drop all caches:
> > 
> >   echo 3 >/proc/sys/vm/drop_caches
> > 
> > - Check the allocation counter:
> > 
> >   grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
> >   30564391 nfs_readdir_add_to_array+0x73/0xd0
> > age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1
> > 
> > - Request a directory listing and check the allocation counters
> > again:
> > 
> >   ls
> >   [...]
> >   grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
> >   30564511 nfs_readdir_add_to_array+0x73/0xd0
> > age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1
> > 
> > There are now 120 new allocations.
> > 
> > - Drop all caches and check the counters again:
> > 
> >   echo 3 >/proc/sys/vm/drop_caches
> >   grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
> >   30564401 nfs_readdir_add_to_array+0x73/0xd0
> > age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1
> > 
> > 110 allocations are gone, but 10 have leaked and will never be
> > freed.
> > 
> > Unhelpfully, those allocations are explicitly excluded from
> > KMEMLEAK,
> > that's why my initial attempts with KMEMLEAK were not successful:
> > 
> > 	/*
> > 	 * Avoid a kmemleak false positive. The pointer to the name is
> > stored
> > 	 * in a page cache page which kmemleak does not scan.
> > 	 */
> > 	kmemleak_not_leak(string->name);
> > 
> > It would be possible to solve this bug without reverting the whole
> > commit:
> > 
> > - keep track of which pages were not used, and call
> >   nfs_readdir_clear_array() on them, or
> > - manually link those pages into the page cache
> > 
> > But for now I have decided to just revert the commit, because the
> > real
> > fix would require complex considerations, risking more dangerous
> > (crash) bugs, which may seem unsuitable for the stable branches.
> > 
> > Signed-off-by: Max Kellermann <mk@xxxxxxxxxx>
> > ---
> >  fs/nfs/dir.c      | 90 ++++---------------------------------------
> > ----
> >  fs/nfs/internal.h |  3 +-
> >  2 files changed, 7 insertions(+), 86 deletions(-)
> 
> No cc: stable@vger on this patch to get it backported?
> 

I've added one. I've also backed out the 3 fixes for other problems
with the same patch that were in the linux-next tree. (St. Stephen
Rothwell please forgive me, for I have sinned...)

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux