On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote: > A process can hang forever to 'ls -l' a directory while the directory > is being modified such as another NFS client adding files to the > directory. The problem is seen specifically with larger directories > (I tested with 1 million) and/or slower NFS server responses to > READDIR. If a combination of the NFS directory size, the NFS server > responses to READDIR is such that the 'ls' process gets partially > through the listing before the attribute cache expires (time > exceeds acdirmax), we drop the pagecache and have to re-fill it, > and as a result, the process may never complete. One could argue > for larger directories the acdirmin/acdirmax should be increased, > but it's not always possible to tune this effectively. > > The root cause of this problem is due to how the NFS readdir cache > currently works. The main search function, > readdir_search_pagecache(), > always starts searching at page_index and cookie == 0, and for any > page not in the cache, fills in the page with entries obtained in > a READDIR NFS call. If a page already exists, we proceed to > nfs_readdir_search_for_cookie(), which searches for the cookie > (pos) of the readdir call. The search is O(n), where n is the > directory size before the cookie in question is found, and every > entry to nfs_readdir() pays this penalty, irrespective of the > current directory position (dir_context.pos). The search is > expensive due to the opaque nature of readdir cookies, and the fact > that no mapping (hash) exists from cookies to pages. In the case > of a directory being modified, the above behavior can become an > excessive penalty, since the same process is forced to fill pages it > may be no longer interested in (the entries were passed in a previous > nfs_readdir call), and this can essentially lead no forward progress. > > To fix this problem, at the end of nfs_readdir(), save the page_index > corresponding to the directory position (cookie) inside the process's > nfs_open_dir_context. Then at the next entry of nfs_readdir(), use > the saved page_index as the starting search point rather than > starting > at page_index == 0. Not only does this fix the problem of listing > a directory being modified, it also significantly improves > performance > in the unmodified case since no extra search penalty is paid at each > entry to nfs_readdir(). > > In the case of lseek, since there is no hash or other mapping from a > cookie value to the page->index, just reset > nfs_open_dir_context.page_index > to 0, which will reset the search to the old behavior. > > Signed-off-by: Dave Wysochanski <dwysocha@xxxxxxxxxx> > --- > fs/nfs/dir.c | 8 +++++++- > include/linux/nfs_fs.h | 1 + > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > index 52e06c8fc7cd..b266f505b521 100644 > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context > *alloc_nfs_open_dir_context(struct inode *dir > ctx->attr_gencount = nfsi->attr_gencount; > ctx->dir_cookie = 0; > ctx->dup_cookie = 0; > + ctx->page_index = 0; > ctx->cred = get_cred(cred); > spin_lock(&dir->i_lock); > if (list_empty(&nfsi->open_files) && > @@ -763,7 +764,7 @@ int > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc) > return res; > } > > -/* Search for desc->dir_cookie from the beginning of the page cache > */ > +/* Search for desc->dir_cookie starting at desc->page_index */ > static inline > int readdir_search_pagecache(nfs_readdir_descriptor_t *desc) > { > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file, struct > dir_context *ctx) > .ctx = ctx, > .dir_cookie = &dir_ctx->dir_cookie, > .plus = nfs_use_readdirplus(inode, ctx), > + .page_index = dir_ctx->page_index, > + .last_cookie = nfs_readdir_use_cookie(file) ? ctx- > >pos : 0, > }, > *desc = &my_desc; > int res = 0; > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file, struct > dir_context *ctx) > out: > if (res > 0) > res = 0; > + dir_ctx->page_index = desc->page_index; > trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie, > NFS_SERVER(inode)->dtsize, > my_desc.plus, res); > return res; > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file *filp, > loff_t offset, int whence) > else > dir_ctx->dir_cookie = 0; > dir_ctx->duped = 0; > + /* Force readdir_search_pagecache to start over */ > + dir_ctx->page_index = 0; > } > inode_unlock(inode); > return offset; > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h > index a2c6455ea3fa..0e55c0154ccd 100644 > --- a/include/linux/nfs_fs.h > +++ b/include/linux/nfs_fs.h > @@ -93,6 +93,7 @@ struct nfs_open_dir_context { > __u64 dir_cookie; > __u64 dup_cookie; > signed char duped; > + unsigned long page_index; > }; > > /* NACK. It makes no sense to store the page index as a cursor. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx