> On Dec 7, 2016, at 17:34, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: > > From: "Benjamin Coddington" <bcodding@xxxxxxxxxx> > > So, I've got the following patch, and it seems to work fine for the original > workload. However, if I use ~20 procs started 2 seconds apart, I can still > sometimes get into the state where the machine takes a very long time (5 - 8 > minutes). I wonder if I am getting into a state were vmscan is reclaiming > pages while I'm trying to fill them up. So.. I'll do a bit more debugging > and re-post this if I feel like it is still the right approach. > > Adding an int to nfs_open_dir_context puts it at 60 bytes here. Probably > could have added a unsigned long flags and used bits for the duped stuff as > well, but it was uglier that way, so I left it. I like how duped carries > -1, 0, and >1 information without having flag defines cluttering everywhere. > > Ben > > 8<------------------------------------------------------------------------ > > From 5b3e747a984ec966e8c742d8f4fe4b08e1c93acd Mon Sep 17 00:00:00 2001 > Message-Id: <5b3e747a984ec966e8c742d8f4fe4b08e1c93acd.1481149380.git.bcodding@xxxxxxxxxx> > From: Benjamin Coddington <bcodding@xxxxxxxxxx> > Date: Wed, 7 Dec 2016 16:23:30 -0500 > Subject: [PATCH] NFS: Move readdirplus flag to directory context > > For a concurrent 'ls -l' workload, processes can stack up in nfs_readdir() > both waiting on the next page and taking turns filling the next page from > XDR, but only one of them will have desc->plus set because setting it > clears the flag on the directory. If a page is filled by a process that > doesn't have desc->plus set then the next pass through lookup() will cause > it to dump the entire page cache with nfs_force_use_readdirplus(). Then > the next readdir starts all over filling the pagecache. Forward progress > happens, but only after many steps back re-filling the pagecache. > > Fix this by moving the flag to use readdirplus to each open directory > context. > > Suggested-by: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> > --- > fs/nfs/dir.c | 39 ++++++++++++++++++++++++--------------- > include/linux/nfs_fs.h | 1 + > 2 files changed, 25 insertions(+), 15 deletions(-) > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > index 7483722162fa..818172606fed 100644 > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir > ctx->dir_cookie = 0; > ctx->dup_cookie = 0; > ctx->cred = get_rpccred(cred); > + ctx->use_readdir_plus = 0; > spin_lock(&dir->i_lock); > list_add(&ctx->list, &nfsi->open_files); > spin_unlock(&dir->i_lock); > @@ -443,17 +444,35 @@ int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry) > } > > static > -bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx) > +bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx, > + struct nfs_open_dir_context *dir_ctx) > { > if (!nfs_server_capable(dir, NFS_CAP_READDIRPLUS)) > return false; > - if (test_and_clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags)) > + if (xchg(&dir_ctx->use_readdir_plus, 0)) > return true; > if (ctx->pos == 0) > return true; > return false; > } > > +static > +void nfs_set_readdirplus(struct inode *dir, int force) > +{ > + struct nfs_inode *nfsi = NFS_I(dir); > + struct nfs_open_dir_context *ctx; > + > + if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && > + !list_empty(&nfsi->open_files)) { > + spin_lock(&dir->i_lock); > + list_for_each_entry(ctx, &nfsi->open_files, list) > + ctx->use_readdir_plus = 1; > + spin_unlock(&dir->i_lock); > + if (force) > + invalidate_mapping_pages(dir->i_mapping, 0, -1); > + } > +} > + > /* > * This function is called by the lookup and getattr code to request the > * use of readdirplus to accelerate any future lookups in the same > @@ -461,11 +480,7 @@ bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx) > */ > void nfs_advise_use_readdirplus(struct inode *dir) > { > - struct nfs_inode *nfsi = NFS_I(dir); > - > - if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && > - !list_empty(&nfsi->open_files)) > - set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); > + nfs_set_readdirplus(dir, 0); > } > > /* > @@ -478,13 +493,7 @@ void nfs_advise_use_readdirplus(struct inode *dir) > */ > void nfs_force_use_readdirplus(struct inode *dir) > { > - struct nfs_inode *nfsi = NFS_I(dir); > - > - if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && > - !list_empty(&nfsi->open_files)) { > - set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); > - invalidate_mapping_pages(dir->i_mapping, 0, -1); > - } > + nfs_set_readdirplus(dir, 1); > } > > static > @@ -921,7 +930,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx) > desc->ctx = ctx; > desc->dir_cookie = &dir_ctx->dir_cookie; > desc->decode = NFS_PROTO(inode)->decode_dirent; > - desc->plus = nfs_use_readdirplus(inode, ctx) ? 1 : 0; > + desc->plus = nfs_use_readdirplus(inode, ctx, dir_ctx) ? 1 : 0; This fixes desc->plus at the beginning of the readdir() call. Perhaps we should instead check the value of ctx->use_readdir_plus in the call to nfs_readdir_xdr_filler(), and just resetting cts->use_readdir_plus at the very end of nfs_readdir()? > > if (ctx->pos == 0 || nfs_attribute_cache_expired(inode)) > res = nfs_revalidate_mapping(inode, file->f_mapping); > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h > index cb631973839a..fe06c1dd1737 100644 > --- a/include/linux/nfs_fs.h > +++ b/include/linux/nfs_fs.h > @@ -89,6 +89,7 @@ struct nfs_open_dir_context { > __u64 dir_cookie; > __u64 dup_cookie; > signed char duped; > + int use_readdir_plus; > }; > > /* > -- > 2.5.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html