On Fri, Sep 28, 2018 at 2:54 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Fri, 2018-09-28 at 14:31 -0400, Olga Kornievskaia wrote: > > On Fri, Sep 28, 2018 at 1:50 PM Trond Myklebust <trondmy@xxxxxxxxx> > > wrote: > > > > > > On Fri, 2018-09-28 at 12:54 -0400, Olga Kornievskaia wrote: > > > > Also shouldn't this be a bug fix for 4.19 and also go to stable? > > > > > > It wasn't really intended as a bugfix because I wasn't aware that > > > there > > > was a bug to fix in the first place. I assume the modification to > > > nfs4_state_find_open_context() to cause it to look for an open > > > context > > > with a READ/WRITE open mode first is what fixes the bug. Is that > > > the > > > case? > > > > I don't think so. nfs4_state_find_open_context() never gets calls > > during the run of this test case. This function is called during the > > reclaim on an open. In delegation recall the opens are not reclaimed, > > it is just reclaiming the locks. > > > > Actually, when I undo this piece of the patch, I can make it fail > > again. > > > > @@ -1027,10 +1027,7 @@ void nfs_inode_attach_open_context(struct > > nfs_open_context *ctx) > > struct nfs_inode *nfsi = NFS_I(inode); > > > > spin_lock(&inode->i_lock); > > - if (ctx->mode & FMODE_WRITE) > > - list_add(&ctx->list, &nfsi->open_files); > > - else > > - list_add_tail(&ctx->list, &nfsi->open_files); > > + list_add_tail_rcu(&ctx->list, &nfsi->open_files); > > spin_unlock(&inode->i_lock); > > } > > EXPORT_SYMBOL_GPL(nfs_inode_attach_open_context); > > > > It looks like the placement in the list matters? Any ideas why? > > Currently, the list is ordered so that writeable open contexts are > always found first. The reason is that when traversing the list during > server reboot recovery, we want to ensure that we reclaim any write > delegations on the file first so that we can cache all subsequent opens > and locks of that file. > > A delegation return requires us to recover all cached state, so it must > reclaim both OPEN and LOCK state. It is not, however, expected to > depend on the open context list ordering, since there can be no further > caching. > IOW: no, I don't see why your bug would depend on the list order unless > some part of the recovery is actually failing. Ok thanks. Need to fix the lack of OPEN reclaim. > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@xxxxxxxxxxxxxxx > >