On Fri, 2018-09-28 at 14:31 -0400, Olga Kornievskaia wrote: > On Fri, Sep 28, 2018 at 1:50 PM Trond Myklebust <trondmy@xxxxxxxxx> > wrote: > > > > On Fri, 2018-09-28 at 12:54 -0400, Olga Kornievskaia wrote: > > > Also shouldn't this be a bug fix for 4.19 and also go to stable? > > > > It wasn't really intended as a bugfix because I wasn't aware that > > there > > was a bug to fix in the first place. I assume the modification to > > nfs4_state_find_open_context() to cause it to look for an open > > context > > with a READ/WRITE open mode first is what fixes the bug. Is that > > the > > case? > > I don't think so. nfs4_state_find_open_context() never gets calls > during the run of this test case. This function is called during the > reclaim on an open. In delegation recall the opens are not reclaimed, > it is just reclaiming the locks. > > Actually, when I undo this piece of the patch, I can make it fail > again. > > @@ -1027,10 +1027,7 @@ void nfs_inode_attach_open_context(struct > nfs_open_context *ctx) > struct nfs_inode *nfsi = NFS_I(inode); > > spin_lock(&inode->i_lock); > - if (ctx->mode & FMODE_WRITE) > - list_add(&ctx->list, &nfsi->open_files); > - else > - list_add_tail(&ctx->list, &nfsi->open_files); > + list_add_tail_rcu(&ctx->list, &nfsi->open_files); > spin_unlock(&inode->i_lock); > } > EXPORT_SYMBOL_GPL(nfs_inode_attach_open_context); > > It looks like the placement in the list matters? Any ideas why? Currently, the list is ordered so that writeable open contexts are always found first. The reason is that when traversing the list during server reboot recovery, we want to ensure that we reclaim any write delegations on the file first so that we can cache all subsequent opens and locks of that file. A delegation return requires us to recover all cached state, so it must reclaim both OPEN and LOCK state. It is not, however, expected to depend on the open context list ordering, since there can be no further caching. IOW: no, I don't see why your bug would depend on the list order unless some part of the recovery is actually failing. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx