On Fri, Sep 28, 2018 at 3:10 PM Olga Kornievskaia <aglo@xxxxxxxxx> wrote: > > On Fri, Sep 28, 2018 at 2:54 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Fri, 2018-09-28 at 14:31 -0400, Olga Kornievskaia wrote: > > > On Fri, Sep 28, 2018 at 1:50 PM Trond Myklebust <trondmy@xxxxxxxxx> > > > wrote: > > > > > > > > On Fri, 2018-09-28 at 12:54 -0400, Olga Kornievskaia wrote: > > > > > Also shouldn't this be a bug fix for 4.19 and also go to stable? > > > > > > > > It wasn't really intended as a bugfix because I wasn't aware that > > > > there > > > > was a bug to fix in the first place. I assume the modification to > > > > nfs4_state_find_open_context() to cause it to look for an open > > > > context > > > > with a READ/WRITE open mode first is what fixes the bug. Is that > > > > the > > > > case? > > > > > > I don't think so. nfs4_state_find_open_context() never gets calls > > > during the run of this test case. This function is called during the > > > reclaim on an open. In delegation recall the opens are not reclaimed, > > > it is just reclaiming the locks. > > > > > > Actually, when I undo this piece of the patch, I can make it fail > > > again. > > > > > > @@ -1027,10 +1027,7 @@ void nfs_inode_attach_open_context(struct > > > nfs_open_context *ctx) > > > struct nfs_inode *nfsi = NFS_I(inode); > > > > > > spin_lock(&inode->i_lock); > > > - if (ctx->mode & FMODE_WRITE) > > > - list_add(&ctx->list, &nfsi->open_files); > > > - else > > > - list_add_tail(&ctx->list, &nfsi->open_files); > > > + list_add_tail_rcu(&ctx->list, &nfsi->open_files); > > > spin_unlock(&inode->i_lock); > > > } > > > EXPORT_SYMBOL_GPL(nfs_inode_attach_open_context); > > > > > > It looks like the placement in the list matters? Any ideas why? > > > > Currently, the list is ordered so that writeable open contexts are > > always found first. The reason is that when traversing the list during > > server reboot recovery, we want to ensure that we reclaim any write > > delegations on the file first so that we can cache all subsequent opens > > and locks of that file. > > > > A delegation return requires us to recover all cached state, so it must > > reclaim both OPEN and LOCK state. It is not, however, expected to > > depend on the open context list ordering, since there can be no further > > caching. > > IOW: no, I don't see why your bug would depend on the list order unless > > some part of the recovery is actually failing. > > Ok thanks. Need to fix the lack of OPEN reclaim. Wait, why are we suppose to reclaim the open state when we have a valid open stateid? We don't have any cached opens that server doesn't know about. RFC7530 says "if the file has other open reference", I think the emphasis is on "other". I don't believe we need to be sending anything besides the locks to the server. Then I'm back to square one. > > > -- > > Trond Myklebust > > Linux NFS client maintainer, Hammerspace > > trond.myklebust@xxxxxxxxxxxxxxx > > > >