Thanks. I belatedly realized this is an NFS client issue, not an NFS server issue. As I mentioned in the previous e-mail, the following patch fixes the NFS client validation issues for me, but it's not clear to me why in NFS 4.0 we bail out earlier than NFS 4.1: diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 8bfaa658b2c1..6e3ece2e6984 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1631,7 +1631,7 @@ static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags) ret = 1; out: - return ret; + return nfs_lookup_revalidate(dentry, flags); no_open: return nfs_lookup_revalidate(dentry, flags); On Tue, Sep 25, 2018 at 1:34 PM Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > On Tue, Sep 25, 2018 at 11:56:16AM -0700, Stan Hu wrote: > > >From https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/5.6_technical_notes/known_issues-kernel, > > I see this bug has been known for a while: > > That specific server bug was fixed in several years ago, and I forget > what kernel version you were using on the server, but I'm pretty sure it > has the fix. From the network trace it looks like delegations were > being returned when they should be. > > --b. > > > > > In some cases the NFS server fails to notify NFSv4 clients about > > renames and unlinks done by other clients, or by non-NFS users of the > > server. An application on a client may then be able to open the file > > at its old pathname (and read old cached data from it, and perform > > read locks on it), long after the file no longer exists at that > > pathname on the server. > > > > To work around this issue, use NFSv3 instead of NFSv4. Alternatively, > > turn off support for leases by writing 0 to /proc/sys/fs/leases-enable > > (ideally on boot, before the nfs server is started). This change > > prevents NFSv4 delegations from being given out, restore correctness > > at the expense of some performance. > > > > On Mon, Sep 24, 2018 at 1:34 PM Stan Hu <stanhu@xxxxxxxxx> wrote: > > > > > > On Thu, Sep 20, 2018 at 11:39 AM Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > > > > > > > Last night I left my test running on for more than 30 minutes, and the > > > > > while loop still showed the stale data. I think I even turned off > > > > > attribute caching entirely to see if this would help, and it did not. > > > > > > > > Huh. Then I'm back to thinking there's a client bug in the 4.0 case. > > > > > > > > > > I've been doing more digging, and I think there is some issue with the > > > cache validation here. In NFS 4.1, it looks like in dir.c > > > nfs4_lookup_revalidate() calls nfs_lookup_revalidate() since the > > > NFS_CAP_ATOMIC_OPEN_V1 flag is active > > > (https://github.com/torvalds/linux/blob/v4.19-rc4/fs/nfs/dir.c#L1591). > > > On the other hand, since that flag isn't active for NFS 4.0, the > > > validation is much briefer > > > (https://github.com/torvalds/linux/blob/v4.19-rc4/fs/nfs/dir.c#L1599-L1628). > > > > > > I'm not sure if the comment in > > > https://github.com/torvalds/linux/blob/v4.19-rc4/fs/nfs/dir.c#L1630 > > > actually reflects what's happening. If I look at the stack trace of > > > the next file open call, I don't see additional validation: > > > > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233460] Call Trace: > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233462] dump_stack+0x8e/0xd5 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233480] > > > nfs4_file_open+0x56/0x2a0 [nfsv4] > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233488] ? > > > nfs42_clone_file_range+0x1c0/0x1c0 [nfsv4] > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233490] do_dentry_open+0x1f6/0x360 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233492] vfs_open+0x2f/0x40 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233493] path_openat+0x2e8/0x1690 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233496] ? > > > mem_cgroup_try_charge+0x8b/0x190 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233497] do_filp_open+0x9b/0x110 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233499] ? > > > __check_object_size+0xb8/0x1b0 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233501] ? __alloc_fd+0x46/0x170 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233503] do_sys_open+0x1ba/0x250 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233505] ? do_sys_open+0x1ba/0x250 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233507] __x64_sys_openat+0x20/0x30 > > > Sep 24 20:20:38 test-kernel kernel: [ 1145.233508] do_syscall_64+0x65/0x130 > > > > > > If I naively apply this patch: > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > > > index 8bfaa658b2c1..6e3ece2e6984 100644 > > > --- a/fs/nfs/dir.c > > > +++ b/fs/nfs/dir.c > > > @@ -1631,7 +1631,7 @@ static int nfs4_lookup_revalidate(struct dentry > > > *dentry, unsigned int flags) > > > ret = 1; > > > > > > out: > > > - return ret; > > > + return nfs_lookup_revalidate(dentry, flags); > > > > > > no_open: > > > return nfs_lookup_revalidate(dentry, flags); > > > > > > Things behave as expected on NFS 4.0. What's the right fix here?