On Thu, Sep 20, 2018 at 11:39 AM Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > Last night I left my test running on for more than 30 minutes, and the
> > while loop still showed the stale data. I think I even turned off
> > attribute caching entirely to see if this would help, and it did not.
>
> Huh. Then I'm back to thinking there's a client bug in the 4.0 case.
>

I've been doing more digging, and I think there is some issue with the
cache validation here. In NFS 4.1, it looks like in dir.c
nfs4_lookup_revalidate() calls nfs_lookup_revalidate() since the
NFS_CAP_ATOMIC_OPEN_V1 flag is active
(https://github.com/torvalds/linux/blob/v4.19-rc4/fs/nfs/dir.c#L1591).

On the other hand, since that flag isn't active for NFS 4.0, the
validation is much briefer
(https://github.com/torvalds/linux/blob/v4.19-rc4/fs/nfs/dir.c#L1599-L1628).

I'm not sure if the comment in
https://github.com/torvalds/linux/blob/v4.19-rc4/fs/nfs/dir.c#L1630
actually reflects what's happening. If I look at the stack trace of
the next file open call, I don't see additional validation:

Sep 24 20:20:38 test-kernel kernel: [ 1145.233460] Call Trace:
Sep 24 20:20:38 test-kernel kernel: [ 1145.233462]  dump_stack+0x8e/0xd5
Sep 24 20:20:38 test-kernel kernel: [ 1145.233480]  nfs4_file_open+0x56/0x2a0 [nfsv4]
Sep 24 20:20:38 test-kernel kernel: [ 1145.233488]  ? nfs42_clone_file_range+0x1c0/0x1c0 [nfsv4]
Sep 24 20:20:38 test-kernel kernel: [ 1145.233490]  do_dentry_open+0x1f6/0x360
Sep 24 20:20:38 test-kernel kernel: [ 1145.233492]  vfs_open+0x2f/0x40
Sep 24 20:20:38 test-kernel kernel: [ 1145.233493]  path_openat+0x2e8/0x1690
Sep 24 20:20:38 test-kernel kernel: [ 1145.233496]  ? mem_cgroup_try_charge+0x8b/0x190
Sep 24 20:20:38 test-kernel kernel: [ 1145.233497]  do_filp_open+0x9b/0x110
Sep 24 20:20:38 test-kernel kernel: [ 1145.233499]  ? __check_object_size+0xb8/0x1b0
Sep 24 20:20:38 test-kernel kernel: [ 1145.233501]  ? __alloc_fd+0x46/0x170
Sep 24 20:20:38 test-kernel kernel: [ 1145.233503]  do_sys_open+0x1ba/0x250
Sep 24 20:20:38 test-kernel kernel: [ 1145.233505]  ? do_sys_open+0x1ba/0x250
Sep 24 20:20:38 test-kernel kernel: [ 1145.233507]  __x64_sys_openat+0x20/0x30
Sep 24 20:20:38 test-kernel kernel: [ 1145.233508]  do_syscall_64+0x65/0x130

If I naively apply this patch:

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 8bfaa658b2c1..6e3ece2e6984 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1631,7 +1631,7 @@ static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 	ret = 1;
 
  out:
-	return ret;
+	return nfs_lookup_revalidate(dentry, flags);
 
  no_open:
 	return nfs_lookup_revalidate(dentry, flags);

Things behave as expected on NFS 4.0. What's the right fix here?