Hi NFS list! I've been dealing with a problem over the past few days that I can't seem to get through.

First, a bit of information about my environment. We're using a BlueArc Titan NAS cluster in concert with NFS v3 over TCP. The Linux hosts are all running a fairly recent update of RHEL5; uname -a reports '2.6.18-53.1.13.el5PAE'. My apologies for the patched-up Red Hat kernel as opposed to a stock build. Mount options are as follows:

lnh-nfs01:/lthfs01 on /home/cluster1 type nfs (rw,nfsvers=3,proto=tcp,hard,intr,wsize=32768,rsize=32768,addr=10.2.25.19)

The environment is pretty straightforward: generally just a run-of-the-mill H/A web cluster. We're utilizing bind mounts in a few situations, but the problem manifests itself on machines both with and without bind mounts.

In summary, it appears that once a file has a negative dcache entry, it is never revalidated correctly without some sort of intervention. I've been able to mitigate the problem either by dropping caches more often than I'd like via /proc/sys/vm/drop_caches, or by stepping into the parent directory of the 'missing' file and running an 'ls'. The 'ls' appears to trigger an invalidation of the parent directory (which is what we're looking for in the first place).

To trigger the issue:

1. Stat a file that we know is non-existent. This populates the dentry cache with a negative entry.

[root@lnh-util ~]# ssh root@lnh-www1a-mgmt "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No such file or directory

2. Create that file on a different server. This also updates the mtime on the parent directory, so the NFS revalidation code on the dentry hit ought to catch it.

[root@lnh-util ~]# ssh root@lnh-sshftp1a-mgmt "touch /home/cluster1/data/f/f/nfsuser01/test_nofile"

3. Try to stat the file again. Still broken.
[root@lnh-util ~]# ssh root@lnh-www1a-mgmt "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No such file or directory

4. Wait at least 60 seconds, just to rule out stale attribute cache data (though from reading the code, it appears that the parent directory is revalidated in nfs_check_verifier regardless). We're using the default attribute cache timeouts.

5. Read the parent directory.

[root@lnh-util ~]# ssh root@lnh-www1a-mgmt "ls /home/cluster1/data/f/f/nfsuser01/ | wc -l"
16

6. And now the 'missing' file is present.

[root@lnh-util ~]# ssh root@lnh-www1a-mgmt "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
  File: `/home/cluster1/data/f/f/nfsuser01/test_nofile'
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 15h/21d         Inode: 4046108346  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2008-05-28 10:07:28.963000000 -0400
Modify: 2008-05-28 10:07:28.963000000 -0400
Change: 2008-05-28 10:07:28.963000000 -0400
[root@lnh-util ~]#

The negative dentry stays present indefinitely. This is true even when a stat of the parent directory shows that the cached attributes have timed out and the mtime has been updated.

Thoughts?

Jeff
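For what it's worth, until we understand the revalidation behavior, the "ls the parent" workaround can be wrapped in a small helper so application code doesn't have to know about it. This is just a sketch of the mitigations described above; the function name is mine, not anything we actually ship:

```shell
# Hypothetical helper wrapping the workaround described above: stat a
# path, and if that fails, read the parent directory (which forces the
# client to re-read it and discard the stale negative dentry), then
# retry the stat once.
stat_with_revalidate() {
    path="$1"
    if stat "$path" >/dev/null 2>&1; then
        return 0
    fi
    # Reading the parent directory is what un-sticks the negative entry.
    ls "$(dirname "$path")" >/dev/null 2>&1
    stat "$path"
}

# Heavier alternative mentioned above: drop dentries and inodes
# system-wide (value 2 frees reclaimable slab objects, which includes
# the dentry cache):
#   echo 2 > /proc/sys/vm/drop_caches
```

On a healthy filesystem the helper behaves exactly like stat; it only does extra work on the failure path, so it's cheap to wrap around the hot lookup.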