Negative dentry hits not revalidating correctly?

Hi NFS list!

I've been fighting a problem for the past few days that I can't seem
to get past. First, some background on my environment. We're using a
BlueArc Titan NAS cluster with NFS v3 over TCP. The Linux hosts all
run a fairly recent update of RHEL 5; uname -a reports
2.6.18-53.1.13.el5PAE. My apologies for the patched-up Red Hat kernel
as opposed to a stock build.

Mount options are as follows:

lnh-nfs01:/lthfs01 on /home/cluster1 type nfs
(rw,nfsvers=3,proto=tcp,hard,intr,wsize=32768,rsize=32768,addr=10.2.25.19)

The environment is pretty straightforward.  It's generally just a
run-of-the-mill H/A web cluster.  We're utilizing bind mounts in a few
situations, but the problem I'm having manifests itself on machines
both with and without bind mounts.

In summary, it appears that once a file has a negative dcache entry,
that entry is never revalidated correctly without some sort of
intervention. I've been able to mitigate the problem either by dropping
caches more often than I'd like via /proc/sys/vm/drop_caches, or by
changing into the parent directory of the 'missing' file and running
'ls'. The 'ls' appears to trigger an invalidation of the parent
directory, which is what I expected the lookup to do in the first place.
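
For reference, the 'ls' workaround can be sketched as a small shell
helper. This is only a sketch: stat_fresh is a hypothetical name made
up for illustration, and it assumes that reading the parent directory
is enough to clear the stale negative entry, as it appears to be here.

```shell
#!/bin/sh
# Workaround sketch: read the parent directory before stat'ing a file.
# stat_fresh is a hypothetical helper name, not an existing tool.
stat_fresh() {
    path=$1
    parent=$(dirname -- "$path")
    # Same effect as running 'ls' in the parent directory by hand;
    # the listing itself is discarded.
    ls -- "$parent" > /dev/null 2>&1
    # Exit status is that of stat: non-zero if the file is still missing.
    stat -- "$path"
}
```

Usage would be something like
stat_fresh /home/cluster1/data/f/f/nfsuser01/test_nofile on the affected
client. The heavier alternative is to write 2 to
/proc/sys/vm/drop_caches as root, which frees dentries and inodes
system-wide.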

To trigger the issue:

1. Stat a file that we know is nonexistent. This populates the dentry
cache with a negative entry.

[root@lnh-util ~]# ssh root@lnh-www1a-mgmt "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No such file or directory

2. Create that file on a different server. This also updates the mtime
on the parent directory, so the NFS validation code on the dentry hit
ought to catch the change.

[root@lnh-util ~]# ssh root@lnh-sshftp1a-mgmt "touch /home/cluster1/data/f/f/nfsuser01/test_nofile"

3. Try to stat the file again. Still broken.

[root@lnh-util ~]# ssh root@lnh-www1a-mgmt "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No such file or directory

4. Wait at least 60 seconds, just to rule out the attribute cache
(though from reading the code, it appears that the parent directory is
revalidated regardless in nfs_check_verifier). We're using the default
attribute cache settings.

5. Read the parent directory.

[root@lnh-util ~]# ssh root@lnh-www1a-mgmt "ls /home/cluster1/data/f/f/nfsuser01/ | wc -l"
16

6. And now the missing file is present.

[root@lnh-util ~]# ssh root@lnh-www1a-mgmt
"stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
  File: `/home/cluster1/data/f/f/nfsuser01/test_nofile'
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 15h/21d Inode: 4046108346  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2008-05-28 10:07:28.963000000 -0400
Modify: 2008-05-28 10:07:28.963000000 -0400
Change: 2008-05-28 10:07:28.963000000 -0400
[root@lnh-util ~]#

The negative entry persists indefinitely. This is true even when a
stat of the parent directory shows that the cached attributes have
timed out and the client has picked up the new mtime.
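
For anyone who wants to try this on their own setup, the steps above
can be sketched as a script. The host names and path are the ones from
the steps; run_on, repro and WAIT are hypothetical names introduced for
this sketch, and it assumes root ssh access to both hosts and that the
test file does not exist beforehand.

```shell
#!/bin/sh
# Reproduction sketch for the stale negative dentry described above.
# run_on, repro and WAIT are hypothetical names, not existing tools.

WAIT=${WAIT:-60}   # seconds to wait, to rule out the attribute cache

# Run a command on a remote host as root over ssh.
run_on() {
    host=$1; shift
    ssh "root@$host" "$@"
}

# repro CLIENT WRITER PATH
# Prints "ok" if CLIENT sees PATH after WRITER created it and WAIT
# seconds have passed, "stale" if CLIENT still reports it missing.
repro() {
    client=$1 writer=$2 path=$3
    parent=$(dirname -- "$path")

    # Step 1: populate a negative dentry on the client (expected to fail).
    run_on "$client" "stat '$path'" > /dev/null 2>&1

    # Step 2: create the file from a different host.
    run_on "$writer" "touch '$path'" || return 2

    # Steps 3 and 4: wait, then stat again on the client.
    sleep "$WAIT"
    if run_on "$client" "stat '$path'" > /dev/null 2>&1; then
        echo ok
    else
        echo stale
        # Steps 5 and 6: read the parent directory, then stat once more.
        run_on "$client" "ls '$parent'" > /dev/null
        run_on "$client" "stat '$path'" > /dev/null 2>&1 &&
            echo "visible after ls"
    fi
}
```

With the hosts above, the call would be
repro lnh-www1a-mgmt lnh-sshftp1a-mgmt /home/cluster1/data/f/f/nfsuser01/test_nofile
after removing any leftover test_nofile. On the affected clients this
would be expected to print "stale" followed by "visible after ls".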

Thoughts?

Jeff