On Mon, 18 Feb 2013 13:25:09 +1100 NeilBrown <neilb@xxxxxxx> wrote: > On Thu, 14 Feb 2013 10:42:30 -0500 Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > On Tue, 12 Feb 2013 11:38:13 +1100 > > NeilBrown <neilb@xxxxxxx> wrote: > > > > > > > > I've been exploring difficulties with unmounting stale directories and > > > discovered another bug. > > > > > > If I: > > > > > > SERVER: mkdir /foo/bar #and make sure it is exported > > > CLIENT: mount -o vers=4 server:/foo/bar /mnt > > > SERVER: rm -r /foo > > > CLIENT: > /mnt/baz # gets an error of course > > > CLIENT: ls -l /mnt # error again > > > CLIENT: umount /mnt > > > > > > The result of that last command is: > > > > > > /mnt was not found in /proc/mounts > > > /mnt was not found in /proc/mounts > > > > > > Strange? > > > > > > cat /proc/mounts > > > > > > ..... > > > 10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0 > > > .... > > > > > > Notice the "\040(deleted)". > > > > > > NFS has unhashed that directory because it is obviously bad, and d_path() > > > notices and adds " (deleted)". > > > > > > Now I might be able to argue that NFS shouldn't be unhashing a directory that > > > is a mountpoint - it certainly seems strange behaviour. > > > > > > But I think I can more strongly argue that /proc/mounts shouldn't be showing > > > the mounted directory, but instead the directory that it is mounted on. > > > Obviously these both have the same name so it shouldn't matter ... except > > > that here is a case where it does. > > > > > > I "fixed" it with > > > > > > --- a/fs/proc_namespace.c > > > +++ b/fs/proc_namespace.c > > > @@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt) > > > { > > > struct mount *r = real_mount(mnt); > > > int err = 0; > > > - struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt }; > > > + struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt }; > > > struct super_block *sb = mnt_path.dentry->d_sb; > > > > > > if (sb->s_op->show_devname) { > > > > > > though I suspect that isn't safe and needs some locking. > > > > > > Probably both should be fixed: NFS should not invalidate any mounted > > > directory, and show_vfsmnt() should report the mointpoint, not the mounted > > > directory. > > > > > > I can't figure out any way to get NFS to not invalidate the mounted directory. > > > I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I > > > don't know how to tell if a given dentry is a mnt_root for any mountpoint. > > > > > > Suggestions? Thoughts? > > > > > > Thanks, > > > NeilBrown > > > > > > > I've also been looking at some weird ESTALE problems. Here's another > > fun one that doesn't involve mountpoints. Assume here that we're > > working in the same exported directory on client and server: > > > > server# mkdir a > > client# cd a > > server# mv a a.bak > > client# sleep 30 # (or whatever the dir attrcache timeout is) > > client# stat . > > stat: cannot stat ‘.’: Stale NFS file handle > > > > Obviously, "." should not be stale. It got renamed, but the inode still > > exists on the server. > > > > If you sniff on the wire, you'll see that the server doesn't ever send > > an ESTALE here. What happens is that due to FS_REVAL_DOT being set, we > > end up trying to revalidate the dentry that "." refers to. We find that > > the parent changed (obviously) and then try to redo the lookup of "a". > > At that point we notice that it doesn't exist and turn it into ESTALE. > > > > I don't really understand the point of FS_REVAL_DOT. What does that > > actually buy us? I wonder if removing it would also help your testcase? > > > > I think that is a slightly different issue, but certainly related. > I have hit your problem before and have the following patch in SLES. I think > I tried pushing it upstream, but didn't get much in the way of a useful > response. > (patch is space-damaged - don't try to apply with 'patch'). > > BTW I have another problem, related to this one and which could be fixed by > removing FS_REVAL_DOT. > > If you > mount -o vers=4,noac server:/some/path /mnt > then stop nfsd on the server and > umount /mnt > > it hangs. > Partly it hangs because 'mount' tries to do a 'readlink' on the mountpoint. > I can probably get it to not do that (or use --no-canonicalize). > But then sys_umount hangs because it tries to check with the server that the > thing being unmounted really is still a directory... > > I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that > works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very > last step and returns the mounted-on directory, not the mountpoint that is > mounted there - or at least makes sure not revalidate happens on that final > mounted directory. > > > I think FS_REVAL_DOT is needed so that if you call stat("."), it will update > attributes from the server if the cache is old. However it seems to do a > whole lot more than that, including "lookup" calls which it I'm sure is wrong. > > > > Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly. > > When d_revalidate is called on a dentry because FS_REVAL_DOT is set > it isn't really appropriate to revalidate the name. > > If the path was simply ".", then the current-working-directory could > have been renamed on the server and should still be accessible as "." > even if it has a new name. > > If the path was "/some/long/path/.", then the final component ("path" in > this case) has already been revalidated and there is no particular > need to do it again. > > If we change nd->last_type to refer to "the last component looked at" > rather than just "the last component", then these cases can be > detected by "nd->last_type != LAST_NORM". > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > --- > fs/namei.c | 2 +- > fs/nfs/dir.c | 9 +++++++++ > 2 files changed, 10 insertions(+), 1 deletion(-) > > --- linux-3.0-SLE11-SP2.orig/fs/namei.c > +++ linux-3.0-SLE11-SP2/fs/namei.c > @@ -1460,6 +1460,7 @@ static int link_path_walk(const char *na > } > } > > + nd->last_type = type; > /* remove trailing slashes? */ > if (!c) > goto last_component; > @@ -1486,7 +1487,6 @@ last_component: > /* Clear LOOKUP_CONTINUE iff it was previously unset */ > nd->flags &= lookup_flags | ~LOOKUP_CONTINUE; > nd->last = this; > - nd->last_type = type; > return 0; > } > terminate_walk(nd); > --- linux-3.0-SLE11-SP2.orig/fs/nfs/dir.c > +++ linux-3.0-SLE11-SP2/fs/nfs/dir.c > @@ -1138,6 +1138,15 @@ static int nfs_lookup_revalidate(struct > if (NFS_STALE(inode)) > goto out_bad; > > + if (nd->last_type != LAST_NORM) { > + /* name not relevant, just inode */ > + error = nfs_revalidate_inode(NFS_SERVER(inode), inode); > + if (error) > + goto out_bad; > + else > + goto out_valid; > + } > + > error = -ENOMEM; > fhandle = nfs_alloc_fhandle(); > fattr = nfs_alloc_fattr(); Ahh thanks -- that is the same problem exactly. I'll have to look over your patch and see whether and how it could be applied to current mainline code. I think the umount thing may be the same problem that steved was talking about the other day (V4 unmount causes a GETATTR). I hadn't put the two together, but you're probably right. LOOKUP_MOUNTPOINT is a very interesting idea and might even be reasonable in conjunction with removing FS_REVAL_DOT as it would make the needs of umount more explicit. As far as FS_REVAL_DOT goes though, in the current mainline code, the only place that looks at it is complete_walk(), and it just means that we won't skip doing a d_revalidate on the final component (which will only have not been done if it's a '.', right?). If we remove FS_REVAL_DOT altogether, then we can chop off the bottom half of that function, which has a certain appeal. I guess the question we have to answer is -- if we remove FS_REVAL_DOT, what (if anything) will break? -- Jeff Layton <jlayton@xxxxxxxxxx>
Attachment:
signature.asc
Description: PGP signature