Jeff Layton <jlayton@xxxxxxxxxx> writes: > On Fri, 6 Nov 2009 20:36:01 +0000 > Jamie Lokier <jamie@xxxxxxxxxxxxx> wrote: > >> Jeff Layton wrote: >> > The problem here is that this makes that code shortcut any lookup or >> > revalidation of the dentry. In general, this isn't a problem -- in most >> > cases the dentry is known to be good. It is a problem however for NFSv4. >> > If this symlink is followed on an open operation no actual open call >> > occurs and the open state isn't properly established. This causes >> > problems when we later try to use this file descriptor for actual >> > operations. >> >> As NFS uses open() as a kind of fcntl-lock barrier, I can see it's >> important to do _something_ on new opens, rather than just cloning >> most of the file descriptor. >> > > I guess you mean the close-to-open cache consistency? If so, this > problem doesn't actually break that. The actual nfs_file_open call does > occur even when you're opening by following one of these symlinks. I > believe the cache consistency code occurs there. > > The problem here is really nfsv4 specific. There the on-the-wire open > call and initialization of state actually happens during d_lookup and > d_revalidate. Neither of these happens with these LAST_BIND symlinks so > we end up with a filp that has no NFSv4 state attached. > >> > This patch takes a minimalist approach to fixing this by making the >> > /proc/pid follow_link routine revalidate the dentry before returning it. >> >> What happens if the file descriptor you are re-opening is for a file >> which has been deleted. Does it still have a revalidatable dentry? >> > > Well, these LAST_BIND symlinks return a real dget'ed dentry today. If > we assume that it always returns a valid dentry (which seems to be the > case), then I suppose it's OK to do a d_revalidate against it. > > It's possible though that that revalidate will either fail though or > return that it's no good. In that case, this code just returns ESTALE > which should make the path walking code revalidate all the way up the > chain. That should (hopefully) make whatever syscall we're servicing > return an error. Hmm. Looking at the code I get the impression that a file bind mount will have exactly the same problem. Can you confirm. If file bind mounts also have this problem a bugfix to to just proc seems questionable. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html