On Sun, Jul 17, 2011 at 11:31 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > Now, I do agree that maybe that case simply should check the dentry > sequence count. I wish all cases did. Hugh patch did that. But the > reason I dislike Hugh's patch is that when I say "I wish they all > did", I mean that I dislike the special casing. And Hugh's patch just > adds *more* special casing for that NULL entry - I'd wish we just > always did it regardless of whether it was NULL or not. Btw, looking at that, I think Hugh's patch is wrong. It does if (!read_seqcount_retry(&dentry->d_seq, nd->seq)) but that's after we've done the __follow_mount_rcu() that may actually have changed "nd->seq" to the mount-point inode (and has changed path->dentry to match it). Now, it only does it if inode is NULL, so I guess it doesn't matter, but it's the kind of inconsistency that I think is really dangerous, because it basically compares incompatible sequence numbers. Also, looking at that whole mount-point traversal sequence, it looks like __follow_mount_rcu() will happily totally ignore the old sequence number when it replaces it with the mount-point sequence number. So it looks to me that we have a case where we miss the sequence number check that can happen with a positive dentry too! No? So I think that whenever we change "nd->seq", we should always heck the previous sequence number first (the way do_lookup() itself does for the *normal* traversal case). Otherwise we will have traversed the mount-point without ever having checked the previous sequence number. Something like the (untested) attached patch. Comments? This mount-point case is independent of the negative dentry issue, and probably never really an issue in practice, but... Linus
fs/namei.c | 13 ++++++++++++- 1 files changed, 12 insertions(+), 1 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 5c867dd1c0b3..f78d2af3aef4 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -938,6 +938,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path, { for (;;) { struct vfsmount *mounted; + unsigned int seq; + /* * Don't forget we might have a non-mountpoint managed dentry * that wants to block transit. @@ -952,9 +954,18 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path, mounted = __lookup_mnt(path->mnt, path->dentry, 1); if (!mounted) break; + seq = read_seqcount_begin(&mounted->mnt_root->d_seq); + + /* + * The memory barrier in read_seqcount_begin() is sufficient, + * so we can use __read_seqcount_retry() to check the prev + * sequence numbers. + */ + if (!__read_seqcount_retry(&path->dentry->d_seq, nd->seq)) + return false; path->mnt = mounted; path->dentry = mounted->mnt_root; - nd->seq = read_seqcount_begin(&path->dentry->d_seq); + nd->seq = seq; } return true; }