On Mon, Jul 18, 2011 at 2:19 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > __d_lookup_rcu() is being careful about *inode, yes. > > But I'd forgotten it was even setting it: doesn't that setting get > overridden later by the more careless *inode = path->d_entry->d_inode > at the head of __follow_mount_rcu()'s loop? > > Perhaps that line just needs to be moved to the tail of the loop? Ahh. Bingo. Yes, I think you found it. I don't think it should touch that *inode value in __follow_mount_rcu() unless we actually followed a mount, exactly because it will overwrite the thing that we were so careful about in __d_lookup_rcu(). So how about this patch that replaces the earlier mount-point sequence number one. The only difference is (as you mention) to just do the *inode update at the end of the loop, so that we don't overwrite the valid inode data with a non-checked one when we don't do anything. Untested. But this should make my propised change to fs/dcache.c be irrelevant, because whether we clear d_inode or not, the existing sequence number checks will catch it. Agreed? Linus
fs/namei.c | 21 +++++++++++++++++++-- 1 files changed, 19 insertions(+), 2 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 5c867dd1c0b3..b9cd558a00be 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -938,11 +938,12 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path, { for (;;) { struct vfsmount *mounted; + unsigned int seq; + /* * Don't forget we might have a non-mountpoint managed dentry * that wants to block transit. */ - *inode = path->dentry->d_inode; if (unlikely(managed_dentry_might_block(path->dentry))) return false; @@ -952,9 +953,25 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path, mounted = __lookup_mnt(path->mnt, path->dentry, 1); if (!mounted) break; + seq = read_seqcount_begin(&mounted->mnt_root->d_seq); + + /* + * The memory barrier in read_seqcount_begin() is sufficient, + * so we can use __read_seqcount_retry() to check the prev + * sequence numbers. + */ + if (!__read_seqcount_retry(&path->dentry->d_seq, nd->seq)) + return false; path->mnt = mounted; path->dentry = mounted->mnt_root; - nd->seq = read_seqcount_begin(&path->dentry->d_seq); + nd->seq = seq; + + /* + * Update the inode too. We don't need to re-check the + * dentry sequence number here after this d_inode read, + * because a mount-point is always pinned. + */ + *inode = path->dentry->d_inode; } return true; }