Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Mon, 4 Jul 2022 20:02:52 +0100

On Mon, Jul 04, 2022 at 10:36:05AM -0700, Linus Torvalds wrote:

> For example, in __follow_mount_rcu(), when we jump to a new mount
> point, and that sequence has
> 
>                 *seqp = read_seqcount_begin(&dentry->d_seq);
> 
> to reset the sequence number to the new path we jumped into.
> 
> But I don't actually see what checks the previous sequence number in
> that path. We just reset it to the new one.

Theoretically it could be a problem.  We have /mnt/foo/bar and
/mnt/baz/bar.  Something's mounted on /mnt/foo, hiding /mnt/foo/bar.
We start a pathwalk for /mnt/baz/bar,
someone umounts /mnt/foo and swaps /mnt/foo to /mnt/baz before
we get there.  We are doomed to get -ECHILD from an attempt to
legitimize in the end, no matter what.  However, we might get
a hard error (-ENOENT, for example) before that, if we pick up
the old mount that used to be on top of /mnt/foo (now /mnt/baz)
and had been detached before the damn thing had become /mnt/baz
and notice that there's no "bar" in its root.

It used to be impossible (rename would've failed if the target had
been non-empty and had we managed to empty it first, well, there's
your point when -ENOENT would've been accurate).  With exchange...
Yes, it's a possible race.

Might need to add
                                if (read_seqretry(&mount_lock, nd->m_seq))
					return false;
in there.  And yes, it's a nice demonstration of how subtle and
brittle RCU pathwalk is - nobody noticed this bit of fun back when
RENAME_EXCHANGE had been added...  It got a lot more readable these
days, but...

> For __follow_mount_rcu it looks like validating the previous sequence
> number is left to the caller, which then does try_to_unlazy_next().

Not really - the caller goes there only if we have __follow_mount_rcu()
say "it's too tricky for me, get out of RCU mode and deal with it
there".

Anyway, I've thrown a mount_lock check in there, running xfstests to
see how it goes...