On Mon, Jul 04, 2022 at 10:36:05AM -0700, Linus Torvalds wrote: > For example, in __follow_mount_rcu(), when we jump to a new mount > point, and that sequence has > > *seqp = read_seqcount_begin(&dentry->d_seq); > > to reset the sequence number to the new path we jumped into. > > But I don't actually see what checks the previous sequence number in > that path. We just reset it to the new one. Theoretically it could be a problem. We have /mnt/foo/bar and /mnt/baz/bar. Something's mounted on /mnt/foo, hiding /mnt/foo/bar. We start a pathwalk for /mnt/baz/bar, someone umounts /mnt/foo and swaps /mnt/foo to /mnt/baz before we get there. We are doomed to get -ECHILD from an attempt to legitimize in the end, no matter what. However, we might get a hard error (-ENOENT, for example) before that, if we pick up the old mount that used to be on top of /mnt/foo (now /mnt/baz) and had been detached before the damn thing had become /mnt/baz and notice that there's no "bar" in its root. It used to be impossible (rename would've failed if the target had been non-empty and had we managed to empty it first, well, there's your point when -ENOENT would've been accurate). With exchange... Yes, it's a possible race. Might need to add if (read_seqretry(&mount_lock, nd->m_seq)) return false; in there. And yes, it's a nice demonstration of how subtle and brittle RCU pathwalk is - nobody noticed this bit of fun back when RENAME_EXCHANGE had been added... It got a lot more readable these days, but... > For __follow_mount_rcu it looks like validating the previous sequence > number is left to the caller, which then does try_to_unlazy_next(). Not really - the caller goes there only if we have __follow_mount_rcu() say "it's too tricky for me, get out of RCU mode and deal with it there". Anyway, I've thrown a mount_lock check in there, running xfstests to see how it goes...