On Sat, Mar 18, 2023 at 04:51:59PM +0100, Christian Brauner wrote: > The comment on top of __lookup_mnt() states that it finds the first > mount implying that there could be multiple mounts mounted at the same > dentry with the same parent. > > This was true on old kernels where __lookup_mnt() could encounter a > stack of child mounts such that each child had the same parent mount and > was mounted at the same dentry. These were called "shadow mounts" and > were created during mount propagation. So back then if a mount @m in the > destination propagation tree already had a child mount @p mounted at > @mp then any mount @n we propagated to @m at the same @mp would be > appended after the preexisting mount @p in @mount_hashtable. > > This hasn't been the case for quite a while now and I don't see an > obvious way how such mount stacks could be created in another way. Not quite, actually - there's a nasty corner case where mnt_change_mountpoint() would create those. And your subsequent patch steps into the same fun. Look: suppose the root of the tree you are feeding to attach_recursive_mnt() has managed to grow a mount right on top of its root. The same will be reproduced in all its copies created by propagate_mnt(). Now, suppose one of the slaves of the place where we are trying to mount it on already has something mounted on it. Well, we hit this: q = __lookup_mnt(&child->mnt_parent->mnt, child->mnt_mountpoint); if (q) mnt_change_mountpoint(child, smp, q); which will tuck the child (a copy we'd made) under q (existing mount on top of the place that copy is for). Result: 'q' overmounts the root of 'child' now. So does the copy of whatever had been overmounting the root of 'source_mnt'... And yes, it can happen. Consider e.g. do_loopback(); we have looked up the mountpoint ('path'), we have looked up the subtree to copy ('old_path'), we had lock_mount(path) made sure that namespace_sem is held *and* path is not overmounted (followed into whatever overmounts that might have happened since we looked the mountpoint up). Now, think what happens if 'old_path' is also overmounted while we are trying to get namespace_sem... A similar scenario exists for do_move_mount(), and there it's in a sense worse - there we have to cope with the possibility that from_dfd is an O_PATH descriptor created by fsmount(). And I'm not at all convinced that we can't arrange for automount point to be there and be triggered by the time of move_mount(2)... The reason it's worse is that here we can't just follow mounts all the way down - we want to take the entire mount tree associated with that descriptor. > And > if that's possible it would invalidate assumptions made in other parts > of the code. Details? I'm not saying it's impossible - we might have a real bug in that area.