On Wed, Dec 11, 2024 at 11:21:08AM +0100, Miklos Szeredi wrote: > On Wed, 11 Dec 2024 at 11:00, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > On Tue, Dec 10, 2024 at 04:10:45PM +0100, Miklos Szeredi wrote: > > > On Sat, 7 Dec 2024 at 22:17, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > > > > I took another look at f{a,s}notify. There's no copy_to_user() happening > > > > when adding events via fsnotify(). It happens when the caller retrieves > > > > events via read() from the relevant notify file descriptor. We should > > > > still move calls to notify_mounts() out of the namespace semaphore > > > > whenever we can though. > > > > > > Doesn't work. After unlocking namespace_sem deref of mnt->prev_ns > > > might lead to UAF. > > > > Hm, a UAF could only be triggered by mounts that were unmounted due to > > umount propagation into another mount namespaces. The caller's mount > > namespace in mnt_ns->prev_ns cannot go away until all mounts are put. > > Why? E.g. one does umount -l on a subtree in a private namespace, > then destroys the namespace immediately. There's no serialization > between the two other than namespace_sem, so if the former releases > namespace_sem the namespace destruction can run to completion while > the detached subtree's mounts are still being processed. For that the caller has to exit or switch to another mount namespace. But that can only happen when all notifications have been registered. I may misunderstand what you mean though. > > The simple fix is to take a passive reference count. But I'm not sure > > what would be more expensive (holding the lock or the reference counts). > > Right, that would work, but I think holding namespace_sem for read > while calling fsnotify() is both simpler and more efficient. Probably, although I'm still not too happy about it. Especially since umount propagation can generate a lot more events then mount propagation as it tries to be exhaustive. I guess we have to see. Would be nice to have proper test for this.