Re: [RFC PATCH] fanotify: notify on mount attach and detach

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 4 Dec 2024 12:30:31 +0100

On Tue, Dec 03, 2024 at 05:42:04PM +0100, Jan Kara wrote:
> On Tue 03-12-24 14:03:24, Amir Goldstein wrote:
> > On Tue, Dec 3, 2024 at 12:40 PM Karel Zak <kzak@xxxxxxxxxx> wrote:
> > > Thank you for working on this.
> > >
> > > On Thu, Nov 28, 2024 at 03:39:59PM GMT, Miklos Szeredi wrote:
> > > > To monitor an entire mount namespace with this new interface, watches need
> > > > to be added to all existing mounts.  This can be done by performing
> > > > listmount()/statmount() recursively at startup and when a new mount is
> > > > added.
> > >
> > > It seems that maintaining a complete tree of nodes on large systems
> > > with thousands of mountpoints is quite costly for userspace. It also
> > > appears to be fragile, as any missed new node (due to a race or other
> > > reason) would result in the loss of the ability to monitor that part
> > > of the hierarchy. Let's imagine that there are new mount nodes added
> > > between the listmount() and fanotify_mark() calls. These nodes
> > > will be invisible.
> > 
> > That should not happen if the monitor does:
> > 1. set fanotify_mark() on parent mount to get notified on new child mounts
> > 2. listmount() on parent mount to list existing children mounts
> 
> Right, that works in principle. But it will have all those headaches as
> trying to do recursive subtree watching with inotify directory watches
> (mounts can also be moved, added, removed, etc. while we are trying to
> capture them). It is possible to do but properly handling all the possible
> races was challenging to say the least. That's why I have my doubts whether
> this is really the interface we want to offer to userspace...
> 
> > > It would be beneficial to have a "recursive" flag that would allow for
> > > opening only one mount node and receiving notifications for the entire
> > > hierarchy. (I have no knowledge about fanotify, so it is possible that
> > > this may not be feasible due to the internal design of fanotify.)
> > 
> > This can be challenging, but if it is acceptable to hold the namespace
> > mutex while setting all the marks (?) then maybe.
> 
> So for mounts, given the relative rarity of mount / umount events and depth
> of a mount tree (compared to the situation with ordinary inodes and
> standard fanotify events), I think it might be even acceptable to walk up
> the mount tree and notify everybody along that path.

Mount trees can get pretty massive due to containers and mount
propagation. That's why propagate_umount() is so ugly: it's
optimized to deal with such cases.

But I think that recursive watches have to be restricted to mount
namespaces anyway, such that you can get notifications about all mounts
and umounts in a specific mount namespace. That reins in the problem
quite a bit.

> 
> > What should be possible is to set a mark on the mount namespace
> > to get all the mount attach/detach events in the mount namespace
> > and let userspace filter out the events that are not relevant to the
> > subtree of interest.

Yes, that's what I've been arguing for at LSFMM.
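
To make the userspace side of that concrete, here is a rough sketch of
the filtering, not a proposal for the actual ABI. Everything in it is an
assumption for illustration: the (kind, mnt_id, parent_id) event shape,
the SubtreeFilter name, and the idea that the initial child-to-parent
map comes from a listmount()/statmount() walk at startup.

```python
# Sketch only: the event shape (kind, mnt_id, parent_id) is hypothetical
# and is NOT the fanotify event format discussed in this thread.
ATTACH, DETACH = "attach", "detach"


class SubtreeFilter:
    """Track mount parentage and report whether an event touches a subtree."""

    def __init__(self, root_id, initial_parents):
        # initial_parents maps mnt_id -> parent mnt_id; assumed to come from
        # a snapshot of the namespace (e.g. a listmount()/statmount() walk).
        self.root_id = root_id
        self.parent = dict(initial_parents)

    def in_subtree(self, mnt_id):
        # Walk up towards the namespace root; stop on unknown IDs or cycles.
        seen = set()
        while mnt_id is not None and mnt_id not in seen:
            if mnt_id == self.root_id:
                return True
            seen.add(mnt_id)
            mnt_id = self.parent.get(mnt_id)
        return False

    def handle(self, kind, mnt_id, parent_id):
        """Update the tracked tree; return True if the event is relevant."""
        if kind == ATTACH:
            self.parent[mnt_id] = parent_id
            return self.in_subtree(mnt_id)
        if kind == DETACH:
            # Decide relevance before forgetting the mount.
            relevant = self.in_subtree(mnt_id)
            self.parent.pop(mnt_id, None)
            return relevant
        raise ValueError(f"unknown event kind: {kind!r}")
```

The monitor sees every event in the namespace and drops the ones that
do not chain up to root_id, so no per-mount marks have to be kept in
sync with the tree.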