On Tue, Dec 03, 2024 at 05:42:04PM +0100, Jan Kara wrote:
> On Tue 03-12-24 14:03:24, Amir Goldstein wrote:
> > On Tue, Dec 3, 2024 at 12:40 PM Karel Zak <kzak@xxxxxxxxxx> wrote:
> > > Thank you for working on this.
> > >
> > > On Thu, Nov 28, 2024 at 03:39:59PM GMT, Miklos Szeredi wrote:
> > > > To monitor an entire mount namespace with this new interface, watches need
> > > > to be added to all existing mounts. This can be done by performing
> > > > listmount()/statmount() recursively at startup and when a new mount is
> > > > added.
> > >
> > > It seems that maintaining a complete tree of nodes on large systems
> > > with thousands of mountpoints is quite costly for userspace. It also
> > > appears to be fragile, as any missed new node (due to a race or other
> > > reason) would result in the loss of the ability to monitor that part
> > > of the hierarchy. Let's imagine that there are new mount nodes added
> > > between the listmount() and fanotify_mark() calls. These nodes
> > > will be invisible.
> >
> > That should not happen if the monitor does:
> > 1. set fanotify_mark() on parent mount to get notified on new child mounts
> > 2. listmount() on parent mount to list existing children mounts
>
> Right, that works in principle. But it will have all those headaches as
> trying to do recursive subtree watching with inotify directory watches
> (mounts can also be moved, added, removed, etc. while we are trying to
> capture them). It is possible to do but properly handling all the possible
> races was challenging to say the least. That's why I have my doubts whether
> this is really the interface we want to offer to userspace...
>
> > > It would be beneficial to have a "recursive" flag that would allow for
> > > opening only one mount node and receiving notifications for the entire
> > > hierarchy. (I have no knowledge about fanotify, so it is possible that
> > > this may not be feasible due to the internal design of fanotify.)
> >
> > This can be challenging, but if it is acceptable to hold the namespace
> > mutex while setting all the marks (?) then maybe.
>
> So for mounts, given the relative rarity of mount / umount events and depth
> of a mount tree (compared to the situation with ordinary inodes and
> standard fanotify events), I think it might be even acceptable to walk up
> the mount tree and notify everybody along that path.

Mount trees can get pretty massive due to containers and mount
propagation. That's why propagate_umount() is so ugly: it's
optimized to deal with such cases.

But I think that recursive watches have to be restricted to mount
namespaces anyway, such that you can get notifications about all mounts
and umounts in a specific mount namespace. That reins in the problem
quite a bit.

>
> > What should be possible is to set a mark on the mount namespace
> > to get all the mount attach/detach events in the mount namespace
> > and let userspace filter out the events that are not relevant to the
> > subtree of interest.

Yes, that's what I've been arguing for at LSFMM.
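
For illustration only, here is a minimal sketch of the userspace side of
that idea: receive every attach/detach event for the namespace and discard
the ones outside the subtree of interest. Nothing in it is an existing
kernel or fanotify API. The MountEvent record, its "attach"/"detach" kinds
and the SubtreeFilter class are hypothetical names, and the sketch assumes
each event carries the mount ID and its parent's mount ID, and that the
initial parent map comes from a startup snapshot (for example one built
with listmount()/statmount()).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountEvent:
    """Hypothetical namespace-wide event; not a real fanotify structure."""
    kind: str       # "attach" or "detach" (assumed event names)
    mnt_id: int     # ID of the mount being attached or detached
    parent_id: int  # ID of the mount it is (or was) attached to


class SubtreeFilter:
    """Track parent links for the whole namespace, report only one subtree."""

    def __init__(self, root_id, parents):
        # parents maps mnt_id -> parent mnt_id, taken from a startup snapshot.
        self.root_id = root_id
        self.parents = dict(parents)

    def in_subtree(self, mnt_id):
        # Walk up the parent chain until we hit the subtree root, run out of
        # known parents, or revisit a mount (a root that is its own parent).
        seen = set()
        while mnt_id is not None and mnt_id not in seen:
            if mnt_id == self.root_id:
                return True
            seen.add(mnt_id)
            mnt_id = self.parents.get(mnt_id)
        return False

    def handle(self, event):
        """Update the tracked tree; return True if the event is relevant."""
        if event.kind == "attach":
            self.parents[event.mnt_id] = event.parent_id
            return self.in_subtree(event.mnt_id)
        if event.kind == "detach":
            # Decide relevance before forgetting the parent link.
            relevant = self.in_subtree(event.mnt_id)
            self.parents.pop(event.mnt_id, None)
            return relevant
        raise ValueError(f"unknown event kind: {event.kind}")


if __name__ == "__main__":
    # Mount 1 is the namespace root; we only care about the subtree at 20.
    flt = SubtreeFilter(root_id=20, parents={1: 1, 20: 1, 30: 1})
    for ev in (MountEvent("attach", 21, 20), MountEvent("attach", 31, 30)):
        print(ev, "relevant" if flt.handle(ev) else "ignored")
```

The filter still has to track parent links for every mount in the
namespace, since an event deep in the tree is only recognisable as
relevant by walking up to the subtree root. That bookkeeping is plain
dictionary updates driven by the event stream, so no per-mount marks are
needed and there is no window in which a new child mount can be missed.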