Re: fanotify as syscalls

Andreas Gruenbacher <agruen@xxxxxxx> · Thu, 17 Sep 2009 22:07:01 +0200

On Wednesday, 16 September 2009 14:17:08 Jamie Lokier wrote:
> Eric Paris wrote:
> > On Wed, 2009-09-16 at 08:52 +0100, Jamie Lokier wrote:
> > > Seriously, what does system-wide fanotify do when run from a
> > > chroot/namespace/cgroup, and a file outside them is accessed?
> >
> > At the moment an fanotify global listener is system wide.  Truly system
> > wide.  A gentleman from SUSE is looking to rectify the problem so that if
> > run inside a namespace it stays inside the namespace.  Note that this
> > particular little tidbit is not in the 8 patches I proposed.  At the
> > moment those just include the UI and basic notification.
>
> I'll be really interested in the gentleman's solution.

I guess Eric meant me.

From my point of view, "global" events make no sense, and fanotify listeners 
should register which directories they are interested in (e.g., include "/", 
exclude "/proc"). This takes care of chroots and namespaces as well.

I think we want to register for events on objects rather than in the 
namespace, i.e., for inodes visible in multiple places because of hardlinks 
or bind mounts, we get the same kinds of events no matter which path is used. 
(The path actually used would still show up in /proc/self/fd/x.) When moving 
registered inodes, the registrations would move with them. This is how 
inotify works, except that inotify watches are not recursive.

The difficulty with this is that in the worst case, this would require walking 
the entire namespace and all cached inodes. I don't see how this could be 
done, for two reasons:

 * First, we can't take the vfsmount_lock and dcache_lock for the entire time.

 * Second, we would need to pin almost all the inodes, which is a clear no-go.

   [Why pin?  At least we would need to remember which objects a listener has
    registered interest in, so we need to pin the inodes.  We could still
    allow unregistered directory inodes to be thrown out because we can
    recreate their registration status from the parent. We can't recreate the
    registration status of non-directories because of hardlinks, though.]

The only other idea I could come up with is to only allow recursive 
registrations at mount points: instead of inodes, the vfsmounts would be 
included or excluded (probably automatically including bind mounts). This has 
one big drawback, though: users would no longer be able to watch arbitrary
subtrees. Privileged users could still arrange to watch almost all
subtrees with bind mounts (mount --bind /foo/bar /foo/bar).

Any ideas?

Thanks,
Andreas