Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

Filip Štědronský <r.lkml@xxxxxxxxxx> · Tue, 14 Mar 2017 15:58:01 +0100

Hi,

On Tue, Mar 14, 2017 at 01:18:01PM +0200, Amir Goldstein wrote:
> I claim that fanotify filters event by mount not because it
> was a requirement, but because it was an implementation challenge
> to do otherwise.
>
> And I claim that what mount watchers are really interested in is
> "all the changes that happen in the file system in the area
>  that is visible to me through this mount point".
>
> In other words, an indexer needs to know if files were modified\
> create/deleted if that indexer sits in container host namespace
> regardless if those files were modified from within a container
> namespace.
> 
> It's not a matter of security/isolation. It's a matter of functionality.
> I agree that for some event (e.g. permission events) it is possible
> to argue both ways (i.e. that the namespace context should be used
> as a filter for events).
> But for the new proposed events (FS_MODIFY_DIR), I really don't
> see the point in isolation by mount/namespace.

There are basically two classes of uses for a fanotify-like
interface:

(1) Keeping an up-to-date representation of the file system.
    For this, superblock watches are clearly what you want.

      * You are interested in the current state of the
        filesystem, so you need to know about every change,
        regardless of where it came from.
      * As I mentioned earlier, in the case of remote,
        distributed and virtual filesystems, the change might
        come from within the filesystem itself (if the protocol
        supports reporting such changes). This can probably be
        implemented only with superblock-scoped watches because
        the change is fundamentally not related to any mount.
      * Some filesystems might also support change journalling
        and it might be conceivable to extend the API in the
        future to report "past" events (for example by passing
        the sequence number of the last seen event or similar).
      * The argument about containers escaping change notification
        you mentioned earlier.

    All those factors speak strongly in favour of superblock
    watches.

(2) Tracking filesystem *activity*. Now you are not building
    an image of current filesystem state but rather a log of
    what happened. Perhaps you are also interested in who
    (user/process/...) did what. Permission events also fit
    mostly in this category.

    For those it *might* make sense to have mount-scoped
    watches, for example if you want to monitor only one
    container or a subset of processes.

We both concentrate on the first but we shouldn't forget about
the second, which was one of the original motivations for
fanotify.

Thus I conclude that it might be desirable to implement
mount-scoped filename events in the long run, even though
I agree that the sb-scoped events are more important: they
cover more use cases and you can do additional filtering
(e.g. by pid) if deemed necessary.
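
To illustrate the pid filtering, here is a rough userspace
sketch. It assumes the new events would carry the pid in
struct fanotify_event_metadata the same way the existing events
do; count_events_from_pid is a made-up helper name, not part of
any API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/fanotify.h>

/*
 * Count the events in buf (filled by read() on a fanotify fd)
 * that were caused by process `pid`. Hypothetical helper: a real
 * consumer would act on the event and close ev->fd when it is
 * valid, instead of just counting.
 */
static size_t count_events_from_pid(const void *buf, ssize_t len,
				    pid_t pid)
{
	const struct fanotify_event_metadata *ev = buf;
	size_t n = 0;

	assert(buf != NULL);

	/* FAN_EVENT_OK/FAN_EVENT_NEXT walk the variable-length records. */
	while (FAN_EVENT_OK(ev, len)) {
		if (ev->pid == pid)
			n++;
		ev = FAN_EVENT_NEXT(ev, len);
	}
	return n;
}
```

The watcher still receives every event on the superblock and
simply drops the ones it does not care about, so this is
filtering after the fact, not a narrower watch.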

Implementing mount-scoped events would require one of:

(a) Sprinkling the callers of vfs_* with fanotify calls,
    as I did.
(b) Creating wrapper functions like vfs_path_unlink & co.
    that would make the necessary fanotify call (and probably
    tell the lower function not to generate another
    notification), as I suggested earlier.
(c) Giving the vfs_* functions an *optional* vfsmount argument.

In the end I probably find (c) the most elegant but this
can be discussed later, even after your changes are merged.
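
To make (c) a bit more concrete, here is a userspace mock-up of
the idea. It is not kernel code: the structures are cut-down
stand-ins and the mock_* names are invented for illustration.
It only shows how a NULL mount would still reach sb-scoped
watchers while a non-NULL one would also reach mount-scoped
watchers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel structures; only the
 * fields needed for the illustration, not the real layouts.
 */
struct super_block { unsigned int sb_events; };
struct vfsmount    { struct super_block *mnt_sb; unsigned int mnt_events; };
struct dentry      { struct super_block *d_sb; };

/*
 * Mock notification hook: sb-scoped watchers always see the
 * change, mount-scoped watchers only if the caller knew the mount.
 */
static void mock_notify_modify_dir(struct dentry *dir,
				   struct vfsmount *mnt)
{
	dir->d_sb->sb_events++;
	if (mnt)
		mnt->mnt_events++;
}

/* Option (c): one entry point for all callers, mnt may be NULL. */
static int mock_vfs_unlink(struct dentry *dir, struct dentry *victim,
			   struct vfsmount *mnt)
{
	assert(dir != NULL && victim != NULL);
	/* ... the actual unlink would happen here ... */
	mock_notify_modify_dir(dir, mnt);
	return 0;
}
```

Callers that have a path at hand would pass its mount; callers
that have none would pass NULL and lose nothing compared to the
sb-only scheme.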

Filip