Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 20 Sep 2023 18:12:00 +0300

On Wed, Sep 20, 2023 at 4:48 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Wed 20-09-23 15:41:06, Amir Goldstein wrote:
> > On Wed, Sep 20, 2023 at 2:04 PM Jan Kara <jack@xxxxxxx> wrote:
> > > On Wed 20-09-23 11:26:38, Amir Goldstein wrote:
> > > > Be that as it may, there may be users that use inotify on network fs
> > > > and it even makes a lot of sense in controlled environments with
> > > > single NFS client per NFS export (e.g. home dirs), so I think we will
> > > > need to support those fs as well.
> > >
> > > Fair enough.
> > >
> > > > Maybe the wise thing to do is to opt-in to monitor those fs after all?
> > > > Maybe with explicit opt-in to watch a single fs, fanotify group will
> > > > limit itself to marks on a specific sb and then a null fsid won't matter?
> > >
> > > We have virtual filesystems with all sorts of missing or weird notification
> > > functionality and we don't flag that in any way. So making a special flag
> > > for network filesystems seems a bit arbitrary. I'd just make them provide
> > > fsid and be done with it.
> > >
> >
> > OK. I will try.
> >
> > However, in reply to Jeff's comment:
> >
> > > Caution here. Most of these filesystems don't have protocol support for
> > > anything like inotify (the known exception being SMB). You can monitor
> > > such a network filesystem, but you won't get events for things that
> > > happen on remote hosts.
> >
> > Regardless of the fsid question, when we discussed remote notifications
> > support for FUSE/SMB, we raised the point that which notifications the
> > user gets (local or remote) are ambiguous and one suggestion was to
> > be explicit about requesting LOCAL or REMOTE notifications (or both).
> >
> > Among the filesystems that currently support fanotify, except for the
> > most recent kernfs family, I think all of them are "purely local".
> > For the purpose of this discussion I consider debugfs and such to have
> > REMOTE notifications, which are not triggered from user vfs syscalls.
>
> Well, now you are speaking of FAN_REPORT_FID mode. There I agree we
> currently support only filesystems with a sane behavior. But there are
> definitely existing users of fanotify in "standard" file-descriptor mode
> for filesystems such as sysfs, proc, etc. So the new flag would have to
> change behavior only to FAN_REPORT_FID groups and that's why I think it's a
> bit odd.
>

No flag is fine by me.

> > The one exception is smb, but only with config CIFS_NFSD_EXPORT
> > and that one depends on BROKEN.
> >
> > If we did want to require an explicit FAN_LOCAL_NOTIF flag to allow
> > setting marks on fs which may have changes not via local syscalls,
> > it may be a good idea to flag those fs and disallow them without explicit
> > opt-in, before we add fanotify support to fs with missing notifications?
> > Perhaps before the official release of 6.6?
>
> Yeah, overall I agree it would be nice to differentiate filesystems where
> we aren't going to generate all the events. But as I write above, there are
> already existing users for non-FID mode so we cannot have that flag in the
> general setting.
>
> > > > > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> > > > > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
> > > > >   quite limited. But some special cases could be useful. Adding fsid support
> > > > >   is the same amount of trouble as for kernfs - a few LOC. In fact, we
> > > > >   could perhaps add a fstype flag to indicate that this is a filesystem
> > > > >   without persistent identification and so uuid should be autogenerated on
> > > > >   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
> > > > >   This way we could handle all these filesystems with trivial amount of
> > > > >   effort.
> > > > >
> > > >
> > > > Christian,
> > > >
> > > > I recall that you may have had reservations on initializing s_uuid
> > > > and f_fsid in vfs code?
> > > > Does an opt-in fstype flag address your concerns?
> > > > Will you be ok with doing the tmpfs/kernfs trick for every fs
> > > > that opted-in with fstype flag in generic vfs code?
> > > >
> > > > > freevxfs - the only real filesystem without f_fsid. Trivial to handle one
> > > > >   way or the other.
> > > > >
> > > >
> > > > Last but not least, btrfs subvolumes.
> > > > They do have an fsid, but it is different from the sb fsid,
> > > > so we disallow (even inode) fanotify marks.
> > > >
> > > > I am not sure how to solve this one,
> > > > but if we choose to implement the opt-in fanotify flag for
> > > > "watch single fs", we can make this problem go away, along
> > > > with the problem of network fs fsid and other odd fs that we
> > > > do not want to have to deal with.
> > > >
> > > > On top of everything, it is a fast solution and it doesn't
> > > > involve vfs and changing any fs at all.
> > >
> > > Yeah, right, forgot about this one. Thanks for reminding me. But this is
> > > mostly a kernel internal implementation issue and doesn't seem to be a
> > > principial problem so I'd prefer not to complicate the uAPI for this. We
> > > could for example mandate a special super_operation for fetching fsid for a
> > > dentry for filesystems which don't have uniform fsids over the whole
> > > filesystem (i.e., btrfs) and call this when generating event for such
> > > filesystems. Or am I missing some other complication?
> > >
> >
> > The problem is the other way around :)
> > btrfs_statfs() takes a dentry and returns the fsid of the subvolume.
> > That is the fsid that users will get when querying the path to be marked.
>
> Yup.
>
> > If users had a flag to statfs() to request the "btrfs root volume fsid",
> > then fanotify could also report the root fsid and everyone will be happy
> > because the btrfs file handle already contains the subvolume root
> > object id (FILEID_BTRFS_WITH_PARENT_ROOT), but that is not
> > what users get for statfs() and that is not what fanotify documentation
> > says about how to query fsid.
> >
> > We could report the subvolume fsid for marked inode/mount
> > that is not a problem - we just cache the subvol fsid in inode/mount
> > connector, but that fsid will be inconsistent with the fsid in the sb
> > connector, so the same object (in subvolume) can get events
> > with different fsid (e.g. if one event is in mask of sb and another
> > event is in mask of inode).
>
> Yes. I'm sorry I didn't describe all the details. My idea was to report
> even on a dentry with the fsid statfs(2) would return on it. We don't want
> to call dentry_statfs() on each event (it's costly and we don't always have
> the dentry available) but we can have a special callback into the
> filesystem to get us just the fsid (which is very cheap) and call *that* on
> the inode on which the event happens to get fsid for the event. So yes, the
> sb mark would be returning events with different fsids for btrfs. Or we
> could compare the obtained fsid with the one in the root volume and ignore
> the event if they mismatch (that would be more like the different subvolume
> => different filesystem point of view and would require some more work on
> fanotify side to remember fsid in the sb mark and not in the sb connector).
>

It sounds like a big project.
I am not sure it is really needed.

On second thought, maybe getting different events on the
same subvol with different fsid is not that bad, because for
btrfs, it is possible to resolve the path of an fid in subvol
from either the root mount or the subvol mount.
IOW, subvol_fsid+fid and root_fsid+fid are two ways to
describe the same unique object.

Remember that we have two use cases for fsid+fid:
1. (unpriv/priv) User queries fsid+fid, sets an inode mark on path,
    stores fsid+fid<->path in a map to match events to path later
2. (priv-only) User queries fsid, sets a sb/mount mark on path,
    stores fsid<->path to match event to mntfd and
    resolves path by handle from mntfd+fid

Suppose we only allow to set sb marks on btrfs root volume,
but we relax the rule to allow btrfs inode/mount mark on subvol.

Use case #1 is not a problem.
If we prefer inode conn->fsid, to sb/mount conn->fsid, whichever
conn->fsid that is chosen, reports an fsid+fid that user has
recorded the path for, even if sb/mount marks also exist.

In use case #2,
- If the mark was on root sb, then the user has
  the root volume mntfd mapped to root fsid
- If the mark was on subvol mount, then the user has
  the subvol mntfd mapped to subvol fsid
- If the user has both marks it also has both mntfds,
  so it does not matter if events carry the root fsid
  (event is in the root sb mark mask) or if they carry the
  subvol fsid (event is in the subvol mount mark mask)

This needs a POC, but I think it should work.
In any case I can fix the btrfs inode marks.

Thanks,
Amir.