Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 20 Apr 2023 09:12:52 +0300

On Wed, Apr 19, 2023 at 8:19 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Tue, Apr 18, 2023 at 06:20:22PM +0300, Amir Goldstein wrote:
> > On Tue, Apr 18, 2023 at 5:12 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Apr 18, 2023 at 04:56:40PM +0300, Amir Goldstein wrote:
> > > > On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:

[...]

> > > > Just thought of another reason:
> > > >  c) FAN_UNMOUNT does not need to require FAN_REPORT_FID
> > > >      so it does not depend on filesystem having a valid f_fsid nor
> > > >      exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report
> > > >      only MNTID record (I will amend the patch with this minor change).
> > >
> > > I see some pseudo fses generate f_fsid, e.g., tmpfs in mm/shmem.c
> >
> > tmpfs is not "pseudo" in my eyes, because it implements a great deal of the
> > vfs interfaces, including export_ops.
>
> The term "pseudo" is somewhat well-defined though, no? It really just
> means that there's no backing device associated with it. So for example,
> anything that uses get_tree_nodev() including tmpfs. If erofs is
> compiled with fscache support it's even a pseudo fs (TIL).
>

Ok, "pseudo fs" is an ambiguous term.

For the sake of this discussion, let's refer to fs that use get_tree_nodev()
"non-disk fs".

But as far as fsnotify is concerned, tmpfs is equivalent to xfs, because
all of the changes are made by users via vfs.

Let's call fs where changes can occur not via vfs "remote fs", those
include the network fs and some "internal fs" like the kernfs class of fs
and the "simple fs" class of fs (i.e. simple_fill_super).

With all the remote fs, the behavior of fsnotify is (and has always been)
undefined, that is, you can use inotify to subscribe for events and you
never know what you will get when changes are not made via vfs.

Some people (hypothetical) may expect to watch nsfs for dying ns
and may be disappointed to find out that they do not get the desired
IN_DELETE event.

We have had lengthy discussions about remote fs change notifications
with no clear decisions of the best API for them:
https://lore.kernel.org/linux-fsdevel/20211025204634.2517-1-iangelak@xxxxxxxxxx/

> >
> > and also I fixed its f_fsid recently:
> > 59cda49ecf6c shmem: allow reporting fanotify events with file handles on tmpfs
>
> Well thank you for that this has been very useful in userspace already
> I've been told.
>
> >
> > > At the risk of putting my foot in my mouth, what's stopping us from
> > > making them all support f_fsid?
> >
> > Nothing much. Jan had the same opinion [1].
>
> I think that's what we should try to do without having thought too much
> about potential edge-cases.
>
> >
> > We could do either:
> > 1. use uuid_to_fsid() in vfs_statfs() if fs has set s_uuid and not set f_fsid
> > 2. use s_dev as f_fsid in vfs_statfs() if fs did not set f_fsid nor s_uuid
> > 3. randomize s_uuid for simple fs (like tmpfs)
> > 4. any combination of the above and more
> >
> > Note that we will also need to decide what to do with
> > name_to_handle_at() for those pseudo fs.
>
> Doing it on the fly during vfs_statfs() feels a bit messy and could
> cause bugs. One should never underestimate the possibility that there's
> some fs that somehow would get into trouble because of odd behavior.
>
> So switching each fs over to generate a s_uuid seems the prudent thing
> to do. Doing it the hard way also forces us to make sure that each
> filesystem can deal with this.
>
> It seems that for pseudo fses we can just allocate a new s_uuid for each
> instance. So each tmpfs instance - like your patch did - would just get
> a new s_uuid.
>
> For kernel internal filesystems - mostly those that use init_pseudo -
> the s_uuid would remain stable until the next reboot when it is
> regenerated.
>

I am fine with opt-in for every fs as long as we do not duplicate
boilerplate code.
An FS_ flag could be a simple way to opt-in for this generic behavior.

> Looking around just a little there's some block-backed fses like fat
> that have an f_fsid but no s_uuid. So if we give those s_uuid then it'll
> mean that the f_fsid isn't generated based on the s_uuid. That should be
> ok though and shouldn't matter to userspace.
>
> Afterwards we could probably lift the ext4 and xfs specific ioctls to
> retrieve the s_uuid into a generic ioctl to allow userspace to get the
> s_uuid.
>
> That's my thinking without having crawled to all possible corner
> cases... Also needs documenting that s_uuid is not optional anymore and
> explain the difference between pseudo and device-backed fses. I hope
> that's not completely naive...
>

I don't think that the dichotomy of device-backed vs. pseudo is enough
to describe the situation.

I think what needs to be better documented and annotated is what type
of fsnotify services can be expected to work on a given fs.

Jan has already introduced FS_DISALLOW_NOTIFY_PERM to opt-out
of permission events (for procfs).

Perhaps this could be generalized to s_type->fs_notify_supported_events
or s_type->fs_notify_supported_features.

For example, if an fs opts-in to FAN_REPORT_FID, then it gets an auto
allocated s_uuid and f_fsid if it did not fill them in fill_super and in statfs
and it gets a default implementation for encoding file handles from ino/gen.

Thanks,
Amir.