Re: [PATCH v3 08/13] fanotify: enable FAN_REPORT_FID init flag

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 29 Nov 2018 13:03:03 +0200

On Thu, Nov 29, 2018 at 11:46 AM Jan Kara <jack@xxxxxxx> wrote:
>
> On Sun 25-11-18 15:43:47, Amir Goldstein wrote:
> > When setting up an fanotify listener, user may request to get fid
> > information in event instead of an open file descriptor.
> >
> > The fid obtained with event on a watched object contains the file
> > handle returned by name_to_handle_at(2) and fsid returned by statfs(2).
> >
> > When setting a mark, we need to make sure that the filesystem
> > supports encoding file handles with name_to_handle_at(2) and that
> > statfs(2) encodes a non-zero fsid.
> ...
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index ea8e81a3e80b..d7aa2f392a64 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -857,6 +859,49 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> >       return fd;
> >  }
> >
> > +/* Check if filesystem can encode a unique fid */
> > +static int fanotify_test_fid(struct path *path)
> > +{
> > +     struct kstatfs stat, root_stat;
> > +     int err;
> > +
> > +     /*
> > +      * Make sure path is not in filesystem with zero fsid (e.g. tmpfs).
> > +      * TODO: cache fsid in the mark connector.
> > +      */
>
> TODO in a submitted patch looks strange (looks like the patch isn't done
> yet ;)) and in this particular case the code is perfectly justified even
> without talking about future functionality...
>

Not to mention that I forgot to remove the TODO in the patch that adds
cached fsid ;-)

> > +     err = vfs_statfs(path, &stat);
> > +     if (err)
> > +             return err;
> > +
> > +     if (!stat.f_fsid.val[0] && !stat.f_fsid.val[1])
> > +             return -ENODEV;
> > +
> > +     /*
> > +      * Make sure path is not inside a filesystem subvolume (e.g. btrfs)
> > +      * which uses a different fsid than sb root.
> > +      */
> > +     err = statfs_by_dentry(path->dentry->d_sb->s_root, &root_stat);
> > +     if (err)
> > +             return err;
> > +
> > +     if (root_stat.f_fsid.val[0] != stat.f_fsid.val[0] ||
> > +         root_stat.f_fsid.val[1] != stat.f_fsid.val[1])
> > +             return -EXDEV;
>
> I think inode watches in subvolumes are actually fine? The same fs object
> is going to get different struct inode for different subvolumes if I
> remember right. So there won't be any surprises with unexpected fsid being
> reported.
>
> Also mount watches are actually fine for subvolume as different subvolumes
> appear as different mountpoints, don't they? And I think implementation
> that would have different fsid for inodes in the same mountpoint would be
> indeed very weird. So again no problem with fsid mismatch.
>
> So we need this check only for superblock watches.
>

Not so simple (or is it?).
If a group has inode, mount and filesystem marks, not all added at the
same time.
When event on an object that is associated with all the above marks,
which cached fsid should be used in the report?
Naturally, it makes sense to prefer to more accurate fsid of mount/inode
over the broader fsid of filesystem. Right?
But what happens when mount/inode marks are removed?
Or if filesystem has events in the mask that inode/mount do not?
Then the same object reports events with different fsid depending
on the type of event and time it took place (which marks existed).

Not a good situation to get ourselves into.

The simple way out of this is: we do not support FAN_REPORT_FID
on marks using path that is not relative to main volume. period.

Considering the fact that FAN_REPORT_FID is mainly indented to
enable reporting directory modification events and mount marks
are not supported with reporting directory modification events, we
only loose the ability to watch modification on selective directories
inside btrfs subvolume.

I also don't like the fact that I disabled filesystem watch over tmpfs,
because for the case of watching a single filesystem or single
directory, which is quite a common case, we don't need fsid
to be non-zero and we don't care if it mismatches with s_root fsid.

A solution I was contemplating was to allow zero fsid and non
root fsid as long as it is the only sb watched by the group, so
for non unique fsid:
- store group->sb and group->fsid
- return -EXDEV for an attempt to add mark from a different sb
(no matter if it is inode/mount/sb mark)
- when trying to add mark with zero or non root fsid (common case)
set group->sb to a special value so no fs will match it and then
attempt to add any mark with zero/non-root fsid will fail

This is something that is quite easy for me to implement and less
easy to document the expected behavior.
I donno, maybe:
EXDEV    watching several filesystems and either new mark or existing marks
                 are on filesystems with non unique fsid

The easy way out of it for me was: no support for FAN_REPORT_FID
on btrfs subvolumes at the moment - it could be added with restrictions
in the future.

Do you have a different view of the problem than mine?

> > diff --git a/fs/statfs.c b/fs/statfs.c
> > index f0216629621d..6a5d840a2d8d 100644
> > --- a/fs/statfs.c
> > +++ b/fs/statfs.c
> > @@ -50,7 +50,8 @@ static int calculate_f_flags(struct vfsmount *mnt)
> >               flags_by_sb(mnt->mnt_sb->s_flags);
> >  }
> >
> > -static int statfs_by_dentry(struct dentry *dentry, struct kstatfs *buf)
> > +/* Does not set buf->f_flags */
> > +int statfs_by_dentry(struct dentry *dentry, struct kstatfs *buf)
> >  {
> >       int retval;
> >
> > @@ -66,6 +67,7 @@ static int statfs_by_dentry(struct dentry *dentry, struct kstatfs *buf)
> >               buf->f_frsize = buf->f_bsize;
> >       return retval;
> >  }
> > +EXPORT_SYMBOL(statfs_by_dentry);
> >
> >  int vfs_statfs(const struct path *path, struct kstatfs *buf)
> >  {
>
> Make this export a separate patch, CC Al Viro on it. Honestly I'm not very
> happy about statfs_by_dentry() interface as it may return different result
> than vfs_statfs() so it just waits for someone careless to use
> statfs_by_dentry() and get burnt. How about implementing vfs_get_fsid()
> that will get dentry and return fsid, that will be just internally
> implemented using statfs_by_dentry()? We can then use that everywhere in
> fsnotify and the interface is going to be much cleaner like that. The
> comment regarding CC to Al Viro and separate patch still applies though.
>

OK. sounds good.

Thanks,
Amir.