Re: [RFC][PATCH] fanotify: disallow mount/sb marks on kernel internal pseudo fs

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 3 Jul 2023 10:27:59 +0200

On Sat, Jul 01, 2023 at 07:25:14PM +0300, Amir Goldstein wrote:
> On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > anonymous pipes/inodes.
> > >
> > > I cannot think of a good reason to allow this - it looks like an
> > > oversight that dated back to the original fanotify API.
> > >
> > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > ---
> > >
> > > Jan,
> > >
> > > As discussed, allowing sb/mount mark on anonymous pipes
> > > makes no sense and we should not allow it.
> > >
> > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > backport to maintained LTS kernels even though this dates back to day one
> > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > >
> > > The reason this is an RFC, and that I have not also included the
> > > optimization patch, is that we may want to consider banning kernel
> > > internal inodes from fanotify and/or inotify altogether.
> > >
> > > The tricky point is banning anonymous pipes from inotify, which
> > > could have existing users (?), but maybe not, so maybe this is
> > > something that we need to try out.
> > >
> > > I think we can easily get away with banning anonymous pipes from
> > > fanotify altogether, but I would not like to get into a situation
> > > where new applications will be written to rely on inotify for
> > > functionality that fanotify is never going to have.
> > >
> > > Thoughts?
> > > Am I over thinking this?
> > >
> > > Amir.
> > >
> > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > --- a/fs/notify/fanotify/fanotify_user.c
> > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > >               return -EINVAL;
> > >
> > > +     /*
> > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > +      * anonynous pipes in the system.
> >
> > s/anonynous/anonymous/
> >
> > > +      *
> > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > +      * are not exposed to user's mount namespace, but there are other
> > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > +      * allowing sb and mount mark is questionable.
> > > +      */
> > > +     if (mark_type != FAN_MARK_INODE &&
> > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > +             return -EINVAL;
> >
> 
> On second thought, I am not sure about the EINVAL error code here.
> I used the same error code that Jan used for permission events on
> proc fs, but the problem is that applications do not have a decent way
> to differentiate between
> "sb mark not supported by kernel" (i.e. < v4.20) vs.
> "sb mark not supported by fs" (the case above)
> 
> same for permission events:
> "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> "permission events not supported by fs" (procfs)
> 
> I have looked for other syscalls that react to SB_NOUSER and I've
> found that mount also returns EINVAL.
> 
> So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> for invalid flag combinations (also across the two syscalls),
> but not because of the type of object being marked, except for
> the special case of procfs and permission events.
> 
> mount(2) syscall OTOH, has many documented EINVAL cases
> due to the type of source object (e.g. propagation type shared).

Many is an understatement. There are so many EINVALs.

> 
> I know there is no standard and EINVAL can mean many
> different things in syscalls, but I thought that maybe EACCES
> would convey more accurately the message:
> "The sb/mount of this fs is not accessible for placing a mark".
> 
> WDYT? worth changing?

I think it's not crazy to use EACCES to let users figure out that the fs
isn't supported. It really depends on how useful that is.

> worth changing procfs also?

Not sure if people would be confused if they got EACCES suddenly but
then again, they could've never used it.

> We don't have that EINVAL for procfs documented in man page btw.
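
For reference, the error-code ambiguity discussed above can be sketched
from userspace roughly as follows. This is only an illustration, not
part of the patch: the helper names try_sb_mark_on_pipe() and
classify_mark_errno() and the enum are made up for this example, and the
EACCES case reflects the change being proposed in this thread, not
behaviour that any kernel is known to have. It assumes headers new
enough to define FAN_MARK_FILESYSTEM (v4.20+).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Hypothetical verdicts for this example only. */
enum mark_verdict {
	MARK_OK,		/* kernel accepted the mark */
	MARK_AMBIGUOUS,		/* EINVAL: old kernel, bad flags, or fs refused */
	MARK_FS_REFUSED,	/* EACCES: only if the proposal here is adopted */
	MARK_NO_PRIVILEGE,	/* EPERM: caller lacks the needed capability */
	MARK_OTHER,
};

/*
 * Try to place an sb mark on the pseudo fs behind an anonymous pipe.
 * Returns 0 if the mark was accepted, otherwise the positive errno of
 * the first call that failed.
 */
static int try_sb_mark_on_pipe(void)
{
	int pipefd[2];
	int fan_fd, err = 0;

	if (pipe(pipefd) < 0)
		return errno;

	fan_fd = fanotify_init(FAN_CLASS_NOTIF, O_RDONLY);
	if (fan_fd < 0) {
		err = errno;
		goto out_pipe;
	}

	/* With a NULL pathname, the object referred to by dirfd is marked. */
	if (fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
			  FAN_ACCESS, pipefd[0], NULL) < 0)
		err = errno;

	close(fan_fd);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return err;
}

/* Map an errno from the helper above to what an application can conclude. */
static enum mark_verdict classify_mark_errno(int err)
{
	switch (err) {
	case 0:
		return MARK_OK;
	case EINVAL:
		return MARK_AMBIGUOUS;
	case EACCES:
		return MARK_FS_REFUSED;	/* assumption: proposed, not merged */
	case EPERM:
		return MARK_NO_PRIVILEGE;
	default:
		return MARK_OTHER;
	}
}
```

With the RFC patch as posted, a privileged caller would see EINVAL here
and land in MARK_AMBIGUOUS, which is exactly the case an application
cannot tell apart from a pre-v4.20 kernel without a second probe on a
regular filesystem.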