Re: [RFC][PATCH] fanotify: disallow mount/sb marks on kernel internal pseudo fs

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 4 Jul 2023 11:58:07 +0200

On Mon, Jul 03, 2023 at 01:25:51PM +0200, Jan Kara wrote:
> On Sat 01-07-23 19:25:14, Amir Goldstein wrote:
> > On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > > anonymous pipes/inodes.
> > > >
> > > > I cannot think of a good reason to allow this - it looks like an
> > > > oversight that dated back to the original fanotify API.
> > > >
> > > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > > ---
> > > >
> > > > Jan,
> > > >
> > > > As discussed, allowing sb/mount mark on anonymous pipes
> > > > makes no sense and we should not allow it.
> > > >
> > > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > > backport to maintained LTS kernels even though this dates back to day one
> > > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > > >
> > > > The reason this is an RFC and that I have not included also the
> > > > optimization patch is because we may want to consider banning kernel
> > > > internal inodes from fanotify and/or inotify altogether.
> > > >
> > > > The tricky point is banning anonymous pipes from inotify, which
> > > > could have existing users (?), but maybe not, so maybe this is
> > > > something that we need to try out.
> > > >
> > > > I think we can easily get away with banning anonymous pipes from
> > > > fanotify altogether, but I would not like to get into a situation
> > > > where new applications will be written to rely on inotify for
> > > > functionality that fanotify is never going to have.
> > > >
> > > > Thoughts?
> > > > Am I over thinking this?
> > > >
> > > > Amir.
> > > >
> > > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > > --- a/fs/notify/fanotify/fanotify_user.c
> > > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > > >               return -EINVAL;
> > > >
> > > > +     /*
> > > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > > +      * anonynous pipes in the system.
> > >
> > > s/anonynous/anonymous/
> > >
> > > > +      *
> > > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > > +      * are not exposed to user's mount namespace, but there are other
> > > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > > +      * allowing sb and mount mark is questionable.
> > > > +      */
> > > > +     if (mark_type != FAN_MARK_INODE &&
> > > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > > +             return -EINVAL;
> > >
> > 
> > On second thought, I am not sure about the EINVAL error code here.
> > I used the same error code that Jan used for permission events on
> > proc fs, but the problem is that applications do not have a decent way
> > to differentiate between
> > "sb mark not supported by kernel" (i.e. < v4.20) vs.
> > "sb mark not supported by fs" (the case above)
> > 
> > same for permission events:
> > "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> > "permission events not supported by fs" (procfs)
> > 
> > I have looked for other syscalls that react to SB_NOUSER and I've
> > found that mount also returns EINVAL.
> 
> We tend to return EINVAL both for invalid (combination of) flags as well as
> for flags applied to invalid objects in various calls. In practice there is
> rarely a difference.
> 
> > So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> > for invalid flag combinations (also across the two syscalls),
> > but not because of the type of object being marked, except for
> > the special case of procfs and permission events.
> > 
> > mount(2) syscall OTOH, has many documented EINVAL cases
> > due to the type of source object (e.g. propagation type shared).
> > 
> > I know there is no standard and EINVAL can mean many
> > different things in syscalls, but I thought that maybe EACCES
> > would convey more accurately the message:
> > "The sb/mount of this fs is not accessible for placing a mark".
> > 
> > WDYT? worth changing?
> > worth changing procfs also?
> > We don't have that EINVAL for procfs documented in man page btw.
> 
> Well, EACCES translates to message "Permission denied" which as Christian
> writes is justifiable but frankly I find it more confusing. Because when I
> get "Permission denied", I go looking which permissions are wrong, perhaps
> suspecting SELinux or other LSM and don't think that object type / location
> is at fault.
> 
> I agree that with EINVAL it is impossible to distinguish "unsupported on
> this object only" vs "completely unknown flag" but it doesn't seem like a
> huge problem for userspace to me as I can think of workarounds even if
> userspace wants to do something other than "report error and bail".

Userspace is pretty used to the flood of EINVAL from the vfs apis so
they often have good workarounds. It doesn't mean it's something we
should just discount ofc. I think having ways to surface more
descriptive errors would overall be a good thing.