Re: [RFC][PATCH] fanotify: disallow mount/sb marks on kernel internal pseudo fs

Jan Kara <jack@xxxxxxx> · Tue, 4 Jul 2023 13:18:33 +0200

On Tue 04-07-23 11:58:07, Christian Brauner wrote:
> On Mon, Jul 03, 2023 at 01:25:51PM +0200, Jan Kara wrote:
> > On Sat 01-07-23 19:25:14, Amir Goldstein wrote:
> > > On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > > > anonymous pipes/inodes.
> > > > >
> > > > > I cannot think of a good reason to allow this - it looks like an
> > > > > oversight that dated back to the original fanotify API.
> > > > >
> > > > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > > > ---
> > > > >
> > > > > Jan,
> > > > >
> > > > > As discussed, allowing sb/mount mark on anonymous pipes
> > > > > makes no sense and we should not allow it.
> > > > >
> > > > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > > > backport to maintained LTS kernels even though this dates back to day one
> > > > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > > > >
> > > > > The reason this is an RFC, and that I have not also included the
> > > > > optimization patch, is that we may want to consider banning kernel
> > > > > internal inodes from fanotify and/or inotify altogether.
> > > > >
> > > > > The tricky point is banning anonymous pipes from inotify, which
> > > > > could have existing users (?), but maybe not, so maybe this is
> > > > > something that we need to try out.
> > > > >
> > > > > I think we can easily get away with banning anonymous pipes from
> > > > > fanotify altogether, but I would not like to get into a situation
> > > > > where new applications will be written to rely on inotify for
> > > > > functionality that fanotify is never going to have.
> > > > >
> > > > > Thoughts?
> > > > > Am I over thinking this?
> > > > >
> > > > > Amir.
> > > > >
> > > > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > > > >  1 file changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > > > --- a/fs/notify/fanotify/fanotify_user.c
> > > > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > > > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > > > >               return -EINVAL;
> > > > >
> > > > > +     /*
> > > > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > > > +      * anonynous pipes in the system.
> > > >
> > > > s/anonynous/anonymous/
> > > >
> > > > > +      *
> > > > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > > > +      * are not exposed to user's mount namespace, but there are other
> > > > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > > > +      * allowing sb and mount mark is questionable.
> > > > > +      */
> > > > > +     if (mark_type != FAN_MARK_INODE &&
> > > > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > > > +             return -EINVAL;
> > > >
> > > 
> > > On second thought, I am not sure about the EINVAL error code here.
> > > I used the same error code that Jan used for permission events on
> > > proc fs, but the problem is that applications do not have a decent way
> > > to differentiate between
> > > "sb mark not supported by kernel" (i.e. < v4.20) vs.
> > > "sb mark not supported by fs" (the case above)
> > > 
> > > same for permission events:
> > > "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> > > "permission events not supported by fs" (procfs)
> > > 
> > > I have looked for other syscalls that react to SB_NOUSER and I've
> > > found that mount also returns EINVAL.
> > 
> > We tend to return EINVAL both for invalid (combination of) flags as well as
> > for flags applied to invalid objects in various calls. In practice there is
> > rarely a difference.
> > 
> > > So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> > > for invalid flag combinations (also across the two syscalls),
> > > but not because of the type of object being marked, except for
> > > the special case of procfs and permission events.
> > > 
> > > mount(2) syscall OTOH, has many documented EINVAL cases
> > > due to the type of source object (e.g. propagation type shared).
> > > 
> > > I know there is no standard and EINVAL can mean many
> > > different things in syscalls, but I thought that maybe EACCES
> > > would convey more accurately the message:
> > > "The sb/mount of this fs is not accessible for placing a mark".
> > > 
> > > WDYT? worth changing?
> > > worth changing procfs also?
> > > We don't have that EINVAL for procfs documented in man page btw.
> > 
> > Well, EACCES translates to message "Permission denied" which as Christian
> > writes is justifiable but frankly I find it more confusing. Because when I
> > get "Permission denied", I go looking which permissions are wrong, perhaps
> > suspecting SELinux or other LSM and don't think that object type / location
> > is at fault.
> > 
> > I agree that with EINVAL it is impossible to distinguish "unsupported on
> > this object only" vs "completely unknown flag" but it doesn't seem like a
> > huge problem for userspace to me as I can think of workarounds even if
> > userspace wants to do something else than "report error and bail".
> 
> Userspace is pretty used to the flood of EINVAL from the vfs apis so
> they often have good workarounds. It doesn't mean it's something we
> should just discount ofc. I think having ways to surface more
> descriptive errors would overall be a good thing.

Oh, I absolutely agree with that. I'm just not sure whether returning
EACCES in this particular case is going to cause more or less confusion.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR