Re: [PATCH v2 0/2] unprivileged fanotify listener

Christian Brauner <christian.brauner@xxxxxxxxxx> · Wed, 24 Mar 2021 17:28:38 +0100

On Wed, Mar 24, 2021 at 05:05:45PM +0200, Amir Goldstein wrote:
> On Wed, Mar 24, 2021 at 4:32 PM Christian Brauner
> <christian.brauner@xxxxxxxxxx> wrote:
> >
> > On Wed, Mar 24, 2021 at 03:57:12PM +0200, Amir Goldstein wrote:
> > > > > Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
> > > > > inside userns and works fine, with two wrinkles I needed to iron:
> > > > >
> > > > > 1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
> > > > >     zero f_fsid (easy to fix)
> > > > > 2. open_by_handle_at() is not userns aware (can relax for
> > > > >     FS_USERNS_MOUNT fs)
> > > > >
> > > > > Pushed these two fixes to branch fanotify_userns.
> > > >
> > > > Pushed another fix to mnt refcount bug in WIP and another commit to
> > > > add the last piece that could make fanotify usable for systemd-homed
> > > > setup - a filesystem watch filtered by mnt_userns (not tested yet).
> > > >
> > >
> > > Now I used mount-idmapped (from xfstest) to test that last piece.
> > > Found a minor bug and pushed a fix.
> > >
> > > It is working as expected, that is filtering only the events generated via
> > > the idmapped mount. However, because the listener I tested is capable in
> > > the mapped userns and not in the sb userns, the listener cannot
> > > open_ny_handle_at(), so the result is not as useful as one might hope.
> >
> > This is another dumb question probably but in general, are you saying
> > that someone watching a mount or directory and does _not_ want file
> > descriptors from fanotify to be returned has no other way of getting to
> > the path they want to open other than by using open_by_handle_at()?
> >
> 
> Well there is another way.
> It is demonstrated in my demo with intoifywatch --fanotify --recursive.
> It involved userspace iterating a subtree of interest to create fid->path
> map.

Ok, so this seems to be

inotifytools_filename_from_fid()
-> if (fanotify_mark_type != FAN_MARK_FILESYSTEM)
           watch_from_fid()
   -> read_path_from(/proc/self/fd/w->dirfd)

> 
> The fanotify recursive watch is similar but not exactly the same as the
> old intoify recursive watch, because with inotify recursive watch you
> can miss events.
> 
> With fanotify recursive watch, the listener (if capable) can setup a
> filesystem mark so events will not be missed. They will be recorded
> by fid with an unknown path and the path information can be found later
> by the crawler and updated in the map before the final report.
> 
> Events on fid that were not found by the crawler need not be reported.
> That's essentially a subtree watch for the poor implemented in userspace.

This is already a good improvement.
Honestly, having FAN_MARK_INODE workable unprivileged is already pretty
great. In addition having FAN_MARK_MOUNT workable with idmapped mounts
will likely get us what most users care about, afaict that is the POC
in:
https://github.com/amir73il/linux/commit/f0d5d462c5baeb82a658944c6df80704434f09a1

(I'm reading the source correctly that FAN_MARK_MOUNT works with
FAN_REPORT_FID as long as no inode event set in FANOTIFY_INODE_EVENTS is
set? I'm asking because my manpage - probably too old - seems to imply
that FAN_REPORT_FID can't be used with FAN_MARK_MOUNT although I might
just be stumbling over the phrasing.)

I think FAN_MARK_FILESYSTEM should simply stay under the s_userns_s
capable requirement. That's imho the cleanest semantics for this, i.e.
I'd drop:
https://github.com/amir73il/linux/commit/bd20e273f3c3a650805b3da32e493f01cc2a4763
This is neither an urgent use-case nor am I feeling very comfortable
with it.

> 
> I did not implement the combination --fanotify --global --recursive in my
> demo, because it did not make sense with the current permission model
> (listener that can setup a fs mark can always resolve fids to path), but it
> would be quite trivial to add.
> 
> 
> > >
> > > I guess we will also need to make open_by_handle_at() idmapped aware
> > > and use a variant of vfs_dentry_acceptable() that validates that the opened
> > > path is legitimately accessible via the idmapped mount.
> >
> > So as a first step, I think there's a legitimate case to be made for
> > open_by_handle_at() to be made useable inside user namespaces. That's a
> > change worth to be made independent of fanotify. For example, nowadays
> > cgroups have a 64 bit identifier that can be used with open_by_handle_at
> > to map a cgrp id to a path and back:
> > https://lkml.org/lkml/2020/12/2/1126
> > Right now this can't be used in user namespaces because of this
> > restriction but it is genuinely useful to have this feature available
> > since cgroups are FS_USERNS_MOUNT and that identifier <-> path mapping
> > is very convenient.
> 
> FS_USERNS_MOUNT is a simple case and I think it is safe.
> There is already a patch for that on my fanotify_userns branch.

Great!
Christian