Re: fanotify sb/mount watch inside userns (Was: [PATCH RFC] : fhandle: relax open_by_handle_at() permission checks)

On Tue, Oct 15, 2024 at 4:01 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Sun, Oct 13, 2024 at 06:34:18PM +0200, Amir Goldstein wrote:
> > On Fri, May 24, 2024 at 2:35 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > On Fri, May 24, 2024 at 1:19 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >
> > > > A current limitation of open_by_handle_at() is that it cannot be used from
> > > > within containers at all, because we require CAP_DAC_READ_SEARCH in the
> > > > initial namespace. That's unfortunate because there are scenarios where
> > > > using open_by_handle_at() from within containers is useful.
> > > >
> > > > Two examples:
> > > >
> > > > (1) cgroupfs allows encoding cgroups as file handles and reopening them
> > > >     with open_by_handle_at().
> > > > (2) Fanotify allows placing filesystem watches, but they currently aren't
> > > >     usable in containers because the returned file handles cannot be used.
> > > >
> >
> > Christian,
> >
> > Follow up question:
> > Now that open_by_handle_at(2) is supported from non-root userns,
> > What about this old patch to allow sb/mount watches from non-root userns?
> > https://lore.kernel.org/linux-fsdevel/20230416060722.1912831-1-amir73il@xxxxxxxxx/
> >
> > Is it useful for any of your use cases?
> > Should I push it forward?
>
> Dammit, I answered that message already yesterday but somehow it didn't
> get sent or lost in some other way.
>
> I personally don't have a use-case for it but the systemd folks might
> and it would be best to just rope them in.

Lennart,

I must have asked this question before, but enough time has passed so
I am going to ask it again.

Now that Christian has added support for open_by_handle_at(2) by a non-root
userns admin, it is very low-hanging fruit to support fanotify sb/mount
watches inside userns with this simple patch [1], which was last posted in 2023.

My question is whether this is useful, because there are still a few
limitations.
I will start with what is possible with this patch:
1. Watch an entire tmpfs filesystem that was mounted inside userns
2. Watch an entire overlayfs filesystem that was mounted [*] inside userns
3. Watch an entire mount [**] of any [***] filesystem that was
   idmapped mounted into userns

Now the fine print:
[*] An overlayfs sb/mount CAN be watched, but decoding the file handles in
    events to paths only works if overlayfs is mounted with the mount option
    nfs_export=on, which conflicts with the mount option metacopy=on, which
    is often used in containers (e.g. podman).
[**] Watching a mount is only possible with the legacy set of fanotify
     events (i.e. open, close, access, modify), so this is less useful for
     directory tree change tracking.
[***] Watching an idmapped mount has the same limitations as watching an
      sb/mount in the root userns, namely, the filesystem needs to have a
      non-zero fsid (so not FUSE) and a uniform fsid (so not a btrfs
      subvolume), although with some stretch, I could make watching an
      idmapped mount of a btrfs subvol work.

The lack of support for watching btrfs subvols and overlayfs with metacopy=on
reduces the attractiveness for containers, but perhaps there are still use cases
where watching an idmapped mount or a userns-private tmpfs is useful?

To try out this patch inside your favorite container/userns, you can build
fsnotifywait with a patch to support watching inside userns [2].
It's actually only the one-line O_DIRECTORY patch that is needed for the
basic tmpfs userns mount case.
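
To check quickly whether file handles can be decoded at all in a given
container, before building the tool, something like the round trip below may
help. This is only a sketch and does not reproduce the patch in [2]; the
helper name reopen_dir_by_handle() is made up. It encodes a handle for a
directory with name_to_handle_at(2) and reopens it with open_by_handle_at(2),
using an fd on the same directory as the mount fd. A negative return such as
-EPERM or -EOPNOTSUPP suggests that handles in events cannot be resolved to
paths there.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* name_to_handle_at(), open_by_handle_at() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: encode a file handle for directory @dir and open
 * it again by handle. Returns the new fd, or -errno on failure.
 */
static int reopen_dir_by_handle(const char *dir)
{
	struct file_handle *fh;
	int mount_id, mount_fd, fd, err;

	assert(dir != NULL);

	/* An fd on the filesystem is needed to resolve the handle later. */
	mount_fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (mount_fd < 0)
		return -errno;

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh) {
		close(mount_fd);
		return -ENOMEM;
	}
	fh->handle_bytes = MAX_HANDLE_SZ;

	if (name_to_handle_at(AT_FDCWD, dir, fh, &mount_id, 0) != 0) {
		err = errno;
		goto fail;
	}

	fd = open_by_handle_at(mount_fd, fh,
			       O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0) {
		err = errno;
		goto fail;
	}

	free(fh);
	close(mount_fd);
	return fd;

fail:
	free(fh);
	close(mount_fd);
	return -err;
}
```

Running it against a tmpfs mounted inside the userns and against an idmapped
mount should show which of the cases listed earlier are usable in that setup.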

Jan,

If we do not get any buy-in from potential consumers now, do you think that
we should go through with the patch and advertise the new supported use cases,
so that users may come later on?

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fanotify_userns/
[2] https://github.com/amir73il/inotify-tools/commits/fanotify_userns/




