Re: [PATCH 0/3] fanotify support for btrfs sub-volumes

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 1 Nov 2023 09:16:50 +0100

On Tue, Oct 31, 2023 at 10:06:17AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 31, 2023 at 01:50:46PM +0100, Christian Brauner wrote:
> > So this is effectively a request for:
> > 
> > btrfs subvolume create /mnt/subvol1
> > 
> > to create vfsmounts? IOW,
> > 
> > mkfs.btrfs /dev/sda
> > mount /dev/sda /mnt
> > btrfs subvolume create /mnt/subvol1
> > btrfs subvolume create /mnt/subvol2
> > 
> > would create two new vfsmounts that are exposed in /proc/<pid>/mountinfo
> > afterwards?
> 
> Yes.
> 
> > That might be odd. Because these vfsmounts aren't really mounted, no?
> 
> Why aren't they?
> 
> > And so you'd be showing potentially hundreds of mounts in
> > /proc/<pid>/mountinfo that you can't unmount?
> 
> Why would you not allow them to be unmounted?
> 
> > And even if you treat them as mounted what would unmounting mean?
> 
> The code in btrfs_lookup_dentry that does a hand crafted version
> of the file system / subvolume crossing (the location.type !=
> BTRFS_INODE_ITEM_KEY one) would not be executed.

So today, when we do:

mkfs.btrfs -f /dev/sda
mount -t btrfs /dev/sda /mnt
btrfs subvolume create /mnt/subvol1
btrfs subvolume create /mnt/subvol2

Then all subvolumes are always visible under /mnt.
IOW, you can't hide them other than by overmounting or destroying them.

If we make subvolumes vfsmounts then we very likely alter this behavior
and I see two obvious options:

(1) They are fake vfsmounts that can't be unmounted:

    umount /mnt/subvol1 # returns -EINVAL

    This retains the invariant that every subvolume is always visible
    from the filesystem's root, i.e., /mnt will include /mnt/subvol{1,2}

(2) They are proper vfsmounts:

    umount /mnt/subvol1 # succeeds

    This retains standard semantics for userspace about anything that
    shows up in /proc/<pid>/mountinfo but means that after
    umount /mnt/subvol1 succeeds, /mnt/subvol1 won't be accessible from
    the filesystem root /mnt anymore.

Both options can be made to work from a purely technical perspective.
I'm asking which one it has to be because it isn't clear just from the
snippets in this thread.

One should also point out that if each subvolume is a vfsmount, then say
a btrfs filesystem with 1000 subvolumes that is mounted from the root:

mount -t btrfs /dev/sda /mnt

could be exploded into 1000 individual mounts, which many users might not want.
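
To get a feel for how much that would add to the mount table, here is a
rough sketch that counts mountinfo entries for a given filesystem type.
The helper name count_mounts_by_fstype is made up for illustration; it
only relies on the documented mountinfo layout, where the filesystem
type is the first field after the lone "-" separator.

```shell
# Hypothetical helper (not an existing tool): count mountinfo entries
# whose filesystem type equals $1. Reads mountinfo-formatted text on stdin.
count_mounts_by_fstype() {
    awk -v want="$1" '
        {
            # Fields 1-6 are fixed; the number of optional fields after
            # them varies, so scan for the "-" separator. The filesystem
            # type is the field right after it.
            for (i = 7; i <= NF; i++) {
                if ($i == "-") {
                    if ($(i + 1) == want) n++
                    break
                }
            }
        }
        END { print n + 0 }
    '
}
```

Usage would be something like:

count_mounts_by_fstype btrfs < /proc/self/mountinfo

With subvolumes as vfsmounts, that number would grow with every
subvolume reachable from the mounted root.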

So I would expect that we would need to default to mounting without
subvolumes accessible, and a mount option to mount with all subvolumes
mounted, idk:

mount -t btrfs -o tree /dev/sda /mnt

or something.

I agree that mapping subvolumes to vfsmounts sounds like the natural
thing to do.

But if we do e.g., (2) then this surely needs to be a Kconfig option
and/or a mount option to avoid breaking userspace (and I'm pretty sure
that btrfs will end up supporting both modes almost indefinitely).