Re: [PATCH 0/3] fanotify support for btrfs sub-volumes

Qu Wenruo <quwenruo.btrfs@xxxxxxx> · Wed, 1 Nov 2023 19:11:53 +1030

On 2023/11/1 18:46, Christian Brauner wrote:
On Tue, Oct 31, 2023 at 10:06:17AM -0700, Christoph Hellwig wrote:
On Tue, Oct 31, 2023 at 01:50:46PM +0100, Christian Brauner wrote:
So this is effectively a request for:

btrfs subvolume create /mnt/subvol1

to create vfsmounts? IOW,

mkfs.btrfs /dev/sda
mount /dev/sda /mnt
btrfs subvolume create /mnt/subvol1
btrfs subvolume create /mnt/subvol2

would create two new vfsmounts that are exposed in /proc/<pid>/mountinfo
afterwards?

Yes.

That might be odd. Because these vfsmounts aren't really mounted, no?

Why aren't they?

And so you'd be showing potentially hundreds of mounts in
/proc/<pid>/mountinfo that you can't unmount?

Why would you not allow them to be unmounted?

And even if you treat them as mounted what would unmounting mean?

The code in btrfs_lookup_dentry that does a hand crafted version
of the file system / subvolume crossing (the location.type !=
BTRFS_INODE_ITEM_KEY one) would not be executed.

So today, when we do:

mkfs.btrfs -f /dev/sda
mount -t btrfs /dev/sda /mnt
btrfs subvolume create /mnt/subvol1
btrfs subvolume create /mnt/subvol2

Then all subvolumes are always visible under /mnt.
IOW, you can't hide them other than by overmounting or destroying them.
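
For illustration, a minimal sketch of how userspace can spot such a subvolume boundary today even though no mount is involved. It assumes GNU stat and relies on the btrfs convention that a subvolume's root directory has inode number 256; the function name is made up for this example.

```shell
# Hypothetical helper: succeed if "$1" looks like the root directory of a
# btrfs subvolume. Assumes GNU stat and the convention that a subvolume's
# root directory has inode number 256.
is_btrfs_subvol_root() {
    # Must be a directory.
    [ -d "$1" ] || return 1
    # Must live on a btrfs filesystem.
    [ "$(stat -f -c %T -- "$1")" = "btrfs" ] || return 1
    # Must carry the inode number btrfs uses for subvolume roots.
    [ "$(stat -c %i -- "$1")" = "256" ]
}
```

With the setup above, is_btrfs_subvol_root /mnt/subvol1 would be expected to succeed although /mnt/subvol1 has no entry in /proc/<pid>/mountinfo.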

If we make subvolumes vfsmounts then we very likely alter this behavior
and I see two obvious options:

(1) They are fake vfsmounts that can't be unmounted:

     umount /mnt/subvol1 # returns -EINVAL

     This retains the invariant that every subvolume is always visible
     from the filesystem's root, i.e., /mnt will include /mnt/subvol{1,2}.

I'd like to go with this option. But I still have a question.

How do we properly unmount a btrfs?
Do we have some other way to record which subvolumes are really mounted
and which are just placeholders?

(2) They are proper vfsmounts:

     umount /mnt/subvol1 # succeeds

     This retains standard semantics for userspace about anything that
     shows up in /proc/<pid>/mountinfo but means that after
     umount /mnt/subvol1 succeeds, /mnt/subvol1 won't be accessible from
     the filesystem root /mnt anymore.

Both options can be made to work from a purely technical perspective.
I'm asking which one it has to be because it isn't clear just from the
snippets in this thread.

One should also point out that if each subvolume is a vfsmount, then say
a btrfs filesystem with 1000 subvolumes that is mounted from the root:

mount -t btrfs /dev/sda /mnt

could explode into 1000 individual mounts, which many users might not want.
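
To get a feel for the scale, here is a rough sketch that counts the btrfs entries in a mountinfo file. It assumes the field layout documented in proc(5), where the filesystem type follows the "-" separator; the function name is made up for this example.

```shell
# Hypothetical helper: count the btrfs entries in a mountinfo file
# (default: /proc/self/mountinfo). Per proc(5), the optional fields start
# at field 7 and end at a lone "-", which is followed by the fstype.
count_btrfs_mounts() {
    awk '{
        for (i = 7; i <= NF; i++) {
            if ($i == "-") {
                if ($(i + 1) == "btrfs") n++
                break
            }
        }
    } END { print n + 0 }' "${1:-/proc/self/mountinfo}"
}
```

Today this prints 1 for a single mount of /dev/sda at /mnt no matter how many subvolumes exist. If every subvolume became a vfsmount, the count would grow with the number of subvolumes.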

Can we make it dynamic? I.e., btrfs_insert_fs_root() would be the perfect
time to do it.

That would greatly reduce the initial vfsmount explosion, but I'm not sure
if it's possible to add a vfsmount halfway.

Thanks,
Qu

So I would expect that we would need to default to mounting without
subvolumes accessible, and a mount option to mount with all subvolumes
mounted, idk:

mount -t btrfs -o tree /dev/sda /mnt

or something like that.

I agree that mapping subvolumes to vfsmounts sounds like the natural
thing to do.

But if we do, e.g., (2) then this surely needs to be a Kconfig and/or a
mount option to avoid breaking userspace (and I'm pretty sure that btrfs
will end up supporting both modes almost indefinitely).