Re: [PATCH 0/3] fanotify support for btrfs sub-volumes

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 31 Oct 2023 13:14:42 +0100

> > A per-subvolume vfsmount means that /proc/mounts /proc/$PID/mountinfo becomes

So that part confuses me and I'd like to understand this a bit more.

So every time you create a subvolume, what you're doing today is giving
it an anonymous device number stored in ->anon_dev, which presumably is
also stored on disk?

Say I have a btrfs filesystem with 2 subvolumes on /dev/sda:

/mnt/subvol1
/mnt/subvol2

I described what happens in the kernel right now in the mount API
conversion patch for btrfs that I sent out in June [1], because I tweaked
that behavior. Say I mount both subvolumes:

mount /dev/sda -o subvol=subvol1 /vol1 # sb1@vfsmount1
mount /dev/sda -o subvol=subvol2 /vol2 # sb1@vfsmount2

It creates a superblock for /dev/sda. It then creates two vfsmounts: one
for subvol1 and one for subvol2. So you end up with two subvolumes on
the same superblock.
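
As a rough illustration of the behavior described above, here is a toy
model in Python. It is not kernel code: the ToyVfs class and its mount()
method are hypothetical names, and the only thing it models is that the
superblock is looked up by block device while every mount gets a fresh
vfsmount.

```python
import itertools


class ToyVfs:
    """Toy model: one superblock per block device, one vfsmount per mount."""

    def __init__(self):
        self.superblocks = {}                 # block device -> superblock name
        self._sb_ids = itertools.count(1)
        self._mnt_ids = itertools.count(1)

    def mount(self, device, subvol, target):
        # Reuse the superblock if this device was mounted before.
        if device not in self.superblocks:
            self.superblocks[device] = f"sb{next(self._sb_ids)}"
        sb = self.superblocks[device]
        # Every mount call creates a new vfsmount, whatever the subvolume.
        mnt = f"vfsmount{next(self._mnt_ids)}"
        return sb, mnt, subvol, target


vfs = ToyVfs()
first = vfs.mount("/dev/sda", "subvol1", "/vol1")
second = vfs.mount("/dev/sda", "subvol2", "/vol2")
print(f"{first[0]}@{first[1]}", f"{second[0]}@{second[1]}")
```

Both mounts share the single superblock created for /dev/sda, and each
one has its own vfsmount.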

So if you mount a subvolume today then you already get separate
vfsmounts. To put it another way: if you start 10,000 containers, each
using a separate btrfs subvolume, then you get 10,000 vfsmounts.

So I don't yet understand the scaling argument if each subvolume has a
vfsmount anyway because afaict that's already the case.
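
One way to look at this from userspace is to group the entries of
/proc/self/mountinfo by filesystem type and device field. The sketch
below does that in Python. The two sample lines are made up for
illustration (the mount IDs, the 0:42 device number and the subvolid
values are assumptions, not output from a real system), and
mounts_per_superblock() is a hypothetical helper name. It assumes that
the major:minor field identifies the superblock, so several lines with
the same value are several vfsmounts on one superblock.

```python
from collections import defaultdict

# Hypothetical mountinfo lines for the two subvolume mounts above.
SAMPLE = """\
36 25 0:42 /subvol1 /vol1 rw,relatime shared:1 - btrfs /dev/sda rw,subvolid=256,subvol=/subvol1
37 25 0:42 /subvol2 /vol2 rw,relatime shared:2 - btrfs /dev/sda rw,subvolid=257,subvol=/subvol2
"""


def mounts_per_superblock(mountinfo_text):
    """Group (root, mount point) pairs by (fstype, major:minor)."""
    groups = defaultdict(list)
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        # The " - " separator ends the variable-length optional fields.
        left, _, right = line.partition(" - ")
        fields = left.split()
        dev, root, mount_point = fields[2], fields[3], fields[4]
        fstype = right.split()[0]
        groups[(fstype, dev)].append((root, mount_point))
    return dict(groups)


result = mounts_per_superblock(SAMPLE)
for (fstype, dev), mounts in result.items():
    print(fstype, dev, len(mounts), mounts)
```

To try it on a live system, pass the contents of /proc/self/mountinfo
instead of SAMPLE.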

Or is it that you want a separate superblock per subvolume? Because only
if you allocate a new superblock will you get clean device number
handling, no? Or am I misunderstanding this?

mount /dev/sda -o subvol=subvol1 /vol1 # sget_fc() -> sb1@vfsmount1
mount /dev/sda -o subvol=subvol2 /vol2 # sget_fc() -> sb2@vfsmount2

and mounting the same subvolume again somewhere else gives you the same
superblock but on a different vfsmount:

mount /dev/sda -o subvol=subvol1 /vol3 # sget_fc() -> sb1@vfsmount3

Is that the proposal?
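
To make the question concrete, here is a toy model of that alternative
in Python. Again this is not kernel code and only a sketch of how I read
the idea: PerSubvolVfs and mount() are hypothetical names, and the only
difference from today's behavior is that the superblock lookup is keyed
on (device, subvolume) rather than on the device alone.

```python
import itertools


class PerSubvolVfs:
    """Toy model: one superblock per (device, subvolume) pair."""

    def __init__(self):
        self.superblocks = {}                 # (device, subvol) -> superblock name
        self._sb_ids = itertools.count(1)
        self._mnt_ids = itertools.count(1)

    def mount(self, device, subvol, target):
        key = (device, subvol)
        # A new subvolume gets a new superblock; a repeat mount reuses it.
        if key not in self.superblocks:
            self.superblocks[key] = f"sb{next(self._sb_ids)}"
        # The vfsmount is still new for every mount call.
        return f"{self.superblocks[key]}@vfsmount{next(self._mnt_ids)}"


vfs = PerSubvolVfs()
results = [
    vfs.mount("/dev/sda", "subvol1", "/vol1"),
    vfs.mount("/dev/sda", "subvol2", "/vol2"),
    vfs.mount("/dev/sda", "subvol1", "/vol3"),   # same subvolume, elsewhere
]
print(results)
```

The third mount reuses the superblock of the first one but still gets
its own vfsmount.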

[1]: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@xxxxxxxxxx