On Wed, Nov 01, 2023 at 07:11:53PM +1030, Qu Wenruo wrote:
>
> On 2023/11/1 18:46, Christian Brauner wrote:
> > On Tue, Oct 31, 2023 at 10:06:17AM -0700, Christoph Hellwig wrote:
> > > On Tue, Oct 31, 2023 at 01:50:46PM +0100, Christian Brauner wrote:
> > > > So this is effectively a request for:
> > > >
> > > >         btrfs subvolume create /mnt/subvol1
> > > >
> > > > to create vfsmounts? IOW,
> > > >
> > > >         mkfs.btrfs /dev/sda
> > > >         mount /dev/sda /mnt
> > > >         btrfs subvolume create /mnt/subvol1
> > > >         btrfs subvolume create /mnt/subvol2
> > > >
> > > > would create two new vfsmounts that are exposed in /proc/<pid>/mountinfo
> > > > afterwards?
> > >
> > > Yes.
> > >
> > > > That might be odd. Because these vfsmounts aren't really mounted, no?
> > >
> > > Why aren't they?
> > >
> > > > And so you'd be showing potentially hundreds of mounts in
> > > > /proc/<pid>/mountinfo that you can't unmount?
> > >
> > > Why would you not allow them to be unmounted?
> > >
> > > > And even if you treat them as mounted what would unmounting mean?
> > >
> > > The code in btrfs_lookup_dentry that does a hand crafted version
> > > of the file system / subvolume crossing (the location.type !=
> > > BTRFS_INODE_ITEM_KEY one) would not be executed.
> >
> > So today, when we do:
> >
> >         mkfs.btrfs -f /dev/sda
> >         mount -t btrfs /dev/sda /mnt
> >         btrfs subvolume create /mnt/subvol1
> >         btrfs subvolume create /mnt/subvol2
> >
> > Then all subvolumes are always visible under /mnt.
> > IOW, you can't hide them other than by overmounting or destroying them.
> >
> > If we make subvolumes vfsmounts then we very likely alter this behavior
> > and I see two obvious options:
> >
> > (1) They are fake vfsmounts that can't be unmounted:
> >
> >     umount /mnt/subvol1 # returns -EINVAL
> >
> >     This retains the invariant that every subvolume is always visible
> >     from the filesystems root, i.e., /mnt will include /mnt/subvol{1,}
>
> I'd like to go this option. But I still have a question.
>
> How do we properly unmount a btrfs?
> Do we have some other way to record which subvolume is really mounted
> and which is just those place holder?

So the downside of this is that it would be custom btrfs semantics.
Having mounts in /proc/<pid>/mountinfo that you can't unmount only
happens in weird corner cases today:

* mounts inherited during unprivileged mount namespace creation
* locked mounts

Both are pretty inelegant and effectively only exist because of user
namespaces. So if we can avoid proliferating such semantics, that would
be preferable.

I think it would also be rather confusing for userspace to be presented
with a bunch of mounts in /proc/<pid>/mountinfo that it can't do
anything with.

> > (2) They are proper vfsmounts:
> >
> >     umount /mnt/subvol1 # succeeds
> >
> >     This retains standard semantics for userspace about anything that
> >     shows up in /proc/<pid>/mountinfo but means that after
> >     umount /mnt/subvol1 succeeds, /mnt/subvol1 won't be accessible from
> >     the filesystem root /mnt anymore.
> >
> > Both options can be made to work from a purely technical perspective,
> > I'm asking which one it has to be because it isn't clear just from the
> > snippets in this thread.
> >
> > One should also point out that if each subvolume is a vfsmount, then say
> > a btrfs filesystems with 1000 subvolumes which is mounted from the root:
> >
> >         mount -t btrfs /dev/sda /mnt
> >
> > could be exploded into 1000 individual mounts. Which many users might not want.
>
> Can we make it dynamic? AKA, the btrfs_insert_fs_root() is the perfect
> timing here.

Probably, it would be an automount. I would have to recheck that code to
see exactly how it would work, but roughly: when you add the inode for
the subvolume you raise S_AUTOMOUNT on it, and then you implement a
.d_automount dentry operation for btrfs.
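
As a rough illustration of the mountinfo concern above, here is a minimal POSIX sh and awk sketch (not from this thread) that lists and counts btrfs entries in a mountinfo file. It assumes the field layout documented in proc(5): field 4 is the root of the mount within the filesystem, field 5 is the mount point, and the filesystem type follows the lone "-" separator that ends the optional fields. list_btrfs_mounts and count_btrfs_mounts are hypothetical helper names, and octal escapes such as \040 in paths are not decoded.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only): list/count btrfs entries in a
# mountinfo file, default /proc/self/mountinfo. Field layout per proc(5).

# Print "<mount point> <root within fs>" for every btrfs mount.
list_btrfs_mounts() {
    awk '{
        # Fields 1-6 are fixed; zero or more optional fields follow and are
        # terminated by a lone "-", after which comes the filesystem type.
        for (i = 7; i <= NF; i++) {
            if ($i == "-") {
                if ($(i + 1) == "btrfs")
                    print $5, $4    # mount point, root of the mount in the fs
                break
            }
        }
    }' "${1:-/proc/self/mountinfo}"
}

# Print the number of btrfs mounts found by list_btrfs_mounts.
count_btrfs_mounts() {
    list_btrfs_mounts "$1" | awk 'END { print NR }'
}
```

Called with no argument, these read the mount table of the current mount namespace. A mount created with -o subvol=subvol1 shows /subvol1 in the root column, so each subvolume that ended up with its own vfsmount would presumably add one such line.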