Re: [PATCH 0/3] fanotify support for btrfs sub-volumes

On Mon, Nov 06, 2023 at 04:29:23AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 06, 2023 at 11:03:37AM +0100, Christian Brauner wrote:
> > But why do we care?
> > Current code already does need to know it is on a btrfs subvolume. They
> > all know that btrfs subvolumes are special.
> 
> "they all know" is a bit vague.  How do you know "all" code knows?

Granted, an over-generalization, but not in any way different from
claiming that currently no one needs to know about btrfs subvolumes or
that the proposed vfsmount solution will make it magically so that no
one needs to care anymore.

My point is that tools will have to change either way. A lot of tools
already handle subvolumes specially, exactly because of the non-unique
inode situation. And if they don't, they can still get confused by seeing
st_dev numbers they can't associate with a filesystem.

> > They will need to know that
> > btrfs subvolumes are special in the future even if they were vfsmounts.
> > They would likely end up with another kind of confusion because suddenly
> > vfsmounts have device numbers that aren't associated with the superblock
> > that vfsmount belongs to.
> 
> Let's take a step back.  Posix says st_ino is unique for a given
> st_dev, and per posix a mount point is defined as any file that
> has a different st_dev from the parent.  So by the Posix definition
> btrfs subvolume roots are mount points, which is an obvious clash
> with the Linux definition based on vfsmounts.

3.229 Mount Point
Either the system root directory or a directory for which the st_dev
field of structure stat differs from that of its parent directory.

I think that's just an argument against mapping subvolumes to vfsmounts.
Because bind-mounts don't change the device number - and they very much
shouldn't.

> 
> > > > If userspace requests STATX_SUBVOLUME in the request mask, the two
> > > > filesystems raise STATX_SUBVOLUME in the statx result mask and then also
> > > > return the _real_ device number of the superblock and stop exposing that
> > > > made up device number.
> > > 
> > > What is a "real" device number?
> > 
> > The device number of the superblock of the btrfs filesystem and not some
> > made-up device number.
> 
> The block device st_dev is just as made up.
> 
> > I care about not making a btrfs specific problem the vfs's problem by
> > hoisting that whole problem space a level up by mapping subvolumes to
> > vfsmounts.
> 
> While I'd love to fix it, and even more not have more of this
> crap sneak in (*cough* bcachefs, *cough*). I'm ok with that stance.
> But that also means we can't let this creep into the vfs by other
> means, which is what started the thread.

The thing is I'm not even sure there's anything to fix.

This discussion started with btrfs maybe getting an alternative way to
uniquify an inode independent of st_dev.

I'm not sure that is such a massive problem.

If we give both btrfs and bcachefs a single flag in statx() that allows
_interested_ userspace to query whether a file is located on a subvolume,
that shouldn't be a problem (we have STATX_ATTR_* which identifies
additional properties that are restricted to few filesystems).

And all the specific gobbledygook can be implemented as an ioctl() -
ideally both btrfs and bcachefs agree on something - that the vfs
doesn't have to care about at all.

I genuinely don't care if they report a fake st_dev from stat(). I
genuinely _do_ care that we don't make vfsmounts privy to this.

Let alone that automounts are a giant pain. Not only do they, iirc, allow
creating shadow mounts, they also interact with namespace and container
creation.

If you spawn thousands of containers each with a private mount namespace
- which is the default - you now trigger automounts in thousands of
containers when triggering a lookup on btrfs. If you have mount
propagation turned on each automount may also propagate into god knows
how many other mount namespaces. That's just nasty.

IOW, making subvolumes vfsmounts will also have wider semantic
implications for using btrfs as a filesystem.



