On 2023/11/6 20:33, Christian Brauner wrote:
I would feel much more comfortable if the two filesystems that expose
these objects give us something like STATX_SUBVOLUME that userspace can
raise in the request mask of statx().
Except that this doesn't fix any existing code.
But why do we care?
Current code already does need to know it is on a btrfs subvolume.
Not really, user space tools don't care whether it's btrfs or not.
They just check st_dev, and when st_dev changes at some point they
know there is a boundary.
They don't care if it's a btrfs subvolume boundary or a regular file
system boundary.
Even if they switch to statx(), they don't really care whether it's
called subvolid or whatever, they just care how to distinguish a boundary.
Maybe it's a fsid/subvolid or whatever combination, they just want a way
to determine the boundary.
And st_dev is the perfect proxy. I don't think there is a better way to
distinguish the boundary, even if we have statx().
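To illustrate the kind of check I mean, here is a minimal sketch (not
taken from any real tool) of how a tree walker could detect a boundary
purely from st_dev, much like find -xdev or du -x do. The names
find_boundaries and get_dev are made up for this example, and get_dev is
only injectable so the logic can be exercised without real mounts.

```python
import os


def _lstat_dev(path):
    """Return st_dev for path without following symlinks."""
    return os.lstat(path).st_dev


def find_boundaries(root, get_dev=_lstat_dev):
    """Yield directories below root whose st_dev differs from their parent's.

    The walker does not know or care whether the boundary is a mount
    point, a btrfs subvolume, or anything else. It only sees st_dev change.
    """
    stack = [(root, get_dev(root))]
    while stack:
        path, parent_dev = stack.pop()
        try:
            with os.scandir(path) as it:
                entries = list(it)
        except OSError:
            continue  # unreadable directory, skip it
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            try:
                dev = get_dev(entry.path)
            except OSError:
                continue
            if dev != parent_dev:
                # Boundary found: report it and do not descend further.
                yield entry.path
                continue
            stack.append((entry.path, dev))
```

Something like list(find_boundaries("/mnt")) would then list every
directory where the device number changes, without a single btrfs
specific line in it.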
They
all know that btrfs subvolumes are special. They will need to know that
btrfs subvolumes are special in the future even if they were vfsmounts.
They would likely end up with another kind of confusion because suddenly
vfsmounts have device numbers that aren't associated with the superblock
that vfsmount belongs to.
This looks like you are asking user space programs (especially legacy
ones) to do special handling for btrfs, which I don't believe is the
standard way.
So nothing is really solved by vfsmounts either. The only thing that we
achieved is that we somehow accommodated that st_dev hack. And that I
consider nakable.
I think this is the problem.
If we keep the existing behavior, at least old programs won't complain
and we're still POSIX compatible, but with a limited number of
subvolumes (which can be more or less worked around, and has been there
for a while).
If we change st_dev, then firstly, to what value? The same value for
every subvolume of the same btrfs? Then that's a big behavior break.
It's really a compatibility problem, and it would take a long time to
find an acceptable compromise, but it should never be a sudden change.
You can of course argue that one fs should report the same st_dev no
matter what, but my counter argument is that each subvolume really is a
different tree, and btrfs combines the PV/VG/LV layers into one.
Thus either we treat subvolumes as LVs, so they have different device
numbers from each other (just like what we do now, and still what I
believe we should do).
Or we treat the filesystem as a VG, which should still have a different
device number from all the PVs (a made-up device id, but shared between
all subvolumes, which breaks the existing behavior).
But we should never treat a btrfs as a PV, because that makes no sense.
If userspace requests STATX_SUBVOLUME in the request mask, the two
filesystems raise STATX_SUBVOLUME in the statx result mask and then also
return the _real_ device number of the superblock and stop exposing that
made up device number.
What is a "real" device number?
The device number of the superblock of the btrfs filesystem and not some
made-up device number.
Then again, which device for a multi-device btrfs?
The one with the lowest devid? It can be gone after a device rm.
The one used for mount? That can be gone as well.
A made-up one? Then what's the difference? We go the VG way, break
the existing programs, and achieve nothing.
Thanks,
Qu
I care about not making a btrfs specific problem the vfs's problem by
hoisting that whole problem space a level up by mapping subvolumes to
vfsmounts.