On 2023/10/26 07:36, Josef Bacik wrote:
On Wed, Oct 25, 2023 at 08:34:21AM -0700, Christoph Hellwig wrote:
On Wed, Oct 25, 2023 at 04:50:45PM +0300, Amir Goldstein wrote:
Jan,
This patch set implements your suggestion [1] for handling fanotify
events for filesystems with non-uniform f_fsid.
File systems must never report non-uniform fsids (or st_dev, for that
matter). btrfs is simply broken here and needs to be fixed.
We keep going around and around on this so I'd like to get a set of steps laid
out for us to work towards to resolve this once and for all.
HYSTERICAL RAISINS (why we do st_dev)
-------------------------------------
Chris made this decision forever ago because things like rsync would screw up
with snapshots and end up backing up the same thing over and over again. We saw
it was using st_dev (as were a few other standard tools) to distinguish between
file systems, so we abused this to make userspace happy.
The other nice thing this provided was a solution for the fact that we re-use
inode numbers in the file system, as they're unique for the subvolume only.
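To make the st_dev convention concrete, here is a minimal Python sketch of the kind of check a userspace tool might perform. It is an illustration only, not rsync's actual code; the helper names `same_filesystem` and `file_key` are hypothetical.

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same st_dev."""
    return os.lstat(path_a).st_dev == os.lstat(path_b).st_dev


def file_key(path):
    """Identity key that many tools assume is unique: (st_dev, st_ino)."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "a")
        link = os.path.join(tmp, "b")
        with open(original, "w") as f:
            f.write("data")
        # A hard link is the same inode on the same device.
        os.link(original, link)
        print(same_filesystem(original, link))
        print(file_key(original) == file_key(link))
```

Because each btrfs subvolume reports its own st_dev, a check like `same_filesystem` treats a snapshot as a separate file system, and `file_key` stays unique even though inode numbers repeat across subvolumes.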
PROBLEMS WE WANT TO SOLVE
-------------------------
1) Stop abusing st_dev. We actually want this as btrfs developers because it's
kind of annoying to figure out which device is mounted when st_dev doesn't
map to any of the devices in /proc/mounts.
2) Give user space a way to tell it's on a subvolume, so it is not confused
by the repeating inode numbers.
POSSIBLE SOLUTIONS
------------------
1) A statx field for subvolume id. The subvolume ids are unique to the file
system, so subvolume id + inode number is unique to the file system. This is
a u64, so is nice and easy to export through statx.
2) A statx field for the uuid/fsid of the file system. I'd like this because
again, being able to easily stat a couple of files and tell they're on the
same file system is a valuable thing. We have a per-fs uuid that we can
export here.
3) A statx field for the uuid of the subvolume. Our subvolumes have their own
unique uuid. This could be an alternative for the subvolume id option, or an
addition.
No need for a full UUID, just a u64 is good enough.
Although a full UUID for the subvolumes won't hurt and can reduce the
need to call the btrfs specific ioctl just to receive the UUID.
My concern is that such new members would not be used by any other fs.
Would that cause compatibility problems?
Either 1 or 3 is necessary to give userspace a way to tell they've wandered
into a different subvolume. I'd like to have all 3, but I recognize that may be
wishful thinking. 2 isn't necessary, but if we're going to go about messing
with statx then I'd like to do it all at once, and I want this for the reasons
stated above.
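As a rough sketch of how userspace might combine options 1 and 2, the Python below builds an identity key from a file system uuid, a subvolume id, and an inode number. It does not call any real statx interface; `FileIdentity`, `fs_uuid`, `subvol_id`, and `crossed_subvolume` are hypothetical names, and the values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileIdentity:
    fs_uuid: str    # hypothetical: per-fs uuid (option 2)
    subvol_id: int  # hypothetical: u64 subvolume id (option 1)
    ino: int        # inode number, unique only within one subvolume


def crossed_subvolume(parent, child):
    """True if child is on the same file system but a different subvolume."""
    return parent.fs_uuid == child.fs_uuid and parent.subvol_id != child.subvol_id


if __name__ == "__main__":
    # Made-up values: the same inode number appears in two subvolumes.
    top = FileIdentity("made-up-fs-uuid", 5, 257)
    snap = FileIdentity("made-up-fs-uuid", 256, 257)
    print(top == snap)                    # distinct files despite equal ino
    print(crossed_subvolume(top, snap))   # same fs, different subvolume
```

With a key like this, st_dev would no longer need to differ per subvolume: the subvolume id disambiguates the repeated inode numbers, and the uuid answers the "same file system?" question.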
SEQUENCE OF EVENTS
------------------
We do one of the statx changes, that rolls into a real kernel. We run around
and submit patches for rsync and anything else we can think of to take advantage
of the statx feature.
My main concern is how older programs would handle this, such as programs
that use only stat() and, for whatever reason, never bother to add
statx() support.
(The reasons can vary from lack of maintenance to weird compatibility constraints.)
Thus we still need the st_dev hack until no real-world programs
rely on vanilla stat() alone.
(Which everyone knows is impossible.)
Thanks,
Qu
Then we wait, call it 2 kernel releases after the initial release. Then we go
and rip out the dev_t hack.
Does this sound like a reasonable path forward to resolve everybody's concerns?
I feel like I'm missing some other argument here, but I'm currently on vacation
and can't think of what it is nor have the energy to go look it up at the
moment. Thanks,
Josef