On Thu, Oct 26, 2023 at 2:02 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
> On 2023/10/26 07:36, Josef Bacik wrote:
> > On Wed, Oct 25, 2023 at 08:34:21AM -0700, Christoph Hellwig wrote:
> >> On Wed, Oct 25, 2023 at 04:50:45PM +0300, Amir Goldstein wrote:
> >>> Jan,
> >>>
> >>> This patch set implements your suggestion [1] for handling fanotify
> >>> events for filesystems with non-uniform f_fsid.
> >>
> >> File systems must never report non-uniform fsids (or st_dev) for that
> >> matter. btrfs is simply broken here and needs to be fixed.
> >
> > We keep going around and around on this so I'd like to get a set of steps laid
> > out for us to work towards to resolve this once and for all.
> >
> > HYSTERICAL RAISINS (why we do st_dev)
> > -------------------------------------
> >
> > Chris made this decision forever ago because things like rsync would screw up
> > with snapshots and end up backing up the same thing over and over again. We saw
> > it was using st_dev (as were a few other standard tools) to distinguish between
> > file systems, so we abused this to make userspace happy.
> >
> > The other nice thing this provided was a solution for the fact that we re-use
> > inode numbers in the file system, as they're unique for the subvolume only.
> >
> > PROBLEMS WE WANT TO SOLVE
> > -------------------------
> >
> > 1) Stop abusing st_dev. We actually want this as btrfs developers because it's
> >    kind of annoying to figure out which device is mounted when st_dev doesn't
> >    map to any of the devices in /proc/mounts.
> >
> > 2) Give user space a way to tell it's on a subvolume, so it can not be confused
> >    by the repeating inode numbers.
> >
> > POSSIBLE SOLUTIONS
> > ------------------
> >
> > 1) A statx field for subvolume id. The subvolume ids are unique to the file
> >    system, so subvolume id + inode number is unique to the file system. This is
> >    a u64, so is nice and easy to export through statx.
> >
> > 2) A statx field for the uuid/fsid of the file system. I'd like this because
> >    again, being able to easily stat a couple of files and tell they're on the
> >    same file system is a valuable thing. We have a per-fs uuid that we can
> >    export here.
> >
> > 3) A statx field for the uuid of the subvolume. Our subvolumes have their own
> >    unique uuid. This could be an alternative for the subvolume id option, or an
> >    addition.
>
> No need for a full UUID, just a u64 is good enough.
>
> Although a full UUID for the subvolumes won't hurt and can reduce the
> need to call the btrfs specific ioctl just to receive the UUID.
>
>
> My concern is that such new members would not be used by any other fs;
> would that cause some compatibility problem?
>
> >
> > Either 1 or 3 are necessary to give userspace a way to tell they've wandered
> > into a different subvolume. I'd like to have all 3, but I recognize that may be
> > wishful thinking. 2 isn't necessary, but if we're going to go about messing
> > with statx then I'd like to do it all at once, and I want this for the reasons
> > stated above.
> >
> > SEQUENCE OF EVENTS
> > ------------------
> >
> > We do one of the statx changes, that rolls into a real kernel. We run around
> > and submit patches for rsync and anything else we can think of to take advantage
> > of the statx feature.
>
> My main concern is, how could older programs handle this? Like programs
> using only stat(), which for whatever reason don't bother to add
> statx() support.
> (This can vary from lack of maintenance to weird compatibility reasons.)
>
> Thus we still need such st_dev hack, until there are no real-world
> programs using only vanilla stat().
> (Which everyone knows is impossible.)
>

I agree it does not sound possible to change the world to accept that the same
st_dev,st_ino pair could belong to different objects.
One such program, btw, is diff: it will skip the comparison if both objects have
the same st_dev,st_ino, even if they are actually different objects with different
data (e.g. a file and its old snapshot).

Thanks,
Amir.
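A minimal Python sketch of the diff case described above, assuming the st_dev hack has been dropped so that a file and its snapshot report the same st_dev. `StatInfo`, `subvol_id`, `from_os_stat`, `same_object_stat_only` and `same_object_with_subvol` are hypothetical names made up for illustration; `subvol_id` stands in for the subvolume-id statx field proposed as solution 1 and is not an existing API. The numeric values in the demo are invented.

```python
import os
from typing import NamedTuple, Optional


class StatInfo(NamedTuple):
    """Hypothetical container for the identity fields discussed in the thread.

    st_dev/st_ino mirror what stat(2) reports; subvol_id is an assumed
    stand-in for the proposed statx subvolume-id field (None when the
    filesystem or API does not report one).
    """
    st_dev: int
    st_ino: int
    subvol_id: Optional[int] = None


def from_os_stat(path: str) -> StatInfo:
    # Plain stat() carries no subvolume information, so subvol_id stays None.
    st = os.stat(path)
    return StatInfo(st_dev=st.st_dev, st_ino=st.st_ino)


def same_object_stat_only(a: StatInfo, b: StatInfo) -> bool:
    # The assumption a stat()-only tool like diff makes:
    # equal (st_dev, st_ino) means "same object, nothing to compare".
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def same_object_with_subvol(a: StatInfo, b: StatInfo) -> bool:
    # Solution 1 from the thread: subvolume id + inode number is unique
    # within the file system, so include it in the key when it is known.
    return (a.st_dev, a.subvol_id, a.st_ino) == (b.st_dev, b.subvol_id, b.st_ino)


if __name__ == "__main__":
    # Made-up values: a file and its snapshot on one file system with a
    # uniform st_dev, sharing inode number 257 in different subvolumes.
    original = StatInfo(st_dev=0x2F, st_ino=257, subvol_id=5)
    snapshot = StatInfo(st_dev=0x2F, st_ino=257, subvol_id=258)
    print(same_object_stat_only(original, snapshot))    # True: diff would skip it
    print(same_object_with_subvol(original, snapshot))  # False: treated as distinct
```

The subvolume-aware comparison only helps tools that are taught to ask for the new field, which is Qu's point about programs that stick to plain stat().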