On Thu, Oct 26, 2023 at 3:17 PM Jan Kara <jack@xxxxxxx> wrote: > > On Wed 25-10-23 21:02:45, Amir Goldstein wrote: > > On Wed, Oct 25, 2023 at 8:17 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > On Wed, Oct 25, 2023 at 4:50 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > Jan, > > > > > > > > This patch set implements your suggestion [1] for handling fanotify > > > > events for filesystems with non-uniform f_fsid. > > > > > > > > With these changes, events report the fsid as it would be reported > > > > by statfs(2) on the same objet, i.e. the sub-volume's fsid for an inode > > > > in sub-volume. > > > > > > > > This creates a small challenge to watching program, which needs to map > > > > from fsid in event to a stored mount_fd to use with open_by_handle_at(2). > > > > Luckily, for btrfs, fsid[0] is uniform and fsid[1] is per sub-volume. > > > > > > > > I have adapted fsnotifywatch tool [2] to be able to watch btrfs sb. > > > > The adapted tool detects the special case of btrfs (a bit hacky) and > > > > indexes the mount_fd to be used for open_by_handle_at(2) by fsid[0]. > > > > > > > > Note that this hackacry is not needed when the tool is watching a > > > > single filesystem (no need for mount_fd lookup table), because btrfs > > > > correctly decodes file handles from any sub-volume with mount_fd from > > > > any other sub-volume. > > > > > > Jan, > > > > > > Now that I've implemented the userspace part of btrfs sb watch, > > > I realize that if userspace has to be aware of the fsid oddity of btrfs > > > anyway, maybe reporting the accurate fsid of the object in event is > > > not that important at all. > > > > > > Facts: > > > 1. file_handle is unique across all sub-volumes and can be resolved > > > from any fd on any sub-volume > > > 2. fsid[0] can be compared to match an event to a btrfs sb, where any > > > fd can be used to resolve file_handle > > > 3. userspace needs to be aware of this fsid[0] fact if it watches more > > > than a single sb and userspace needs not care about the value of > > > fsid in event at all when watching a single sb > > > 4. even though fanotify never allowed setting sb mark on a path inside > > > btrfs sub-volume, it always reported events on inodes in sub-volumes > > > to btrfs sb watch - those events always carried the "wrong" fsid (i.e. > > > the btrfs root volume fsid) > > > 5. we already agreed that setting up inode marks on inodes inside > > > sub-volume should be a no brainer > > > > > > If we allow reporting either sub-vol fsid or root-vol fsid (exactly as > > > we do for inodes in sub-vol in current upstream), > > > > Another way to put it is that fsid in event describes the object > > that was used to setup the mark not the target object. > > > > If an event is received via an inode/sb/mount mark, the fsid > > would always describe the fsid of the inode that was used to setup > > the mark and that is always the fsid that userspace would query > > statfs(2) at the time of calling the fanotify_mark(2) call. > > > > Maybe it is non trivial to document, but for a library that returns > > an opaque "watch descriptor", the "watch descriptor" can always > > be deduced from the event. > > > > Does this make sense? > > Yes, it makes sense if we always reported event with fsid of the object > used for placing the mark. For filesystems with homogeneous fsid there's no > difference, for btrfs it looks like the least surprising choice and works > well for inode marks as well. The only catch is in the internal fsnotify > implementation AFAICT - if we have multiple marks for the same btrfs > superblock, each mark on different subvolume, then we should be reporting > one event with different fsids for different marks. So we need to cache the > fsid in the mark and not in the connector. But that should be easy to do. True. I thought about this as well. I have prepared the would be man page (fanotify.7) update: fsid This is a unique identifier of the filesystem containing the object associated with the event. It is a structure of type __kernel_fsid_t and contains the same value reported in f_fsid when calling statfs(2) with the same pathname argument that was used for fanotify_mark(2). Note that some filesystems (e.g., btrfs(5)) report non-uniform values of f_fsid on different objects of the same filesystem. In these cases, if fanotify_mark(2) is called several times with different pathname values, the fsid value reported in events will match f_fsid associated with at least one of those pathname values. FWIW, I removed the -EXDEV subvolume change and ran the upstream fsnotifywatch tool (without btrfs magic) on btrfs sub-volume and it correctly resolves paths for all events in the filesystem: root@kvm-xfstests:~# fsnotifywatch --filesystem /vdf/sub/vol/ & [1] 1703 root@kvm-xfstests:~# Establishing watches... Setting up filesystem watch on /vdf/sub/vol/ Finished establishing watches, now collecting statistics. root@kvm-xfstests:~# touch /vdf/x /vdf/sub/vol/y root@kvm-xfstests:~# fg fsnotifywatch --filesystem /vdf/sub/vol/ ^Ctotal attrib close_write open create filename 3 1 1 1 1 /vdf/x 3 1 1 1 1 /vdf/sub/vol/y I will prepare the patch. Thanks, Amir.