Re: [PATCH 0/3] fanotify support for btrfs sub-volumes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 26, 2023 at 3:17 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Wed 25-10-23 21:02:45, Amir Goldstein wrote:
> > On Wed, Oct 25, 2023 at 8:17 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > On Wed, Oct 25, 2023 at 4:50 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > Jan,
> > > >
> > > > This patch set implements your suggestion [1] for handling fanotify
> > > > events for filesystems with non-uniform f_fsid.
> > > >
> > > > With these changes, events report the fsid as it would be reported
> > > > by statfs(2) on the same objet, i.e. the sub-volume's fsid for an inode
> > > > in sub-volume.
> > > >
> > > > This creates a small challenge to watching program, which needs to map
> > > > from fsid in event to a stored mount_fd to use with open_by_handle_at(2).
> > > > Luckily, for btrfs, fsid[0] is uniform and fsid[1] is per sub-volume.
> > > >
> > > > I have adapted fsnotifywatch tool [2] to be able to watch btrfs sb.
> > > > The adapted tool detects the special case of btrfs (a bit hacky) and
> > > > indexes the mount_fd to be used for open_by_handle_at(2) by fsid[0].
> > > >
> > > > Note that this hackacry is not needed when the tool is watching a
> > > > single filesystem (no need for mount_fd lookup table), because btrfs
> > > > correctly decodes file handles from any sub-volume with mount_fd from
> > > > any other sub-volume.
> > >
> > > Jan,
> > >
> > > Now that I've implemented the userspace part of btrfs sb watch,
> > > I realize that if userspace has to be aware of the fsid oddity of btrfs
> > > anyway, maybe reporting the accurate fsid of the object in event is
> > > not that important at all.
> > >
> > > Facts:
> > > 1. file_handle is unique across all sub-volumes and can be resolved
> > >     from any fd on any sub-volume
> > > 2. fsid[0] can be compared to match an event to a btrfs sb, where any
> > >     fd can be used to resolve file_handle
> > > 3. userspace needs to be aware of this fsid[0] fact if it watches more
> > >     than a single sb and userspace needs not care about the value of
> > >     fsid in event at all when watching a single sb
> > > 4. even though fanotify never allowed setting sb mark on a path inside
> > >     btrfs sub-volume, it always reported events on inodes in sub-volumes
> > >     to btrfs sb watch - those events always carried the "wrong" fsid (i.e.
> > >     the btrfs root volume fsid)
> > > 5. we already agreed that setting up inode marks on inodes inside
> > >     sub-volume should be a no brainer
> > >
> > > If we allow reporting either sub-vol fsid or root-vol fsid (exactly as
> > > we do for inodes in sub-vol in current upstream),
> >
> > Another way to put it is that fsid in event describes the object
> > that was used to setup the mark not the target object.
> >
> > If an event is received via an inode/sb/mount mark, the fsid
> > would always describe the fsid of the inode that was used to setup
> > the mark and that is always the fsid that userspace would query
> > statfs(2) at the time of calling the fanotify_mark(2) call.
> >
> > Maybe it is non trivial to document, but for a library that returns
> > an opaque "watch descriptor", the "watch descriptor" can always
> > be deduced from the event.
> >
> > Does this make sense?
>
> Yes, it makes sense if we always reported event with fsid of the object
> used for placing the mark. For filesystems with homogeneous fsid there's no
> difference, for btrfs it looks like the least surprising choice and works
> well for inode marks as well. The only catch is in the internal fsnotify
> implementation AFAICT - if we have multiple marks for the same btrfs
> superblock, each mark on different subvolume, then we should be reporting
> one event with different fsids for different marks. So we need to cache the
> fsid in the mark and not in the connector. But that should be easy to do.

True. I thought about this as well.

I have prepared the would be man page (fanotify.7) update:

fsid   This is a unique identifier of the filesystem containing the object
         associated with the event.  It is a structure of type
__kernel_fsid_t and
         contains the same value reported in  f_fsid  when calling
statfs(2) with
         the same pathname argument that was used for fanotify_mark(2).
         Note that some filesystems (e.g., btrfs(5)) report
non-uniform values of
         f_fsid on different objects of the same filesystem.  In these cases, if
         fanotify_mark(2) is called several times with different
pathname values,
         the fsid value reported in events will match f_fsid associated  with at
         least one of those pathname values.

FWIW, I removed the -EXDEV subvolume change and ran the upstream
fsnotifywatch tool (without btrfs magic) on btrfs sub-volume and it correctly
resolves paths for all events in the filesystem:

root@kvm-xfstests:~# fsnotifywatch --filesystem /vdf/sub/vol/ &
[1] 1703
root@kvm-xfstests:~# Establishing watches...
Setting up filesystem watch on /vdf/sub/vol/
Finished establishing watches, now collecting statistics.

root@kvm-xfstests:~# touch /vdf/x /vdf/sub/vol/y
root@kvm-xfstests:~# fg
fsnotifywatch --filesystem /vdf/sub/vol/
^Ctotal  attrib  close_write  open  create  filename
3      1       1            1     1       /vdf/x
3      1       1            1     1       /vdf/sub/vol/y

I will prepare the patch.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux