Re: [PATCH v2 1/5] fsnotify: pass dentry instead of inode when available

Jan Kara <jack@xxxxxxx> · Thu, 22 Nov 2018 10:52:19 +0100

On Wed 21-11-18 18:13:17, Amir Goldstein wrote:
> >
> > > 2. exportfs_encode_fh(dentry,... in fanotify_alloc_fid()
> > >     Can be converted to exportfs_encode_inode_fh(inode,...
> >
> > If you have the parent inode then yes. In lot of cases we do have it, not
> > sure if in all of them (but likely yes so that we can do proper
> > FS_EVENT_ON_CHILD) handling.
> >
> 
> We do not need the parent inode, because we are not encoding a
> "connectable" file handle. we need:
> exportfs_encode_inode_fh(inode, (struct fid *)fid->f_handle,
>                                   &dwords,  NULL);

Ah, you're right. I thought any handle has to be "connectable" as you say
but apparently open-by-handle is fine even with not connectable ones.
Thanks for enlightening me :)

> > > 3. statfs_by_dentry(dentry,... in fanotify_alloc_fid()
> > >     Can be converted to statfs_by_dentry(inode->i_sb->sb_root,...
> >
> > This might be problematic e.g. with btrfs subvolumes which have each
> > different fsid so the fsid-handle pair might be actually invalid or even
> > point to a file with different contents. Maybe we could just store the
> > fsid in the fsnotify_mark when it is created and use it when generating
> > events? That would also get rid of the overhead of calling statfs for each
> > generated event which I don't like...
> >
> 
> OK, I was not being accurate when I wrote that these are the only 3 places
> we use dentry. Those are the only 3 places in fanotify, but we also use
> dentry->d_sb in fsnotify() to get to the sb mark of course, so we will be using
> inode->i_sb instead.

Sure.

> w.r.t btrfs - btrfs has a single sb for multiple subvolumes so by
> definition the FAN_MARK_FILESYSTEM feature is only capable of watching
> ALL subvolumes.

Agreed but then if some event happens on inode A in subvolume S, you have
to be sure the handle+fsid you return will allow you to open the file with
that content and not inode A on subvolume T which has a different
contents...

> If we wanted to implement subvolume watch for btrfs, we would need
> to support attaching a mark to a btrfs subvolume struct (or fs_view [1]).

No, that's not what I meant.

> Basically, the purpose, of fid->fsid is:
> 1. A key for finding a mount point to use as mount_fd argument to
> open_by_handle_at()
> 2. Make fid unique across the system

Understood, that's what I thought.

> Since btrfs file handles encode the subvolume root object id in the file
> handle, fid->fsid is only needed to find a btrfs mount of the same sb
> (blockdev).

Ah good. I didn't realize btrfs will record subvolume root in the file
handle. On a second thought how else could NFS mount work, right, but it
didn't occur to me yesterday.

> Regardless, IIUC, btrfs_statfs() returns an fsid which is associated with
> the single super block struct, so all dentries in all subvolumes will
> return the same fsid: btrfs_sb(dentry->d_sb)->fsid.

This is not true AFAICT. Looking at btrfs_statfs() I can see:

        buf->f_fsid.val[0] = be32_to_cpu(fsid[0]) ^ be32_to_cpu(fsid[2]);
        buf->f_fsid.val[1] = be32_to_cpu(fsid[1]) ^ be32_to_cpu(fsid[3]);
        /* Mask in the root object ID too, to disambiguate subvols */
        buf->f_fsid.val[0] ^=
                BTRFS_I(d_inode(dentry))->root->root_key.objectid >> 32;
        buf->f_fsid.val[1] ^=
                BTRFS_I(d_inode(dentry))->root->root_key.objectid;

So subvolume root is xored into the FSID. Thus dentries from different
subvolumes are going to return different fsids...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR