Btrfs and fanotify filesystem watches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Changed subject to better match what we discuss and added btrfs list to CC.

On Thu 22-11-18 17:18:25, Amir Goldstein wrote:
> On Thu, Nov 22, 2018 at 3:26 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Thu 22-11-18 14:36:35, Amir Goldstein wrote:
> > > > > Regardless, IIUC, btrfs_statfs() returns an fsid which is associated with
> > > > > the single super block struct, so all dentries in all subvolumes will
> > > > > return the same fsid: btrfs_sb(dentry->d_sb)->fsid.
> > > >
> > > > This is not true AFAICT. Looking at btrfs_statfs() I can see:
> > > >
> > > >         buf->f_fsid.val[0] = be32_to_cpu(fsid[0]) ^ be32_to_cpu(fsid[2]);
> > > >         buf->f_fsid.val[1] = be32_to_cpu(fsid[1]) ^ be32_to_cpu(fsid[3]);
> > > >         /* Mask in the root object ID too, to disambiguate subvols */
> > > >         buf->f_fsid.val[0] ^=
> > > >                 BTRFS_I(d_inode(dentry))->root->root_key.objectid >> 32;
> > > >         buf->f_fsid.val[1] ^=
> > > >                 BTRFS_I(d_inode(dentry))->root->root_key.objectid;
> > > >
> > > > So subvolume root is xored into the FSID. Thus dentries from different
> > > > subvolumes are going to return different fsids...
> > > >
> > >
> > > Right... how could I have missed that :-/
> > >
> > > I do not want to bring back FSNOTIFY_EVENT_DENTRY just for that
> > > and I saw how many flaws you pointed to in $SUBJECT patch.
> > > Instead, I will use:
> > > statfs_by_dentry(d_find_any_alias(inode) ?: inode->i_sb->sb_root,...
> >
> > So what about my proposal to store fsid in the notification mark when it is
> > created and then use it when that mark results in event being generated?
> > When mark is created, we have full path available, so getting proper fsid
> > is trivial. Furthermore if the behavior is documented like that, it is
> > fairly easy for userspace to track fsids it should care about and translate
> > them to proper file descriptors for open_by_handle().
> >
> > This would get rid of statfs() on every event creation (which I don't like
> > much) and also avoids these problems "how to get fsid for inode". What do
> > you think?
> >
> 
> That's interesting. I like the simplicity.
> What happens when there are 2 btrfs subvols /subvol1 /subvol2
> and the application obviously doesn't know about this and does:
> fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM, ... /subvol1);
> statfs("/subvol1",...);
> fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM, ... /subvol2);
> statfs("/subvol2",...);
> 
> Now the second fanotify_mark() call just updates the existing super block
> mark mask, but does not change the fsid on the mark, so all events will have
> fsid of subvol1 that was stored when first creating the mark.

Yes.

> fanotify-watch application (for example) hashes the watches (paths) under
> /subvol2 by fid with fsid of subvol2, so events cannot get matched back to
> "watch" (i.e. path).

I agree this can be confusing... but with btrfs fanotify-watch will be
confused even with your current code, won't it? Because FAN_MARK_FILESYSTEM
on /subvol1 (with fsid A) is going to return also events on inodes from
/subvol2 (with fsid B). So your current code will return event with fsid B
which fanotify-watch has no way to match back and can get confused.

So currently application can get events with fsid it has never seen, with
the code as I suggest it can get "wrong" fsid. That is confusing but still
somewhat better?

The core of the problem is that btrfs does not have "the superblock
identifier" that would correspond to FAN_MARK_FILESYSTEM scope of events
that we could use.

> And when trying to open_by_handle fid with fhandle from /subvol2
> using mount_fd of /subvol1, I suppose we can either get ESTALE
> or a disconnected dentry, because object from /subvol2 cannot
> have a path inside /subvol1....

So open_by_handle() should work fine even if we get mount_fd of /subvol1
and handle for inode on /subvol2. mount_fd in open_by_handle() is really
only used to get the superblock and that is the same.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux