Re: [RFC PATCH 1/7] FUSE: Add the fsnotify opcode and in/out structs to FUSE

Vivek Goyal <vgoyal@xxxxxxxxxx> · Thu, 28 Oct 2021 10:20:43 -0400

On Thu, Oct 28, 2021 at 07:13:10AM +0300, Amir Goldstein wrote:
> > > you need to either include the generation in object identifier
> > > or much better use the object's nfs file handle, the same way
> > > that fanotify stores object identifiers.
> >
> > I think nfs file handle is much more complicated and its a separate
> > project altogether. I am assuming we are talking about persistent
> > nfs file handle as generated by host. I think biggest issue we faced
> > with that is that guest is untrusted and we don't want to resolve
> > file handle provided by guest on host otherwise guest can craft
> > file handles and possibly be able to open other files on same filesystem
> > outside shared dir.
> >
> 
> Right now, virtiofsd keeps all inodes and dentries of live client inodes
> pinned in cache on the server.
> If you switch to file handles design, virtiofsd only need to keep a map
> of all the file handles that server handed out to client to address
> this concern.

Keeping a map of all file handles server handed out to client, should work.
But that means these file handles are not persistent and can't be used after
a vritiofsd restart (virtiofsd will lose its map over restart). And that
means one can not pass these file handles to user space. And that rules out
giving file handle to user space as part of fanotify API, IIUC.

So in practice it is as good as (nodeid, generation) solution. And using
file handles means giving CAP_DAC_READ_SEARCH to virtiofsd daemon. Will
really like to run virtiofsd unrpviliged (in a user namespace) and be
able to support as many features as possible.

> 
> For directories, this practice is not even needed for security, because
> a decoded directory file handle can be verified to be within the shared dir.
> It is only needed to prevent DoS, because a crafted directory file handle
> (outside of shared dir) can be used to generate extra IO and thrash the
> inode/dentry cache on the server.

Good to know that for directories only DOS is a concern.

> 
> > >
> > > > +       uint64_t mask;
> > > > +       uint32_t namelen;
> > > > +       uint32_t cookie;
> > >
> > > I object to persisting with the two-events-joined-by-cookie design.
> > > Any new design should include a single event for rename
> > > with information about src and dst.
> > >
> > > I know this is inconvenient, but we are NOT going to create a "remote inotify"
> > > interface, we need to create a "remote fsnotify" interface and if server wants
> > > to use inotify, it will need to join the disjoined MOVE_FROM/TO event into
> > > a single "remote event", that FUSE will use to call fsnotify_move().
> >
> > man inotify says following.
> >
> > "       Matching up the IN_MOVED_FROM and IN_MOVED_TO event pair  generated  by
> >        rename(2)  is thus inherently racy.  (Don't forget that if an object is
> >        renamed outside of a monitored directory, there  may  not  even  be  an
> >        IN_MOVED_TO  event.)"
> >
> > So if guest is no monitoring target dir of renamed file, then we will not
> > even get IN_MOVED_TO. In that case we can't merge two events into one.
> >
> > And this sounds like inotify/fanotify interface needs to come up with
> > an merged event and in that case remote filesystem will simply propagate
> > that event. (Instead of coming up with a new event only for remote
> > filesystems. Sounds like this is not a problem limited to remote
> > filesystems only).
> >
> 
> I don't see it that way.
> I see the "internal protocol" for filesystems/vfs to report rename is
> fsnotify_move() which carries information about both src and target.
> I would like the remote protocol  to carry the same information.
> It is then up to the userspace API whether to report the rename
> as two events or a unified event.

Sure not a bad idea and allowing "cookie" does not stop remote filesystems
from reporting single rename event.

In this case virtiofs is just a passthrough filesystem. It simply reports
back what underlying filesystem is reporting. So if inotify interface
reports two events, it will simply send that. In fact old users will
expect that.

I understand that you don't like two separate events and hence wants
to kill cookie. But how will we fix that when underlying inotify API
reports two separate events. If we try to merge these there is no
guarantee that we can do that because we might get only one events
as we might not be watching either source/target directory. We place
watches only as specified by client. 

> 
> For example, debugfs, calls fsnotify_move() when an object is renamed
> from underneath the vfs. It does not call fsnotify() twice with a cookie,
> because we would not want to change local filesystem nor remote protocols
> when we want to add new userspace APIs for reporting fsevents.
> 
> That comes down to my feeling that this claims to be a proposal
> for "remote fsnotify", but it looks and sounds like a proposal for
> "remote inotify" and this is the core of my objection to passing
> the rename cookie in the protocol.

We are starting with remote inotify support and hoping it can be extended
to remote fanotify in limited feature form. So want to make modifications
in such a way so that one can easily extend it for fanotify as well.

Now question is, that should there be separate fuse commands for inotify
and fanotify request/events. Or we should try to merge these two and
design fuse_notify_fsnotify_out{} and fuse_notify_fsnotify_in{} in such
a way so that it could support both inotify/fanotify. I guess later will
make more sense. Just that it will be more complicated as well because
fanotify API can send much more information to user space. Like, file
handles, "fd", "pid" etc.

I think "pid" might not make much sense in the context of remote filesystem.
In case of virtiofsd, it will simply be pid of another virtiofsd instance
most likely which does not mean anything for guest process. In fact,
there is a chance that it could be same pid as of guest process.

> 
> Regarding the issue that the src/dst path may not be known
> to the server, as I wrote, it is fine if either the src/dst path information
> is omitted from an event, but the protocol should provide the
> placeholder to report them.

This kind of makes more sense. That protocol should have capability to report
a rename event with both src/dst path to future proof it. That way when
inotify/fanotify APIs start reporting a single rename event, we will be
able to simply send it back to client without any fuse protocol
modifications.

One risk of adding space for src/dst path info in fuse_notify_fsnotify_out{}
is that we don't know how this information will end up looking like when
inotify/fanotify actually start supporting single rename event. It is
possible that we add some fields now and final single event format looks
different and now we have unused fields.

> 
> After sufficient arguing, I might be convinced that the cookie may be included
> as an optional field in addition to the fields that I requested.

But we should allow cookie as well so that older inotify API continues
to be supported as well.
> 
> I understand why you write that this sounds like an fanotify interface
> that needs to be resolved and you are correct, but the reason that the
> fanotify interface issue was not yet resolved is that we are trying to not
> repeat the mistakes of the past and for that same reason, I am insisting
> on the protocol.

Right now we are writing protocol fields based on what inotify (and
possibly fanotify) are supporting. But if you want to extend it further
so that it can report something which you intend to introduce in future,
I think that's fine too.

So how will additional fields will look like to support this signle
rename event.

Vivek