Re: File monitor problem

Amir Goldstein <amir73il@xxxxxxxxx> · Tue, 10 Dec 2019 22:49:49 +0200

[cc: Watchman maintainer]

> > > I could imagine fanotify events would provide FID information of the target
> > > file e.g. on create so you could then use that with open_by_handle() to
> > > open the file and get reliable access to file data (provided the file still
> > > exists). However there still remains the problem that you don't know the
> > > file name and the problem that directory changes while you are looking...
> > >
> > > So changing fanotify to suit your usecase requires more than a small tweak.
> > >
> > > For what you want, it seems e.g. btrfs send-receive functionality will
> > > provide what you need but then that's bound to a particular filesystem.
> > >
> > >                                                                 Honza
> > > --
> > > Jan Kara <jack@xxxxxxxx>
> > > SUSE Labs, CR
> >
> > I understand your concerns about reliability. But I think functionality
> > and reliability are two different things in this case. We'd better
> > entrust the reliability to the user.
> > Consider a user who just wants to monitor all filesystem changes but does
> > not intend to do anything according to the received notifications.
> > I think we should not make decisions for users by restricting them and
> > ignoring their necessary demands. We should introduce the best
> > available tools with all of the concerns about them (which are
> > documented). So, we would put the user in charge of organizing his
> > projects. The user may care or not according to his demands.
>
> I disagree. This is not how API design works in the Linux kernel. First, you
> have to have a good and sound use case for the functionality (and I
> understand and acknowledge your need to monitor a large directory and
> reliably synchronize changes to another place) and then we try to implement
> an API that would fulfil the needs of the use case.

For the record, although I am the author of filename patches and represent
users that use them, I myself am not fully convinced that we need to
extend the API much further. For the past few months, I have been trying
to convert our in-house filesystem monitor to work without filename in events.
I haven't yet been able to prove (for performance of all interesting workloads)
that more information in events is needed, but I haven't been able to prove
that it is not needed either. CREATE_SELF events are needed for functionality.

I have also been looking at other filesystem monitor implementations to
see if they could be converted to fanotify without any extra information
in events. I mostly focused on Watchman, which looks like the most
promising open source filesystem monitor implementation around.
It was hard for me to figure out by myself whether Watchman can benefit from
the new fanotify API and what is still missing from that API.

I have already implemented unprivileged fanotify (this was posted
first, even before FAN_REPORT_FID), but I am still looking for a way to
demo its usefulness, i.e. how it can avoid races compared to inotify.

One way I am considering to tackle the missing information is to
provide unprivileged access to open_by_handle_at(2).
Currently, this syscall requires CAP_DAC_READ_SEARCH, because
it can open files without having search access to ancestor directories.
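
For reference, a minimal sketch of how the existing syscalls are used
today. The helper name reopen_via_handle() is made up for illustration;
name_to_handle_at(2) and open_by_handle_at(2) are the real calls. This is
only a sketch of the current behavior, not part of any proposal:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: encode a handle for `path`, then reopen the file
 * from the handle alone. `mount_fd` is any fd on the same filesystem
 * (or AT_FDCWD). Returns an open fd, or a negative errno value.
 */
int reopen_via_handle(const char *path, int mount_fd)
{
    struct file_handle *fh;
    int mount_id, fd;

    assert(path != NULL);

    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh)
        return -ENOMEM;
    fh->handle_bytes = MAX_HANDLE_SZ;

    /* Encoding a handle needs no special privilege. */
    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) < 0) {
        fd = -errno;
        goto out;
    }

    /* Decoding fails with EPERM without CAP_DAC_READ_SEARCH. */
    fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
    if (fd < 0)
        fd = -errno;
out:
    free(fh);
    return fd;
}
```

Run as an ordinary user on a filesystem that supports file handles, the
first call is expected to succeed and the second to return -EPERM, which
is the restriction discussed here.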

My idea is that if the process has no CAP_DAC_READ_SEARCH, then the
mountfd argument will be assumed to be the direct parent of the file.
Search access will be verified on mountfd and then a restrictive
acceptable() callback will make sure that only a dentry whose parent
is mountfd is decoded. Alternatively, a new syscall could be used.

A special variant of exportfs_decode_fh() would be used that takes
the parent as an argument instead of getting the parent from
s_export_op->fh_to_parent() or s_export_op->get_parent().
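
To make the intended check concrete, here is a rough userspace model of
such a restrictive callback. It is not kernel code: struct toy_dentry and
acceptable_child_of() are invented for this sketch, and only the callback
shape (context pointer plus dentry, non-zero meaning "accept") mirrors the
acceptable() argument of exportfs_decode_fh():

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a kernel dentry; only the parent link matters here. */
struct toy_dentry {
    const char *name;
    const struct toy_dentry *parent;
};

/*
 * Hypothetical restrictive check: `context` is the directory that
 * mountfd refers to. Accept a decoded dentry only if it is a direct
 * child of that directory, never the directory itself or anything
 * deeper in the tree.
 */
int acceptable_child_of(void *context, const struct toy_dentry *dentry)
{
    const struct toy_dentry *dir = context;

    assert(dir != NULL && dentry != NULL);
    return dentry != dir && dentry->parent == dir;
}
```

With a tree like watched/a.txt and watched/sub/b.txt, a callback given
"watched" as context would accept a.txt and reject b.txt, so a handle can
never be used to reach below a directory the caller could not search.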

The end result would be that events could report parent fid and child fid.
If the monitor application is watching a single directory or has a map of
watched directories (like inotifywatch does), then the child could be found
by handle, as long as the file is still inside the watched directory. If
there is only a single hardlink to the file in the directory, the child name
would be unambiguous.

Child fid with FAN_DELETE/FAN_MOVED_FROM would only be useful
if the monitor application keeps a map of the files in every watched
directory (I believe Watchman does anyway).
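
A minimal sketch of the kind of per-directory map meant here, assuming a
fixed-size fid and a small fixed-capacity table to keep it short (real
fids are variable length, and a real monitor would use a hash table). All
names below are hypothetical and not taken from Watchman or any other
monitor:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FID_LEN   16    /* assumption: fixed-size fid, real ones vary */
#define MAX_FILES 64    /* assumption: small fixed-capacity table */
#define NAME_LEN  256

struct fid {
    unsigned char bytes[FID_LEN];
};

struct dir_entry {
    int used;
    struct fid fid;
    char name[NAME_LEN];
};

/* One watched directory: its own fid plus a child fid -> name table. */
struct watched_dir {
    struct fid fid;
    struct dir_entry files[MAX_FILES];
};

static struct dir_entry *find_entry(struct watched_dir *d,
                                    const struct fid *child)
{
    for (int i = 0; i < MAX_FILES; i++) {
        struct dir_entry *e = &d->files[i];

        if (e->used && memcmp(&e->fid, child, sizeof(*child)) == 0)
            return e;
    }
    return NULL;
}

/* Record a child seen at scan time or on a create event. 0 on success. */
int dir_add(struct watched_dir *d, const struct fid *child, const char *name)
{
    assert(d && child && name);
    if (find_entry(d, child))
        return -1;  /* second link to the same file: name is ambiguous */
    for (int i = 0; i < MAX_FILES; i++) {
        struct dir_entry *e = &d->files[i];

        if (!e->used) {
            e->used = 1;
            e->fid = *child;
            snprintf(e->name, sizeof(e->name), "%s", name);
            return 0;
        }
    }
    return -1;      /* table full */
}

/* Name currently recorded for a child fid, or NULL if unknown. */
const char *dir_lookup(struct watched_dir *d, const struct fid *child)
{
    struct dir_entry *e = find_entry(d, child);

    return e ? e->name : NULL;
}

/* Delete or move-from event: copy the name out and forget the child. */
int dir_remove(struct watched_dir *d, const struct fid *child,
               char *name_out, size_t len)
{
    struct dir_entry *e = find_entry(d, child);

    if (!e)
        return -1;
    snprintf(name_out, len, "%s", e->name);
    e->used = 0;
    return 0;
}
```

On a delete event the file can no longer be opened by handle, so
dir_remove() is the only way to turn the reported child fid back into a
name. The sketch refuses a second entry with the same fid, which
corresponds to the single-hardlink case where the name is unambiguous.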

Jan, does that sound like something that would address your concerns?

Does that sound like an API that would provide added value to users?

Am I missing anything?

Thanks,
Amir.