Re: Fanotify: concurrent work and handling files being executed

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 7 Feb 2024 13:15:55 +0200

On Wed, Feb 7, 2024 at 12:44 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Tue 06-02-24 18:44:47, Amir Goldstein wrote:
> > On Tue, Feb 6, 2024 at 6:30 PM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 6, 2024 at 6:50 AM Jan Kara <jack@xxxxxxx> wrote:
> > > >
> > > > On Tue 06-02-24 09:44:29, Amir Goldstein wrote:
> > > > > On Tue, Feb 6, 2024 at 1:24 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> > > > > >
> > > > > > One of the issues we've hit recently while using fanotify in an HSM is
> > > > > > racing with files that are opened for execution.
> > > > > >
> > > > > > There is a race that can result in ETXTBUSY.
> > > > > > Pid 1: You have a file marked with FAN_OPEN_EXEC_PERM.
> > > > > > Pid 2: execve(file_by_path)
> > > > > > Pid 1: gets notification, with file.fd
> > > > > > Pid 2: blocked, waiting for notification to resolve
> > > > > > Pid 1: Does work with FD (populates the file)
> > > > > > Pid 1: writes FAN_ALLOW to the fanotify file descriptor allowing the event.
> > > > > > Pid 2: continues, and falls through to deny_write_access (and fails)
> > > > > > Pid 1: closes fd
> > > >
> > > > Right, this is kind of nasty.
> > > >
> > > > > > Pid 1 can close the FD before responding, but this can result in a
> > > > > > race if fanotify is being handled in a multi-threaded
> > > > > > manner.
> > > >
> > > > Yep.
> > > >
> > > > > > I.e. if there are two threads operating on the same fanotify group,
> > > > > > and an event's FD has been closed, that can be reused
> > > > > > by another event. This is largely not a problem because the
> > > > > > outstanding events are added in a FIFO manner to the outstanding
> > > > > > event list, and as long as the earlier event is closed and responded
> > > > > > to without interruption, it should be okay, but it's difficult
> > > > > > to guarantee that this happens, unless event responses are serialized
> > > > > > in some fashion, with strict ordering between
> > > > > > responses.
> > > >
> > > > Yes, essentially you must make sure you will not read any new events from
> > > > the notification queue between fd close & writing of the response. Frankly,
> > > > I find this as quite ugly and asking for trouble (subtle bugs down the
> > > > line).
> > > >
> > > Is there a preference for either refactoring fanotify_event_metadata, or
> > > adding this new ID type as a piece of metadata?
> > >
> > > I almost feel like the FD should move to being metadata, and we should
> > > use ID in place of fd in fanotify_event_metadata. If we use an xarray,
> > > it should be reasonable to use a 32-bit identifier, so we don't need
> > > to modify the fanotify_event_metadata structure at all.
> >
> > I have a strong preference for FANOTIFY_METADATA_VERSION 4
> > because I really would like event->key to be 64bit and in the header,
> > but I have a feeling that Jan may have a different opinion..
>
> I also think 64-bit ID would be potentially more useful for userspace
> (mostly because of guaranteed uniqueness). I'm just still not yet sure how
> do you plan to use it for persistent events because there are several
> options how to generate the ID. I'd hate to add yet-another-id in the near
> future.

The use case of permission events and persistent async events do not
overlap and for the future, they should not be mixed in the same group
at all. mixing permission events and async events in the same group
was probably a mistake and we should not repeat it.

The event->id namespace is per group.
permission events are always realtime events so they do not persist
and event->id can be a runtime id used for responding.
A future group for persistent async events would be something like
https://learn.microsoft.com/en-us/windows/win32/fileio/change-journal-records
and then the 64-bit event->id would have a different meaning.
It means "ACK that events up ID are consumed" or "query the events since ID".

>
> Regarding FANOTIFY_METADATA_VERSION 4: What are your reasons to want the ID
> in the header?

Just a matter of personal taste - I see event->id as being fundamental
information
and not "extra" information.

Looking forward at persistent events, it would be easier to iterate and skip
up to ID, if event->id is in the header.

> I think we'd need explicit init flag to enable event ID
> reporting anyway?But admittedly I can see some appeal of having ID in the
> header if we are going to use the ID for matching responses to permission
> events.

Yes, as you wrote,
permission events with FAN_REPORT_FID are a clean start.
I don't mind if this setup requires an explicit FAN_REPORT_EVENT_ID flag.

Also, regarding your suggestion to report FID instead of event->fd,
I think we can do it like that:
1. With FAN_REPORT_FID in permission events, event->fd should
    NOT be used to access the file
2. Instead, it should be used as a mount_fd arg to open_by_handle_at()
    along with the reported fid
3. In that case, event->fd may be an O_PATH fd (we need a
    patch to allow O_PATH fd as mount_fd)
4. An fd that is open with open_by_handle_at(event->fd, ...
    will have the FMODE_NONOTIFY flag, so it is safe to access the file

This model also solves my problem with rename(), because a single
mount_fd could be used to open both old and new parent path fds.

Does that sound like a plan?

Thanks,
Amir.