Re: Fanotify: concurrent work and handling files being executed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 6, 2024 at 1:24 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
>
> One of the issues we've hit recently while using fanotify in an HSM is
> racing with files that are opened for execution.
>
> There is a race that can result in ETXTBUSY.
> Pid 1: You have a file marked with FAN_OPEN_EXEC_PERM.
> Pid 2: execve(file_by_path)
> Pid 1: gets notification, with file.fd
> Pid 2: blocked, waiting for notification to resolve
> Pid 1: Does work with FD (populates the file)
> Pid 1: writes FAN_ALLOW to the fanotify file descriptor allowing the event.
> Pid 2: continues, and falls through to deny_write_access (and fails)
> Pid 1: closes fd
>
> Pid 1 can close the FD before responding, but this can result in a
> race if fanotify is being handled in a multi-threaded
> manner.
>
> I.e. if there are two threads operating on the same fanotify group,
> and an event's FD has been closed, that can be reused
> by another event. This is largely not a problem because the
> outstanding events are added in a FIFO manner to the outstanding
> event list, and as long as the earlier event is closed and responded
> to without interruption, it should be okay, but it's difficult
> to guarantee that this happens, unless event responses are serialized
> in some fashion, with strict ordering between
> responses.
>
> There are a couple of ways I see around this:
> 1. Have a flag in the fanotify response that's like FAN_CLOSE_FD,
> where fanotify_write closes the fd when
> it processes the response.

That seems doable and useful.

> 2. Make the response identifier separate from the FD. This can either
> be an IDR / xarray, or a 64-bit always
> incrementing number. The benefit of using an xarray is that responses
> can than be handled in O(1) speed
> whereas the current implementation is O(n) in relationship to the
> number of outstanding events.

The number of outstanding permission events is usually limited
by the number of events that are handled concurrently.

I think that a 64-bit event id is a worthy addition to a well designed
notifications API. I am pretty sure that both Windows and MacOS
filesystem notification events have a 64-bit event id.

If we ever implement filesystem support for persistent change
notification journals (as in Windows), those ids would be essential.

>
> This can be implemented by adding an additional piece of response
> metadata, and then that becomes the
> key vs. fd on response.
> ---
>
> An aside, ETXTBUSY / i_writecount is a real bummer. We want to be able
> to populate file content lazily,
> and I realize there are many steps between removing the write lock,
> and being able to do this, but given
> that you can achieve this with FUSE, NFS, EROFS / cachefilesd, it
> feels somewhat arbitrary to continue
> to have this in place for executable files only.

I think there are way too many security models built around this
behavior to be able to change it for the common case.

However, I will note that technically, the filesystem does not
need to require a file open for write to fill its content as long as the
file content is written directly to disk and as long as page cache
is not populated in that region and invalidate_lock is held.

Requiring FMODE_WRITE for btrfs_ioctl_encoded_write()
may make sense, but it is not a must.
A more fine grained page cache protection approach is also
an option, a bit like (or exactly like) exported pNFS layouts.

IOW, you could potentially implement lazy filling of executable
content with specific file systems that support an out of band
file filling API, but then we would also need notifications for
page faults, so let's leave that for the far future.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux