Re: [PATCH 1/2] fsnotify: add pre-content hooks on mmap()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 11, 2025 at 12:41:52PM +0100, Amir Goldstein wrote:
> Pre-content hooks in page faults introduces potential deadlock of HSM
> handler in userspace with filesystem freezing.
>
> The requirement with pre-content event is that for every accessed file
> range an event covering at least this range will be generated at least
> once before the file data is accesses.
>
> In preparation to disabling pre-content event hooks on page faults,
> change those hooks to always use the mask MAY_ACCESS and add pre-content
> hooks at mmap() variants for the entire mmaped range, so HSM can fill
> content when user requests to map a portion of the file.
>
> Note that exec() variant also calls vm_mmap_pgoff() internally to map
> code sections, so pre-content hooks are also generated in this case.
>
> Link: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/
> Suggested-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> ---
>  mm/filemap.c |  3 +--
>  mm/mmap.c    | 12 ++++++++++++
>  mm/util.c    |  7 +++++++
>  3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2974691fdfad2..f85d288209b44 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3350,7 +3350,6 @@ static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf)
>  vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
>  {
>  	struct file *fpin = NULL;
> -	int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_ACCESS;
>  	loff_t pos = vmf->pgoff >> PAGE_SHIFT;
>  	size_t count = PAGE_SIZE;
>  	int err;
> @@ -3370,7 +3369,7 @@ vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
>  	if (!fpin)
>  		return VM_FAULT_SIGBUS;
>
> -	err = fsnotify_file_area_perm(fpin, mask, &pos, count);
> +	err = fsnotify_file_area_perm(fpin, MAY_ACCESS, &pos, count);
>  	fput(fpin);
>  	if (err)
>  		return VM_FAULT_SIGBUS;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index cda01071c7b1f..70318936fd588 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -48,6 +48,7 @@
>  #include <linux/sched/mm.h>
>  #include <linux/ksm.h>
>  #include <linux/memfd.h>
> +#include <linux/fsnotify.h>
>
>  #include <linux/uaccess.h>
>  #include <asm/cacheflush.h>
> @@ -1151,6 +1152,17 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,

I kind of hate that we keep on extending this deprecate syscall. Is it
truly necessary here?

>  		return ret;
>  	}
>
> +	if (file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {

Is there a circumstance where file == NULL here? I mean get_file()
literally dereferences a field then returns the pointer, so that'd be a
null pointer deref?

Also I'm pretty sure it's impossible possible for a VMA to be VM_SHARED and
!vma->vm_file, since we need to access the address_space etc.

> +		int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
> +		loff_t pos = pgoff >> PAGE_SHIFT;
> +
> +		ret = fsnotify_file_area_perm(file, mask, &pos, size);

All other invocations of this in fs code, this further amplifies my belief
that this belongs in fs code.

> +		if (ret) {
> +			fput(file);
> +			return ret;
> +		}
> +	}
> +
>  	ret = -EINVAL;
>
>  	/* OK security check passed, take write lock + let it rip. */
> diff --git a/mm/util.c b/mm/util.c
> index b6b9684a14388..2dddeabac6098 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -23,6 +23,7 @@
>  #include <linux/processor.h>
>  #include <linux/sizes.h>
>  #include <linux/compat.h>
> +#include <linux/fsnotify.h>
>
>  #include <linux/uaccess.h>
>
> @@ -569,6 +570,12 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
>  	LIST_HEAD(uf);
>
>  	ret = security_mmap_file(file, prot, flag);
> +	if (!ret && file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
> +		int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
> +		loff_t pos = pgoff >> PAGE_SHIFT;
> +
> +		ret = fsnotify_file_area_perm(file, mask, &pos, len);
> +	}
>  	if (!ret) {
>  		if (mmap_write_lock_killable(mm))
>  			return -EINTR;

You've duplicated this code in 2 places, can we please just have it in one
as a helper function?

Also I'm not a fan of having super-specific file system code relating to
HSM in general mapping code like this. Can't we have something like a hook
or something more generic?

I mean are we going to keep on expanding this for other super-specific
cases?

I would say refactor this whole thing into a check that's done in fs code
that we can call into.

If we need a new hook, then let's add one. If we can use existing hooks,
let's use them.

Also, is it valid to be accessing this file without doing a get_file()
here? It seems super inconsistent you increment ref count in one place but
not the other?

> --
> 2.34.1
>




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux