Re: [PATCH 1/2] fsnotify: add pre-content hooks on mmap()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 11, 2025 at 1:34 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> On Tue, Mar 11, 2025 at 12:41:52PM +0100, Amir Goldstein wrote:
> > Pre-content hooks in page faults introduces potential deadlock of HSM
> > handler in userspace with filesystem freezing.
> >
> > The requirement with pre-content event is that for every accessed file
> > range an event covering at least this range will be generated at least
> > once before the file data is accesses.
> >
> > In preparation to disabling pre-content event hooks on page faults,
> > change those hooks to always use the mask MAY_ACCESS and add pre-content
> > hooks at mmap() variants for the entire mmaped range, so HSM can fill
> > content when user requests to map a portion of the file.
> >
> > Note that exec() variant also calls vm_mmap_pgoff() internally to map
> > code sections, so pre-content hooks are also generated in this case.
> >
> > Link: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/
> > Suggested-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > ---
> >  mm/filemap.c |  3 +--
> >  mm/mmap.c    | 12 ++++++++++++
> >  mm/util.c    |  7 +++++++
> >  3 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 2974691fdfad2..f85d288209b44 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -3350,7 +3350,6 @@ static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf)
> >  vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
> >  {
> >       struct file *fpin = NULL;
> > -     int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_ACCESS;
> >       loff_t pos = vmf->pgoff >> PAGE_SHIFT;
> >       size_t count = PAGE_SIZE;
> >       int err;
> > @@ -3370,7 +3369,7 @@ vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
> >       if (!fpin)
> >               return VM_FAULT_SIGBUS;
> >
> > -     err = fsnotify_file_area_perm(fpin, mask, &pos, count);
> > +     err = fsnotify_file_area_perm(fpin, MAY_ACCESS, &pos, count);
> >       fput(fpin);
> >       if (err)
> >               return VM_FAULT_SIGBUS;
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index cda01071c7b1f..70318936fd588 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -48,6 +48,7 @@
> >  #include <linux/sched/mm.h>
> >  #include <linux/ksm.h>
> >  #include <linux/memfd.h>
> > +#include <linux/fsnotify.h>
> >
> >  #include <linux/uaccess.h>
> >  #include <asm/cacheflush.h>
> > @@ -1151,6 +1152,17 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>
> I kind of hate that we keep on extending this deprecate syscall. Is it
> truly necessary here?
>

If my understanding of remap_file_pages() is correct then no new regions of the
file are being mapped - so no, my bad - the pre-content hook is not
needed in this case.


> >               return ret;
> >       }
> >
> > +     if (file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
>
> Is there a circumstance where file == NULL here? I mean get_file()
> literally dereferences a field then returns the pointer, so that'd be a
> null pointer deref?
>
> Also I'm pretty sure it's impossible possible for a VMA to be VM_SHARED and
> !vma->vm_file, since we need to access the address_space etc.
>
> > +             int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
> > +             loff_t pos = pgoff >> PAGE_SHIFT;
> > +
> > +             ret = fsnotify_file_area_perm(file, mask, &pos, size);
>
> All other invocations of this in fs code, this further amplifies my belief
> that this belongs in fs code.
>
> > +             if (ret) {
> > +                     fput(file);
> > +                     return ret;
> > +             }
> > +     }
> > +
> >       ret = -EINVAL;
> >
> >       /* OK security check passed, take write lock + let it rip. */
> > diff --git a/mm/util.c b/mm/util.c
> > index b6b9684a14388..2dddeabac6098 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/processor.h>
> >  #include <linux/sizes.h>
> >  #include <linux/compat.h>
> > +#include <linux/fsnotify.h>
> >
> >  #include <linux/uaccess.h>
> >
> > @@ -569,6 +570,12 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
> >       LIST_HEAD(uf);
> >
> >       ret = security_mmap_file(file, prot, flag);
> > +     if (!ret && file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
> > +             int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
> > +             loff_t pos = pgoff >> PAGE_SHIFT;
> > +
> > +             ret = fsnotify_file_area_perm(file, mask, &pos, len);
> > +     }
> >       if (!ret) {
> >               if (mmap_write_lock_killable(mm))
> >                       return -EINTR;
>
> You've duplicated this code in 2 places, can we please just have it in one
> as a helper function?
>
> Also I'm not a fan of having super-specific file system code relating to
> HSM in general mapping code like this. Can't we have something like a hook
> or something more generic?
>
> I mean are we going to keep on expanding this for other super-specific
> cases?
>
> I would say refactor this whole thing into a check that's done in fs code
> that we can call into.
>
> If we need a new hook, then let's add one. If we can use existing hooks,
> let's use them.
>

fsnotify hooks and logic have traditionally been contained in fsnotify.h
wrappers much like the security_ hooks.

For the page fault hooks, Linus asked for the FMODE_FSNOTIFY_HSM
condition to be very visible and explicit, but I suppose there is no reason
for not having a fsnotify_mmap() wrapper here.

> Also, is it valid to be accessing this file without doing a get_file()
> here?

Caller of vm_mmap_pgoff() should have a reference on file.

> It seems super inconsistent you increment ref count in one place but
> not the other?

By other place do you mean in remap_file_pages()?
There, the file reference is coming from vma->vm_file and hook is called
outside mmap lock.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux