On Tue, Mar 11, 2025 at 1:34 PM Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote: > > On Tue, Mar 11, 2025 at 12:41:52PM +0100, Amir Goldstein wrote: > > Pre-content hooks in page faults introduces potential deadlock of HSM > > handler in userspace with filesystem freezing. > > > > The requirement with pre-content event is that for every accessed file > > range an event covering at least this range will be generated at least > > once before the file data is accesses. > > > > In preparation to disabling pre-content event hooks on page faults, > > change those hooks to always use the mask MAY_ACCESS and add pre-content > > hooks at mmap() variants for the entire mmaped range, so HSM can fill > > content when user requests to map a portion of the file. > > > > Note that exec() variant also calls vm_mmap_pgoff() internally to map > > code sections, so pre-content hooks are also generated in this case. > > > > Link: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/ > > Suggested-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > --- > > mm/filemap.c | 3 +-- > > mm/mmap.c | 12 ++++++++++++ > > mm/util.c | 7 +++++++ > > 3 files changed, 20 insertions(+), 2 deletions(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 2974691fdfad2..f85d288209b44 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -3350,7 +3350,6 @@ static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf) > > vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf) > > { > > struct file *fpin = NULL; > > - int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_ACCESS; > > loff_t pos = vmf->pgoff >> PAGE_SHIFT; > > size_t count = PAGE_SIZE; > > int err; > > @@ -3370,7 +3369,7 @@ vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf) > > if (!fpin) > > return VM_FAULT_SIGBUS; > > > > - err = fsnotify_file_area_perm(fpin, mask, &pos, count); > > + err = fsnotify_file_area_perm(fpin, MAY_ACCESS, &pos, count); > > fput(fpin); > > if (err) > > return VM_FAULT_SIGBUS; > > diff --git a/mm/mmap.c b/mm/mmap.c > > index cda01071c7b1f..70318936fd588 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -48,6 +48,7 @@ > > #include <linux/sched/mm.h> > > #include <linux/ksm.h> > > #include <linux/memfd.h> > > +#include <linux/fsnotify.h> > > > > #include <linux/uaccess.h> > > #include <asm/cacheflush.h> > > @@ -1151,6 +1152,17 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > > I kind of hate that we keep on extending this deprecate syscall. Is it > truly necessary here? > If my understanding of remap_file_pages() is correct then no new regions of the file are being mapped - so no, my bad - the pre-content hook is not needed in this case. > > return ret; > > } > > > > + if (file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) { > > Is there a circumstance where file == NULL here? I mean get_file() > literally dereferences a field then returns the pointer, so that'd be a > null pointer deref? > > Also I'm pretty sure it's impossible possible for a VMA to be VM_SHARED and > !vma->vm_file, since we need to access the address_space etc. > > > + int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ; > > + loff_t pos = pgoff >> PAGE_SHIFT; > > + > > + ret = fsnotify_file_area_perm(file, mask, &pos, size); > > All other invocations of this in fs code, this further amplifies my belief > that this belongs in fs code. > > > + if (ret) { > > + fput(file); > > + return ret; > > + } > > + } > > + > > ret = -EINVAL; > > > > /* OK security check passed, take write lock + let it rip. */ > > diff --git a/mm/util.c b/mm/util.c > > index b6b9684a14388..2dddeabac6098 100644 > > --- a/mm/util.c > > +++ b/mm/util.c > > @@ -23,6 +23,7 @@ > > #include <linux/processor.h> > > #include <linux/sizes.h> > > #include <linux/compat.h> > > +#include <linux/fsnotify.h> > > > > #include <linux/uaccess.h> > > > > @@ -569,6 +570,12 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, > > LIST_HEAD(uf); > > > > ret = security_mmap_file(file, prot, flag); > > + if (!ret && file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) { > > + int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ; > > + loff_t pos = pgoff >> PAGE_SHIFT; > > + > > + ret = fsnotify_file_area_perm(file, mask, &pos, len); > > + } > > if (!ret) { > > if (mmap_write_lock_killable(mm)) > > return -EINTR; > > You've duplicated this code in 2 places, can we please just have it in one > as a helper function? > > Also I'm not a fan of having super-specific file system code relating to > HSM in general mapping code like this. Can't we have something like a hook > or something more generic? > > I mean are we going to keep on expanding this for other super-specific > cases? > > I would say refactor this whole thing into a check that's done in fs code > that we can call into. > > If we need a new hook, then let's add one. If we can use existing hooks, > let's use them. > fsnotify hooks and logic have traditionally been contained in fsnotify.h wrappers much like the security_ hooks. For the page fault hooks, Linus asked for the FMODE_FSNOTIFY_HSM condition to be very visible and explicit, but I suppose there is no reason for not having a fsnotify_mmap() wrapper here. > Also, is it valid to be accessing this file without doing a get_file() > here? Caller of vm_mmap_pgoff() should have a reference on file. > It seems super inconsistent you increment ref count in one place but > not the other? By other place do you mean in remap_file_pages()? There, the file reference is coming from vma->vm_file and hook is called outside mmap lock. Thanks, Amir.