On Thu, Aug 01, 2024 at 11:40:25PM +0200, Jan Kara wrote: > On Thu 25-07-24 14:19:47, Josef Bacik wrote: > > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending > > on the faulting method. > > > > This pre-content event is meant to be used by hierarchical storage > > managers that want to fill in the file content on first read access. > > > > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > ... > > @@ -3287,6 +3288,35 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) > > if (unlikely(index >= max_idx)) > > return VM_FAULT_SIGBUS; > > > > + /* > > + * If we have pre-content watchers then we need to generate events on > > + * page fault so that we can populate any data before the fault. > > + * > > + * We only do this on the first pass through, otherwise the populating > > + * application could potentially deadlock on the mmap lock if it tries > > + * to populate it with mmap. > > + */ > > + if (fault_flag_allow_retry_first(vmf->flags) && > > + fsnotify_file_has_content_watches(file)) { > > I'm somewhat nervous that if ALLOW_RETRY isn't set, we'd silently jump into > readpage code without ever sending pre-content event and thus we'd possibly > expose invalid content to userspace? I think we should fail the fault if > fsnotify_file_has_content_watches(file) && !(vmf->flags & > FAULT_FLAG_ALLOW_RETRY). I was worried about this too but it seems to not be the case that we'll not ever have ALLOW_RETRY. That being said I'm fine turning this into a sigbus. > > > + int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_READ; > > + loff_t pos = vmf->pgoff << PAGE_SHIFT; > > + > > + fpin = maybe_unlock_mmap_for_io(vmf, fpin); > > + > > + /* > > + * We can only emit the event if we did actually release the > > + * mmap lock. > > + */ > > + if (fpin) { > > + error = fsnotify_file_area_perm(fpin, mask, &pos, > > + PAGE_SIZE); > > + if (error) { > > + fput(fpin); > > + return VM_FAULT_ERROR; > > + } > > + } > > + } > > + > > /* > > * Do we have something in the page cache already? > > */ > ... > > @@ -3612,6 +3643,13 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, > > unsigned long rss = 0; > > unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type; > > > > + /* > > + * We are under RCU, we can't emit events here, we need to force a > > + * normal fault to make sure the events get sent. > > + */ > > + if (fsnotify_file_has_content_watches(file)) > > + return ret; > > + > > I don't think we need to do anything for filemap_map_pages(). The call just > inserts page cache content into page tables and whatever is in the page > cache and has folio_uptodate() set should be already valid file content, > shouldn't it? I'll make this comment more clear. filemap_fault() will start readahead, but we'll only emit the event for the page size that we're faulting. I had looked at putting this at the readahead place and figuring out the readahead size, but literally anything could trigger readahead so it's better to just not allow filemap_map_pages() to happen, otherwise we'll end up with empty pages (if the content hasn't been populated yet) and never emit an event for those ranges. Thanks, Josef