Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault

Josef Bacik <josef@xxxxxxxxxxxxxx> · Fri, 2 Aug 2024 12:03:57 -0400

On Thu, Aug 01, 2024 at 11:40:25PM +0200, Jan Kara wrote:
> On Thu 25-07-24 14:19:47, Josef Bacik wrote:
> > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> > on the faulting method.
> > 
> > This pre-content event is meant to be used by hierarchical storage
> > managers that want to fill in the file content on first read access.
> > 
> > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> ...
> > @@ -3287,6 +3288,35 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  	if (unlikely(index >= max_idx))
> >  		return VM_FAULT_SIGBUS;
> >  
> > +	/*
> > +	 * If we have pre-content watchers then we need to generate events on
> > +	 * page fault so that we can populate any data before the fault.
> > +	 *
> > +	 * We only do this on the first pass through, otherwise the populating
> > +	 * application could potentially deadlock on the mmap lock if it tries
> > +	 * to populate it with mmap.
> > +	 */
> > +	if (fault_flag_allow_retry_first(vmf->flags) &&
> > +	    fsnotify_file_has_content_watches(file)) {
> 
> I'm somewhat nervous that if ALLOW_RETRY isn't set, we'd silently jump into
> readpage code without ever sending pre-content event and thus we'd possibly
> expose invalid content to userspace? I think we should fail the fault if
> fsnotify_file_has_content_watches(file) && !(vmf->flags &
> FAULT_FLAG_ALLOW_RETRY).

I was worried about this too but it seems to not be the case that we'll not ever
have ALLOW_RETRY.  That being said I'm fine turning this into a sigbus.

> 
> > +		int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_READ;
> > +		loff_t pos = vmf->pgoff << PAGE_SHIFT;
> > +
> > +		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
> > +
> > +		/*
> > +		 * We can only emit the event if we did actually release the
> > +		 * mmap lock.
> > +		 */
> > +		if (fpin) {
> > +			error = fsnotify_file_area_perm(fpin, mask, &pos,
> > +							PAGE_SIZE);
> > +			if (error) {
> > +				fput(fpin);
> > +				return VM_FAULT_ERROR;
> > +			}
> > +		}
> > +	}
> > +
> >  	/*
> >  	 * Do we have something in the page cache already?
> >  	 */
> ...
> > @@ -3612,6 +3643,13 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> >  	unsigned long rss = 0;
> >  	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
> >  
> > +	/*
> > +	 * We are under RCU, we can't emit events here, we need to force a
> > +	 * normal fault to make sure the events get sent.
> > +	 */
> > +	if (fsnotify_file_has_content_watches(file))
> > +		return ret;
> > +
> 
> I don't think we need to do anything for filemap_map_pages(). The call just
> inserts page cache content into page tables and whatever is in the page
> cache and has folio_uptodate() set should be already valid file content,
> shouldn't it?

I'll make this comment more clear.  filemap_fault() will start readahead, but
we'll only emit the event for the page size that we're faulting.  I had looked
at putting this at the readahead place and figuring out the readahead size, but
literally anything could trigger readahead so it's better to just not allow
filemap_map_pages() to happen, otherwise we'll end up with empty pages (if the
content hasn't been populated yet) and never emit an event for those ranges.
Thanks,

Josef