Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())

Jan Kara <jack@xxxxxxx> · Thu, 17 Sep 2020 09:40:30 +0200

On Thu 17-09-20 08:37:17, Nikolay Borisov wrote:
> On 17.09.20 г. 4:44 ч., Dave Chinner wrote:
> > On Wed, Sep 16, 2020 at 05:58:51PM +0200, Jan Kara wrote:
> >> On Sat 12-09-20 09:19:11, Amir Goldstein wrote:
> >>> On Tue, Jun 23, 2020 at 8:21 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> <snip>
> 
> > 
> > So....
> > 
> > P0					p1
> > 
> > hole punch starts
> >   takes XFS_MMAPLOCK_EXCL
> >   truncate_pagecache_range()
> >     unmap_mapping_range(start, end)
> >       <clears ptes>
> > 					<read fault>
> > 					do_fault_around()
> > 					  ->map_pages
> > 					    filemap_map_pages()
> > 					      page mapping valid,
> > 					      page is up to date
> > 					      maps PTEs
> > 					<fault done>
> >     truncate_inode_pages_range()
> >       truncate_cleanup_page(page)
> >         invalidates page
> >       delete_from_page_cache_batch(page)
> >         frees page
> > 					<pte now points to a freed page>
> > 
> > That doesn't seem good to me.
> > 
> > Sure, maybe the page hasn't been freed back to the free lists
> > because of elevated refcounts. But it's been released by the
> > filesystem and not longer in the page cache so nothing good can come
> > of this situation...
> > 
> > AFAICT, this race condition exists for the truncate case as well
> > as filemap_map_pages() doesn't have a "page beyond inode size" check
> > in it. 
> 
> (It's not relevant to the discussion at hand but for the sake of
> completeness):
> 
> It does have a check:
> 
>  max_idx = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
>  if (page->index >= max_idx)
>       goto unlock;

Yes, but this does something meaningful only for truncate. For other
operations such as hole punch this check doesn't bring anything. That's why
only filesystems supporting hole punching and similar advanced operations
need an equivalent of mmap_lock.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR