On Thu, Sep 17, 2020 at 11:44:54AM +1000, Dave Chinner wrote:
> So....
>
> P0                                  p1
>
> hole punch starts
>   takes XFS_MMAPLOCK_EXCL
>   truncate_pagecache_range() ... locks page ...
>     unmap_mapping_range(start, end)
>     <clears ptes>
>                                     <read fault>
>                                     do_fault_around()
>                                     ->map_pages
>                                     filemap_map_pages() ... trylock page fails ...
>                                       page mapping valid,
>                                       page is up to date
>                                       maps PTEs
>                                     <fault done>
>     truncate_inode_pages_range()
>       truncate_cleanup_page(page)
>         invalidates page
>       delete_from_page_cache_batch(page)
>         frees page
>                                     <pte now points to a freed page>
>
> That doesn't seem good to me.
>
> Sure, maybe the page hasn't been freed back to the free lists
> because of elevated refcounts. But it's been released by the
> filesystem and is no longer in the page cache, so nothing good can
> come of this situation...
>
> AFAICT, this race condition exists for the truncate case as well,
> as filemap_map_pages() doesn't have a "page beyond inode size" check
> in it. However, exposure here is very limited in the truncate case
> because truncate_setsize()->truncate_pagecache() zaps the PTEs
> again after invalidating the page cache.
>
> Either way, adding the XFS_MMAPLOCK_SHARED around
> filemap_map_pages() avoids the race condition for fallocate and
> truncate operations for XFS...
>
> > As such it is a rather
> > different beast compared to the fault handler from fs POV and does not need
> > protection from hole punching (current serialization on page lock and
> > checking of page->mapping is enough).
> > That being said I agree this is subtle and the moment someone adds e.g. a
> > readahead call into filemap_map_pages() we have a real problem. I'm not
> > sure how to prevent this risk...
>
> Subtle, yes. So subtle, in fact, I fail to see any reason why the
> above race cannot occur, as there's no obvious serialisation in the
> page cache between PTE zapping and page invalidation to prevent a
> fault from coming in and re-establishing the PTEs before invalidation
> occurs.
>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>
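For concreteness, the "XFS_MMAPLOCK_SHARED around filemap_map_pages()"
approach Dave mentions amounts to giving XFS its own ->map_pages handler
that brackets the generic helper with the inode's mmap lock, so
fault-around is serialised against hole punch (which holds
XFS_MMAPLOCK_EXCL). A minimal sketch, assuming the ->map_pages signature
of kernels from this era (the callback returned void then; newer kernels
return vm_fault_t), with the wrapper named after XFS's xfs_filemap_*
convention and the rest of xfs_file_vm_ops elided:

	/*
	 * Run the fault-around path with the inode's MMAPLOCK held
	 * shared so it cannot race with hole punch / truncate, which
	 * invalidate the page cache while holding XFS_MMAPLOCK_EXCL.
	 */
	static void
	xfs_filemap_map_pages(
		struct vm_fault		*vmf,
		pgoff_t			start_pgoff,
		pgoff_t			end_pgoff)
	{
		struct inode	*inode = file_inode(vmf->vma->vm_file);

		xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
		filemap_map_pages(vmf, start_pgoff, end_pgoff);
		xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
	}

	static const struct vm_operations_struct xfs_file_vm_ops = {
		/* .fault, .page_mkwrite, etc. unchanged */
		.map_pages	= xfs_filemap_map_pages,
	};

With something like this in place, the PTE re-establishment shown in the
p1 column above cannot start until the hole punch drops
XFS_MMAPLOCK_EXCL, i.e. after truncate_inode_pages_range() has finished
invalidating the range.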