On Mon 12-07-21 18:25:14, Darrick J. Wong wrote:
> On Mon, Jul 12, 2021 at 06:55:54PM +0200, Jan Kara wrote:
> > @@ -2967,6 +2992,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  	pgoff_t max_off;
> >  	struct page *page;
> >  	vm_fault_t ret = 0;
> > +	bool mapping_locked = false;
> >  
> >  	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> >  	if (unlikely(offset >= max_off))
> > @@ -2988,15 +3014,30 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
> >  		ret = VM_FAULT_MAJOR;
> >  		fpin = do_sync_mmap_readahead(vmf);
> > +	}
> > +
> > +	if (!page) {
> 
> Is it still necessary to re-evaluate !page here?

No, you are right, it is not necessary. I'll remove it.

> >  retry_find:
> > +		/*
> > +		 * See comment in filemap_create_page() why we need
> > +		 * invalidate_lock
> > +		 */
> > +		if (!mapping_locked) {
> > +			filemap_invalidate_lock_shared(mapping);
> > +			mapping_locked = true;
> > +		}
> >  		page = pagecache_get_page(mapping, offset,
> >  					  FGP_CREAT|FGP_FOR_MMAP,
> >  					  vmf->gfp_mask);
> >  		if (!page) {
> >  			if (fpin)
> >  				goto out_retry;
> > +			filemap_invalidate_unlock_shared(mapping);
> >  			return VM_FAULT_OOM;
> >  		}
> > +	} else if (unlikely(!PageUptodate(page))) {
> > +		filemap_invalidate_lock_shared(mapping);
> > +		mapping_locked = true;
> >  	}
> >  
> >  	if (!lock_page_maybe_drop_mmap(vmf, page, &fpin))
> > @@ -3014,8 +3055,20 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  	 * We have a locked page in the page cache, now we need to check
> >  	 * that it's up-to-date. If not, it is going to be due to an error.
> >  	 */
> > -	if (unlikely(!PageUptodate(page)))
> > +	if (unlikely(!PageUptodate(page))) {
> > +		/*
> > +		 * The page was in cache and uptodate and now it is not.
> > +		 * Strange but possible since we didn't hold the page lock
> > +		 * all the time. Let's drop everything, get the invalidate
> > +		 * lock and try again.
> > +		 */
> > +		if (!mapping_locked) {
> > +			unlock_page(page);
> > +			put_page(page);
> > +			goto retry_find;
> > +		}
> >  		goto page_not_uptodate;
> > +	}
> >  
> >  	/*
> >  	 * We've made it this far and we had to drop our mmap_lock, now is the
> > @@ -3026,6 +3079,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  		unlock_page(page);
> >  		goto out_retry;
> >  	}
> > +	if (mapping_locked)
> > +		filemap_invalidate_unlock_shared(mapping);
> >  
> >  	/*
> >  	 * Found the page and have a reference on it.
> > @@ -3056,6 +3111,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  
> >  	if (!error || error == AOP_TRUNCATED_PAGE)
> >  		goto retry_find;
> > +	filemap_invalidate_unlock_shared(mapping);
> 
> Hm.  I /think/ it's the case that mapping_locked==true always holds here
> because the new "The page was in cache and uptodate and now it is not."
> block above will take the invalidate_lock and retry pagecache_get_page,
> right?

Yes. The page_not_uptodate block can only be entered with mapping_locked ==
true - the only place that jumps to it is:

	if (unlikely(!PageUptodate(page))) {
		/*
		 * The page was in cache and uptodate and now it is not.
		 * Strange but possible since we didn't hold the page lock
		 * all the time. Let's drop everything, get the invalidate
		 * lock and try again.
		 */
		if (!mapping_locked) {
			unlock_page(page);
			put_page(page);
			goto retry_find;
		}
		goto page_not_uptodate;
	}
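
So in condensed form, the locking in the fault path after this patch is
(a sketch of the control flow, error paths omitted, not the literal code):

	if (!page) {
retry_find:
		/* take invalidate_lock (shared) before creating the page */
		if (!mapping_locked) {
			filemap_invalidate_lock_shared(mapping);
			mapping_locked = true;
		}
		page = pagecache_get_page(mapping, offset,
					  FGP_CREAT | FGP_FOR_MMAP,
					  vmf->gfp_mask);
	} else if (unlikely(!PageUptodate(page))) {
		/* stale cached page will need reading in, lock as well */
		filemap_invalidate_lock_shared(mapping);
		mapping_locked = true;
	}

	lock_page(page);	/* lock_page_maybe_drop_mmap() in the code */
	if (unlikely(!PageUptodate(page))) {
		if (!mapping_locked) {
			/* lost a race with invalidation - redo under lock */
			unlock_page(page);
			put_page(page);
			goto retry_find;
		}
		goto page_not_uptodate;	/* mapping_locked is always true */
	}

	/* page is locked and uptodate - drop invalidate_lock and map it */
	if (mapping_locked)
		filemap_invalidate_unlock_shared(mapping);
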
> >  	return VM_FAULT_SIGBUS;
> >  
> > @@ -3067,6 +3123,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  	 */
> >  	if (page)
> >  		put_page(page);
> > +	if (mapping_locked)
> > +		filemap_invalidate_unlock_shared(mapping);
> 
> Hm.  I think this looks ok, even though this patch now contains the
> subtlety that we've both hoisted the xfs mmaplock to page cache /and/
> reduced the scope of the invalidate_lock.
> 
> As for fancy things like remap_range, I think they're still safe with
> this latest iteration because those functions grab the invalidate_lock
> in exclusive mode and invalidate the mappings before proceeding, which
> means that other programs will never find the lockless path (i.e. page
> locked, uptodate, and attached to the mapping) and will instead block on
> the invalidate lock until the remap operation completes.  Is that
> right?

Correct. For operations such as hole punch or the destination range of
remap_range, we take invalidate_lock exclusively and invalidate the
pagecache in the involved range. No new pages can be created in that range
until invalidate_lock is dropped (the places that create pages without
holding i_rwsem are read, readahead, and fault, and all of those take
invalidate_lock when they may create a page).

There's also the case someone pointed out that the *source* of remap_range
needs to be protected as well (but only from modifications through mmap).
This is achieved by taking invalidate_lock in the .page_mkwrite handlers
and is thus not impacted by these changes to filemap_fault().
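
For illustration, the invalidating side then looks roughly like this (a
minimal sketch of a hole punch path, not the code of any particular
filesystem; real implementations also hold i_rwsem and write back dirty
pages first, and 'lstart'/'lend' here are just the punched byte range,
lend inclusive):

	filemap_invalidate_lock(inode->i_mapping);
	/*
	 * Throw out all pages in the punched range. Reads, faults and
	 * readahead now block on invalidate_lock, so no new pages can be
	 * created in this range until we unlock.
	 */
	truncate_pagecache_range(inode, lstart, lend);
	/* ... free the underlying blocks on disk ... */
	filemap_invalidate_unlock(inode->i_mapping);

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR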