On Thu, Sep 17, 2020 at 11:30:00AM -0700, Linus Torvalds wrote: > On Thu, Sep 17, 2020 at 11:23 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > > > Something like taking > > the i_mmap_lock_read(file->f_mapping) in filemap_fault, then adding a > > new VM_FAULT_I_MMAP_LOCKED bit so that do_read_fault() and friends add: > > > > if (ret & VM_FAULT_I_MMAP_LOCKED) > > i_mmap_unlock_read(vmf->vma->vm_file->f_mapping); > > else > > unlock_page(page); > > > > ... want me to turn that into a real patch? > > I can't guarantee it's the right model - it does worry me how many > places we might get that i_mmap_rwlock, and how long we migth hold it > for writing, and what deadlocks it might cause when we take it for > reading in the page fault path. > > But I think it might be very interesting as a benchmark patch and a > trial balloon. Maybe it "just works". Ahh. Here's a race this doesn't close: int truncate_inode_page(struct address_space *mapping, struct page *page) { VM_BUG_ON_PAGE(PageTail(page), page); if (page->mapping != mapping) return -EIO; truncate_cleanup_page(mapping, page); delete_from_page_cache(page); return 0; } truncate_cleanup_page() does if (page_mapped(page)) { pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1; unmap_mapping_pages(mapping, page->index, nr, false); } but ->mapping isn't cleared until delete_from_page_cache() many instructions later. So we can get the lock and have a page which appears to be not-truncated, only for it to get truncated on us later. > I would _love_ for the page lock itself to be only (or at least > _mainly_) about the actual IO synchronization on the page. > > That was the origin of it, the whole "protect all the complex state of > a page" behavior kind of grew over time, since it was the only > per-page lock we had. Yes, I think that's a noble goal.