On Thu, Sep 17, 2020 at 12:00:06PM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 11:50 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > Ahh. Here's a race this doesn't close:
> >
> > int truncate_inode_page(struct address_space *mapping, struct page *page)
>
> I think this one currently depends on the page lock, doesn't it?
>
> And I think the point would be to get rid of that dependency, and just
> make the rule be that it's done with the i_mmap_rwsem held for
> writing.

Ah, I see what you mean. Hold the i_mmap_rwsem for write across,
basically, the entirety of truncate_inode_pages_range().

I don't see a problem with lock scope; according to rmap.c, i_mmap_rwsem
is near the top of the hierarchy, just under lock_page. We do wait for
I/O to complete (both reads and writes), but I don't know a reason for
that to be a problem.

We might want to take the page lock anyway to prevent truncate() from
racing with a read() that decides to start new I/O to this page, which
would involve adjusting the locking hierarchy (although to a way in
which hugetlb and the regular VM are back in sync). My brain is starting
to hurt from thinking about ways that not taking the page lock in
truncate might go wrong.