On Wed, Nov 27, 2013 at 07:33:43PM -0700, Matthew Wilcox wrote: > On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote: > > > The second is to put some kind of generation counter in the inode or > > > address_space that is incremented on every deallocation. The fault path > > > would read the generation counter before calling find_get_page() and then > > > check it hasn't changed after getting the page lock (goto retry_find). I > > > like that one a little more since we can fix it all in common code, and I > > > think it'll be lower overhead in the fault path. > > I'm not sure I understand how this should work. After pagecache is > > truncated, fault can come and happily instantiate the page again. After > > that fault is done, hole punching awakes and removes blocks from under that > > page. So checking the generation counter after you get the page lock seems > > useless to me. > > Yeah, I don't think the page lock is enough (maybe someone can convince > me otherwise). How does this look? Checking the generation number > after we get i_mutex ensures that the truncate has finished running. But there's no guarantee that whoever is removing the pages from the page cache is holding the i_mutex (XFS hole punching and most page cache operations serialise on XFS_IOLOCK_EXCL, not i_mutex), so you can't use i_mutex in this way for generic code. Besides... > @@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > pgoff_t offset = vmf->pgoff; > struct page *page; > pgoff_t size; > + unsigned damage = mapping_damage(mapping); > int ret = 0; > > size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; > @@ -1676,6 +1677,18 @@ retry_find: > return VM_FAULT_SIGBUS; > } > > + /* > + * Check if we were the unlucky victim of a holepunch > + */ > + mutex_lock(&inode->i_mutex); > + if (unlikely(mapping_is_damaged(mapping, damage))) { > + mutex_unlock(&inode->i_mutex); > + unlock_page(page); > + page_cache_release(page); > + damage = mapping_damage(mapping); > + goto retry_find; > + } ... aren't we holding the mmap_sem here? i.e. taking the i_mutex here will lead to deadlocks.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html