On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote:
> > The second is to put some kind of generation counter in the inode or
> > address_space that is incremented on every deallocation.  The fault path
> > would read the generation counter before calling find_get_page() and then
> > check it hasn't changed after getting the page lock (goto retry_find).  I
> > like that one a little more since we can fix it all in common code, and I
> > think it'll be lower overhead in the fault path.
>   I'm not sure I understand how this should work. After pagecache is
> truncated, fault can come and happily instantiate the page again. After
> that fault is done, hole punching awakes and removes blocks from under that
> page. So checking the generation counter after you get the page lock seems
> useless to me.

Yeah, I don't think the page lock is enough (maybe someone can convince me
otherwise).  How does this look?  Checking the generation number after we
get i_mutex ensures that the truncate has finished running.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54eed4f..df6278b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3648,6 +3648,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 				    stop_block);
 
 	ext4_discard_preallocations(inode);
+	damage_mapping(mapping);
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1a04525..190f38c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -415,6 +415,7 @@ struct address_space {
 	struct radix_tree_root	page_tree;	/* radix tree of all pages */
 	spinlock_t		tree_lock;	/* and lock protecting it */
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
+	unsigned		i_damaged;	/* damage count */
 	struct rb_root		i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
 	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
@@ -503,6 +504,31 @@ static inline int mapping_writably_mapped(struct address_space *mapping)
 }
 
 /*
+ * A mapping is damaged when blocks are removed from the filesystem's
+ * data structures.
+ */
+static inline unsigned mapping_damage(struct address_space *mapping)
+{
+	unsigned seq = ACCESS_ONCE(mapping->i_damaged);
+	smp_rmb();	/* Subsequent reads of damagable data structures */
+	return seq;
+}
+
+/* Must be called with i_mutex held */
+static inline bool
+mapping_is_damaged(struct address_space *mapping, unsigned seq)
+{
+	return mapping->i_damaged != seq;
+}
+
+/* Must be called with i_mutex held */
+static inline void damage_mapping(struct address_space *mapping)
+{
+	smp_wmb();	/* Prior writes to damagable data structures */
+	mapping->i_damaged++;
+}
+
+/*
  * Use sequence counter to get consistent i_size on 32-bit processors.
  */
 #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
diff --git a/mm/filemap.c b/mm/filemap.c
index b7749a9..71c936d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	pgoff_t offset = vmf->pgoff;
 	struct page *page;
 	pgoff_t size;
+	unsigned damage = mapping_damage(mapping);
 	int ret = 0;
 
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
@@ -1676,6 +1677,18 @@ retry_find:
 		return VM_FAULT_SIGBUS;
 	}
 
+	/*
+	 * Check if we were the unlucky victim of a holepunch
+	 */
+	mutex_lock(&inode->i_mutex);
+	if (unlikely(mapping_is_damaged(mapping, damage))) {
+		mutex_unlock(&inode->i_mutex);
+		unlock_page(page);
+		page_cache_release(page);
+		damage = mapping_damage(mapping);
+		goto retry_find;
+	}
+
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."