I'm trying to figure out what protects us in this scenario: Thread A mmaps a file and reads from it. Thread B punches holes in that file. It seems to me that thread A can end up with references to pages that are in the page cache that point to deallocated blocks (which the filesystem might subsequently allocate to a different file). Thread B calls ext4_punch_hole() Thread B takes ->i_mutex Thread B calls truncate_pagecache_range() Thread B calls unmap_mapping_range() which removes the PTE from Thread A's page tables. Thread B calls truncate_inode_pages_range() which removes the pages from the page cache. Thread A calls filemap_fault(). It calls find_get_page(), but it's not there. It calls do_sync_mmap_readahead() then find_get_page() again, still not there. So it calls page_cache_read() which calls ext4_readpage which calls mpage_readpage(), which calls ext4_get_block() which finds the blocks in question, nowhere taking the i_mutex. Now Thread B gets back from its summer holiday, takes i_data_sem and calls ext4_es_remove_extent() and ext4_ext_remove_space() / ext4_free_hole_blocks() before releasing i_data_sem and i_mutex. Thread A happily dirties the page it has. Thread C extends a file, which gets the victim block allocated. vmscan causes the page to be written back, since it's dirty and on the LRU lists Bang, thread C has a corrupted file. This doesn't happen with the normal truncate path since i_size is written before calling unmap_mapping_range, and i_size is checked after filemap_fault gets the page lock. How should we solve this? I see two realistic options. One is an rwsem in the inode or address_space taken for read by the fault path and for write by the holepunch path. It's kind of icky because each filesystem will need to add support for it. The second is to put some kind of generation counter in the inode or address_space that is incremented on every deallocation. The fault path would read the generation counter before calling find_get_page() and then check it hasn't changed after getting the page lock (goto retry_find). I like that one a little more since we can fix it all in common code, and I think it'll be lower overhead in the fault path. -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html