On Wed, Jan 20, 2021 at 05:06:08PM +0100, Jan Kara wrote:
> Hello,
>
> Amir has reported [1] that ext4 has potential issues when reads can race
> with hole punching possibly exposing stale data from freed blocks or even
> corrupting filesystem when stale mapping data gets used for writeout. The
> problem is that during hole punching, new page cache pages can get instantiated
> in a punched range after truncate_inode_pages() has run but before the
> filesystem removes blocks from the file. In principle any filesystem
> implementing hole punching thus needs to implement a mechanism to block
> instantiating page cache pages during hole punching to avoid this race. This is
> further complicated by the fact that there are multiple places that can
> instantiate pages in page cache. We can have regular read(2) or page fault
> doing this but fadvise(2) or madvise(2) can also result in reading in page
> cache pages through force_page_cache_readahead().

Doesn't this indicate that we're doing truncates in the wrong order?
I.e., first we should deallocate the blocks, then we should free the page
cache that was caching the contents of those blocks. We'd need to make
sure those pages in the page cache don't get written back to disc
(either by taking pages in the page cache off the LRU list or having
the filesystem handle writeback of pages to a freed extent as a no-op).
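
To make the ordering argument concrete, here is a toy userspace sketch, not
kernel code. It models a single file block and a single page cache slot, and
lets one read land between the two steps of a hole punch. Every name in it
(toy_file, toy_read, toy_stale_after_punch and so on) is hypothetical and
exists only for illustration. It deliberately ignores writeback of dirty
pages, which is the part that would still need handling as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): one file block and one page cache slot. */
struct toy_file {
	bool block_mapped;	/* file offset still maps to an on-disk block */
	bool page_cached;	/* a page for that offset is in the page cache */
	bool page_backed;	/* the cached page was filled from a block */
};

/* Hypothetical stand-in for read(2), a fault or readahead adding a page. */
static void toy_read(struct toy_file *f)
{
	if (!f->page_cached) {
		f->page_cached = true;
		/* Reading a hole gives a zero page with no block behind it. */
		f->page_backed = f->block_mapped;
	}
}

/* Hypothetical stand-in for dropping the page cache over the range. */
static void toy_drop_pagecache(struct toy_file *f)
{
	f->page_cached = false;
	f->page_backed = false;
}

/* Hypothetical stand-in for the filesystem freeing the blocks. */
static void toy_free_blocks(struct toy_file *f)
{
	f->block_mapped = false;
}

/*
 * Punch the hole with one read racing between the two steps.
 * Returns true if a cached page still refers to a freed block afterwards.
 */
static bool toy_stale_after_punch(bool free_blocks_first)
{
	struct toy_file f = {
		.block_mapped = true,
		.page_cached = true,
		.page_backed = true,
	};

	if (free_blocks_first) {
		toy_free_blocks(&f);
		toy_read(&f);		/* page is already cached, no change */
		toy_drop_pagecache(&f);
	} else {
		toy_drop_pagecache(&f);
		toy_read(&f);		/* refills from the still-mapped block */
		toy_free_blocks(&f);
	}

	/* Model invariant: a block-backed page is always a cached page. */
	assert(!f.page_backed || f.page_cached);
	return f.page_cached && f.page_backed && !f.block_mapped;
}
```

In this model, toy_stale_after_punch(false) (page cache dropped first) leaves
a stale page behind, while toy_stale_after_punch(true) (blocks freed first)
does not, because any read after the blocks are gone only ever sees a hole.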