Hello,

Amir has reported [1] that ext4 has a potential issue when reads race with hole punching, possibly exposing stale data from freed blocks or even corrupting the filesystem when stale mapping data gets used for writeout. The problem is that during hole punching, new page cache pages can get instantiated in a punched range after truncate_inode_pages() has run but before the filesystem removes blocks from the file. In principle, any filesystem implementing hole punching thus needs a mechanism that blocks instantiation of page cache pages during hole punching to avoid this race. This is further complicated by the fact that there are multiple places that can instantiate pages in the page cache. Regular read(2) or a page fault can do this, but fadvise(2) or madvise(2) can also read pages into the page cache through force_page_cache_readahead().

There are a couple of ways to fix this. The first (currently implemented by XFS) is to protect read(2) and *advise(2) calls with i_rwsem so that they are serialized with hole punching. This is easy to do, but all reads would then be serialized with writes, and mixed read-write workloads would suffer heavily on ext4. Thus for ext4 I want to use EXT4_I(inode)->i_mmap_sem to serialize reads with hole punching, which is the same serialization ext4 already uses to close this race for page faults.

This is conceptually simple, but the lock ordering is troublesome: since EXT4_I(inode)->i_mmap_sem is used in the page fault path, it ranks below mmap_sem. Thus we cannot simply grab EXT4_I(inode)->i_mmap_sem in ext4_file_read_iter(), because generic_file_buffered_read() copies data to userspace, which may fault and require grabbing mmap_sem. Grabbing EXT4_I(inode)->i_mmap_sem in ext4_readpages() / ext4_readpage() is problematic as well, because at that point we already have locked pages instantiated in the page cache.
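To make the race and the intended serialization concrete, here is a minimal single-threaded toy model in C. It is not kernel code, and all names in it (toy_rwsem, toy_inode, toy_read(), toy_read_unlocked(), toy_punch_begin(), toy_punch_end()) are hypothetical. The rwsem only plays the role of EXT4_I(inode)->i_mmap_sem, and "blocking" is modelled as a failing trylock so the interleaving is deterministic. Hole punching is split into two phases: dropping cached pages (truncate_inode_pages()) and then removing the blocks. The model deliberately ignores mmap_sem and the page lock, so it shows what the lock must cover, not the lock ordering problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; "would sleep" is modelled as a failing trylock. */
#define TOY_NR_BLOCKS 8

struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;	/* a real reader would sleep here */
	s->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *s) { s->readers--; }

static bool toy_down_write_trylock(struct toy_rwsem *s)
{
	if (s->writer || s->readers)
		return false;
	s->writer = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *s) { s->writer = false; }

struct toy_inode {
	struct toy_rwsem lock;		/* plays the role of i_mmap_sem */
	bool mapped[TOY_NR_BLOCKS];	/* block allocated to the file */
	char disk[TOY_NR_BLOCKS];	/* on-disk contents, kept after freeing */
	bool cached[TOY_NR_BLOCKS];	/* page present in the page cache */
	char cache[TOY_NR_BLOCKS];	/* page cache contents */
};

/* Instantiate the page for block i if it is not cached yet. */
static void toy_fill_page(struct toy_inode *ino, size_t i)
{
	if (!ino->cached[i]) {
		ino->cache[i] = ino->mapped[i] ? ino->disk[i] : 0;
		ino->cached[i] = true;
	}
}

/* Unsafe read: instantiates pages without any serialization. */
static void toy_read_unlocked(struct toy_inode *ino, size_t i, char *out)
{
	toy_fill_page(ino, i);
	*out = ino->cache[i];
}

/*
 * Safe read: page instantiation happens under the shared lock, while the
 * copy to the caller happens after the lock is dropped.
 * Returns false where a real reader would block.
 */
static bool toy_read(struct toy_inode *ino, size_t i, char *out)
{
	if (!toy_down_read_trylock(&ino->lock))
		return false;
	toy_fill_page(ino, i);
	toy_up_read(&ino->lock);
	*out = ino->cache[i];
	return true;
}

/* Punch phase 1: take the lock exclusively and drop cached pages. */
static bool toy_punch_begin(struct toy_inode *ino, size_t start, size_t end)
{
	if (!toy_down_write_trylock(&ino->lock))
		return false;
	for (size_t i = start; i < end; i++)
		ino->cached[i] = false;	/* models truncate_inode_pages() */
	return true;
}

/* Punch phase 2: remove the blocks from the file, then let readers in. */
static void toy_punch_end(struct toy_inode *ino, size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		ino->mapped[i] = false;
	toy_up_write(&ino->lock);
}
```

With the lock, a toy_read() arriving between the two punch phases has to wait and afterwards sees a hole (zero). toy_read_unlocked() instead instantiates the page from the block that is about to be freed, and that stale page survives the punch. Note that toy_read() copies the data out only after dropping the lock, which mirrors why the lock cannot be held across the copy to userspace.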
Taking EXT4_I(inode)->i_mmap_sem in ext4_readpages() / ext4_readpage() would make it effectively rank below the page lock, which is too low in the locking hierarchy. So for ext4 (and other filesystems with similar locking constraints: F2FS, GFS2, OCFS2, ...) we'd need another hook in the read path that can wrap the insertion of pages into the page cache but does not include copying data to userspace.

This patch set implements one possible such hook: we essentially abstract generic_file_buffered_read_get_pages() into a hook. I'm not completely sold on the naming or the API, or even on whether this is the best place for the hook, but I wanted to send something out for further discussion. For example, another workable option for ext4 would be an aops hook for adding a page to the page cache (essentially abstracting add_to_page_cache_lru()). The slight downside is that it would mean acquiring the lock per page instead of per batch of pages; also, if we ever transition to range locking of the mapping, per-batch locking would be more efficient.

What do people think about this?

Honza

[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@xxxxxxxxxxxxxx/