On Wed, Apr 10, 2024 at 10:29:16PM +0800, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@xxxxxxxxxx> > > Now we lookup extent status entry without holding the i_data_sem before > inserting delalloc block, it works fine in buffered write path and > because it holds i_rwsem and folio lock, and the mmap path holds folio > lock, so the found extent locklessly couldn't be modified concurrently. > But it could be raced by fallocate since it allocate block whitout > holding i_rwsem and folio lock. > > ext4_page_mkwrite() ext4_fallocate() > block_page_mkwrite() > ext4_da_map_blocks() > //find hole in extent status tree > ext4_alloc_file_blocks() > ext4_map_blocks() > //allocate block and unwritten extent > ext4_insert_delayed_block() > ext4_da_reserve_space() > //reserve one more block > ext4_es_insert_delayed_block() > //drop unwritten extent and add delayed extent by mistake Shouldn't this be serialised by the file invalidation lock? Hole punching via fallocate must do this to avoid data use-after-free bugs w.r.t racing page faults and all the other fallocate ops need to serialise page faults to avoid page cache level data corruption. Yet here we see a problem resulting from a fallocate operation racing with a page fault.... Ah, I see that the invalidation lock is only picked up deep inside ext4_punch_hole(), ext4_collapse_range(), ext4_insert_range() and ext4_zero_range(). They all do the same flush, lock, and dio wait preamble but each do it just a little bit differently. The allocation path does it just a little bit differently again and does not take the invalidate lock... Perhaps the ext4 fallocate code should be factored so that all the fallocate operations run the same flush, lock and wait code rather than having 5 slightly different copies of the same code? Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx