On 2024/12/12 0:00, Jan Kara wrote: > On Wed 11-12-24 15:59:51, Zhang Yi wrote: >> On 2024/12/10 20:57, Jan Kara wrote: >>> On Mon 09-12-24 16:32:41, Zhang Yi wrote: >>> b) evict all page cache in the affected range -> should stop writeback - >>> *but* currently there's one case which could be problematic. Assume we >>> do punch hole 0..N and the page at N+1 is dirty. Punch hole does all of >>> the above and starts removing blocks, needs to restart transaction so it >>> drops i_data_sem. Writeback starts for page N+1, needs to load extent >>> block into memory, ext4_cache_extents() now loads back some extents >>> covering range 0..N into extent status tree. >> >> This completely confuses me. Do you mention the case below, >> >> There are many extent entries in the page range 0..N+1, for example, >> >> 0 N N+1 >> | |/ >> [www][wwwwww][wwwwwwww]...[wwwww][wwww]... >> | | >> N-a N-b >> >> Punch hole is removing each extent entries from N..0 >> (ext4_ext_remove_space() removes blocks from end to start), and could >> drop i_data_sem just after manipulating(removing) the extent entry >> [N-a,N-b], At the same time, a concurrent writeback start write back >> page N+1 since the writeback only hold page lock, doesn't hold i_rwsem >> and invalidate_lock. It may load back the extents 0..N-a into the >> extent status tree again while finding extent that contains N+1? > > Yes, because when we load extents from extent tree, we insert all extents > from the leaf of the extent tree into extent status tree. That's what > ext4_cache_extents() call does. > >> Finally it may left some stale extent status entries after punch hole >> is done? > > Yes, there may be stale extents in the extent status tree when > ext4_ext_remove_space() returns. But punch hole in particular then does: > > ext4_es_insert_extent(inode, first_block, hole_len, ~0, > EXTENT_STATUS_HOLE, 0); > > which overwrites these stale extents with appropriate information. > Yes, that's correct! I missed this insert yesterday. It looks fine now, as it holds the i_rwsem and invalidate_lock, and has evicted the page cache in this case. Thanks a lot for your detail explanation. I will add these document in my next iteration. Thanks! Yi. >> If my understanding is correct, isn't that a problem that exists now? >> I mean without this patch series. > > Yes, the situation isn't really related to your patches. But with your > patches we are starting to rely even more on extent status tree vs extent > tree consistecy. So I wanted to spell out this situation to verify new > problem isn't introduced and so that we create rules that handle this > situation well. > >>> So the only protection >>> against using freed blocks is that nobody should be mapping anything in >>> the range 0..N because we hold those locks & have evicted page cache. >>> >>> So I think we need to also document, that anybody mapping blocks needs to >>> hold i_rwsem or invalidate_lock or a page lock, ideally asserting that in >>> ext4_map_blocks() to catch cases we missed. Asserting for page lock will >>> not be really doable but luckily only page writeback needs that so that can >>> get some extemption from the assert. >> >> In the case above, it seems that merely holding a page lock is >> insufficient? > > Well, holding page lock(s) for the range you are operating on is enough to > make sure there cannot be parallel operations on that range like truncate, > punch hole or similar, because they always remove the page cache before > starting their work and because they hold invalidate_lock, new pages cannot > be created while they are working. > > Honza