On 2024/8/9 2:36, Jan Kara wrote: > On Thu 08-08-24 19:18:30, Zhang Yi wrote: >> On 2024/8/8 1:41, Jan Kara wrote: >>> On Fri 02-08-24 19:51:16, Zhang Yi wrote: >>>> From: Zhang Yi <yi.zhang@xxxxxxxxxx> >>>> >>>> Now that we update data reserved space for delalloc after allocating >>>> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is >>>> enabled, we also need to query the extents_status tree to calculate the >>>> exact reserved clusters. This is complicated now and it appears that >>>> it's better to do this job in ext4_es_insert_extent(), because >>>> __es_remove_extent() have already count delalloc blocks when removing >>>> delalloc extents and __revise_pending() return new adding pending count, >>>> we could update the reserved blocks easily in ext4_es_insert_extent(). >>>> >>>> Thers is one special case needs to concern is the quota claiming, when >>>> bigalloc is enabled, if the delayed cluster allocation has been raced >>>> by another no-delayed allocation(e.g. from fallocate) which doesn't >>>> cover the delayed blocks: >>>> >>>> |< one cluster >| >>>> hhhhhhhhhhhhhhhhhhhdddddddddd >>>> ^ ^ >>>> |< >| < fallocate this range, don't claim quota again >>>> >>>> We can't claim quota as usual because the fallocate has already claimed >>>> it in ext4_mb_new_blocks(), we could notice this case through the >>>> removed delalloc blocks count. >>>> >>>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> >>> ... >>>> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, >>>> __free_pending(pr); >>>> pr = NULL; >>>> } >>>> + pending = err3; >>>> } >>>> error: >>>> write_unlock(&EXT4_I(inode)->i_es_lock); >>>> + /* >>>> + * Reduce the reserved cluster count to reflect successful deferred >>>> + * allocation of delayed allocated clusters or direct allocation of >>>> + * clusters discovered to be delayed allocated. Once allocated, a >>>> + * cluster is not included in the reserved count. >>>> + * >>>> + * When bigalloc is enabled, allocating non-delayed allocated blocks >>>> + * which belong to delayed allocated clusters (from fallocate, filemap, >>>> + * DIO, or clusters allocated when delalloc has been disabled by >>>> + * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(), >>>> + * so release the quota reservations made for any previously delayed >>>> + * allocated clusters. >>>> + */ >>>> + resv_used = rinfo.delonly_cluster + pending; >>>> + if (resv_used) >>>> + ext4_da_update_reserve_space(inode, resv_used, >>>> + rinfo.delonly_block); >>> >>> I'm not sure I understand here. We are inserting extent into extent status >>> tree. We are replacing resv_used clusters worth of space with delayed >>> allocation reservation with normally allocated clusters so we need to >>> release the reservation (mballoc already reduced freeclusters counter). >>> That I understand. In normal case we should also claim quota because we are >>> converting from reserved into allocated state. Now if we allocated blocks >>> under this range (e.g. from fallocate()) without >>> EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here >>> instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is >>> related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating >>> blocks for this extent or not. >> >> Oh, this is really complicated due to the bigalloc feature, please let me >> explain it more clearly by listing all related situations. >> >> There are 2 types of paths of allocating delayed/reserved cluster: >> 1. Normal case, normally allocate delayed clusters from the write back path. >> 2. Special case, allocate blocks under this delayed range, e.g. from >> fallocate(). >> >> There are 4 situations below: >> >> A. bigalloc is disabled. This case is simple, after path 2, we don't need >> to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we >> set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is >> detected. If the flag is set, we must be replacing a delayed extent and >> rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal >> to set EXT4_GET_BLOCKS_DELALLOC_RESERVE. > > Right. So fallocate() will call ext4_map_blocks() and > ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED > which you (due to patch 2 of this series) transform into > EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc > accounting through in ext4_ext_map_blocks() but this patch moved the update > to ext4_es_insert_extent(). But there is one cornercase even here AFAICT: > > Suppose fallocate is called for range 0..16k, we have delalloc extent at > 8k..16k. In this case ext4_map_blocks() at block 0 will not find the > delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc > without using delalloc reservation but then ext4_es_insert_extent() will > still have rinfo.delonly_block > 0 so we claim the quota reservation > instead of releasing it? > After commit 6430dea07e85 ("ext4: correct the hole length returned by ext4_map_blocks()"), the fallocate range 0-16K would be divided into two rounds. When we first calling ext4_map_blocks() with 0-16K, the map range will be corrected to 0-8k by ext4_ext_determine_insert_hole() and the allocating range should not cover any delayed range. Then ext4_alloc_file_blocks() will call ext4_map_blocks() again to allocate 8K-16K in the second round, in this round, we are allocating a real delayed range. Please below graph for details, ext4_alloc_file_blocks() //0-16K ext4_map_blocks() //0-16K ext4_es_lookup_extent() //find nothing ext4_ext_map_blocks(0) ext4_ext_determine_insert_hole() //change map range to 0-8K ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE) //allocate blocks under hole ext4_map_blocks() //8-16K ext4_es_lookup_extent() //find delayed extent ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE) //allocate blocks under a whole delayed range, //use rinfo.delonly_block > 0 is okay Hence the allocating range can't mixed with delayed and non-delayed extent at a time, and the rinfo.delonly_block > 0 should work. >> B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed >> cluster: >> B0.Allocating a whole delayed cluster, this case is the same to A. >> >> |< one cluster >| >> ddddddd+ddddddd+ddddddd+ddddddd >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range > > I agree. In this case there's no difference. > > >> B1.Allocating delayed blocks in a reserved cluster, this case is the same >> to A, too. >> >> |< one cluster >| >> hhhhhhh+hhhhhhh+ddddddd+ddddddd >> ^^^^^^^ >> allocating this range > > Yes, if the allocation starts within delalloc range, we will have > EXT4_GET_BLOCKS_DELALLOC_RESERVE set and ndelonly_blocks will always be > > 0. > >> B2.Allocating blocks which doesn't cover the delayed blocks in one reserved >> cluster, >> >> |< one cluster >| >> hhhhhhh+hhhhhhh+hhhhhhh+ddddddd >> ^^^^^^^ >> fallocating this range >> >> This case must from path 2, which means allocating blocks without >> EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must >> be 0 since we are not replacing any delayed extents, so >> rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED >> detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is >> not set. So I think we could use rinfo.delonly_block to identify this >> case. > > Well, this is similar to the non-bigalloc case I was asking about above. > Why the allocated unwritten extent cannot extend past the start of delalloc > extent? I didn't find anything that would disallow that... > The same to above, ext4_ext_determine_insert_hole() should work fine for this case. Thanks, Yi.