On 2024/8/8 1:41, Jan Kara wrote: > On Fri 02-08-24 19:51:16, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@xxxxxxxxxx> >> >> Now that we update data reserved space for delalloc after allocating >> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is >> enabled, we also need to query the extents_status tree to calculate the >> exact reserved clusters. This is complicated now and it appears that >> it's better to do this job in ext4_es_insert_extent(), because >> __es_remove_extent() have already count delalloc blocks when removing >> delalloc extents and __revise_pending() return new adding pending count, >> we could update the reserved blocks easily in ext4_es_insert_extent(). >> >> Thers is one special case needs to concern is the quota claiming, when >> bigalloc is enabled, if the delayed cluster allocation has been raced >> by another no-delayed allocation(e.g. from fallocate) which doesn't >> cover the delayed blocks: >> >> |< one cluster >| >> hhhhhhhhhhhhhhhhhhhdddddddddd >> ^ ^ >> |< >| < fallocate this range, don't claim quota again >> >> We can't claim quota as usual because the fallocate has already claimed >> it in ext4_mb_new_blocks(), we could notice this case through the >> removed delalloc blocks count. >> >> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> > ... >> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, >> __free_pending(pr); >> pr = NULL; >> } >> + pending = err3; >> } >> error: >> write_unlock(&EXT4_I(inode)->i_es_lock); >> + /* >> + * Reduce the reserved cluster count to reflect successful deferred >> + * allocation of delayed allocated clusters or direct allocation of >> + * clusters discovered to be delayed allocated. Once allocated, a >> + * cluster is not included in the reserved count. >> + * >> + * When bigalloc is enabled, allocating non-delayed allocated blocks >> + * which belong to delayed allocated clusters (from fallocate, filemap, >> + * DIO, or clusters allocated when delalloc has been disabled by >> + * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(), >> + * so release the quota reservations made for any previously delayed >> + * allocated clusters. >> + */ >> + resv_used = rinfo.delonly_cluster + pending; >> + if (resv_used) >> + ext4_da_update_reserve_space(inode, resv_used, >> + rinfo.delonly_block); > > I'm not sure I understand here. We are inserting extent into extent status > tree. We are replacing resv_used clusters worth of space with delayed > allocation reservation with normally allocated clusters so we need to > release the reservation (mballoc already reduced freeclusters counter). > That I understand. In normal case we should also claim quota because we are > converting from reserved into allocated state. Now if we allocated blocks > under this range (e.g. from fallocate()) without > EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here > instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is > related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating > blocks for this extent or not. > Oh, this is really complicated due to the bigalloc feature, please let me explain it more clearly by listing all related situations. There are 2 types of paths of allocating delayed/reserved cluster: 1. Normal case, normally allocate delayed clusters from the write back path. 2. Special case, allocate blocks under this delayed range, e.g. from fallocate(). There are 4 situations below: A. bigalloc is disabled. This case is simple, after path 2, we don't need to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is detected. If the flag is set, we must be replacing a delayed extent and rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal to set EXT4_GET_BLOCKS_DELALLOC_RESERVE. B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed cluster: B0.Allocating a whole delayed cluster, this case is the same to A. |< one cluster >| ddddddd+ddddddd+ddddddd+ddddddd ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range B1.Allocating delayed blocks in a reserved cluster, this case is the same to A, too. |< one cluster >| hhhhhhh+hhhhhhh+ddddddd+ddddddd ^^^^^^^ allocating this range B2.Allocating blocks which doesn't cover the delayed blocks in one reserved cluster, |< one cluster >| hhhhhhh+hhhhhhh+hhhhhhh+ddddddd ^^^^^^^ fallocating this range This case must from path 2, which means allocating blocks without EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must be 0 since we are not replacing any delayed extents, so rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is not set. So I think we could use rinfo.delonly_block to identify this case. As the cases listed above, I think we could use rinfo.delonly_block to determine whether the EXT4_GET_BLOCKS_DELALLOC_RESERVE is set, so I use it to determine if we need to claim quota or release quota. > At this point it would seem much clearer if we passed flag to > ext4_es_insert_extent() whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set > when allocating extent or not instead of computing delonly_block and > somehow infering from that. But maybe I miss some obvious reason why that > is correct. > Yeah, I agree that infer from computing delonly_block is little obscure and not clear enough, passing a flag is a clearer solution, but we have to pass one more parameter to ext4_es_insert_extent() which could only be set or not set in the allocating path in ext4_map_create_blocks(), other 5 callers don't care about it (so they should always have no EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set theoretically). I have no strong feeling of which one is better, which one do you perfer after reading my explanation above? Thanks, Yi.