On Thu 08-08-24 19:18:30, Zhang Yi wrote: > On 2024/8/8 1:41, Jan Kara wrote: > > On Fri 02-08-24 19:51:16, Zhang Yi wrote: > >> From: Zhang Yi <yi.zhang@xxxxxxxxxx> > >> > >> Now that we update data reserved space for delalloc after allocating > >> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is > >> enabled, we also need to query the extents_status tree to calculate the > >> exact reserved clusters. This is complicated now and it appears that > >> it's better to do this job in ext4_es_insert_extent(), because > >> __es_remove_extent() have already count delalloc blocks when removing > >> delalloc extents and __revise_pending() return new adding pending count, > >> we could update the reserved blocks easily in ext4_es_insert_extent(). > >> > >> Thers is one special case needs to concern is the quota claiming, when > >> bigalloc is enabled, if the delayed cluster allocation has been raced > >> by another no-delayed allocation(e.g. from fallocate) which doesn't > >> cover the delayed blocks: > >> > >> |< one cluster >| > >> hhhhhhhhhhhhhhhhhhhdddddddddd > >> ^ ^ > >> |< >| < fallocate this range, don't claim quota again > >> > >> We can't claim quota as usual because the fallocate has already claimed > >> it in ext4_mb_new_blocks(), we could notice this case through the > >> removed delalloc blocks count. > >> > >> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> > > ... > >> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, > >> __free_pending(pr); > >> pr = NULL; > >> } > >> + pending = err3; > >> } > >> error: > >> write_unlock(&EXT4_I(inode)->i_es_lock); > >> + /* > >> + * Reduce the reserved cluster count to reflect successful deferred > >> + * allocation of delayed allocated clusters or direct allocation of > >> + * clusters discovered to be delayed allocated. Once allocated, a > >> + * cluster is not included in the reserved count. > >> + * > >> + * When bigalloc is enabled, allocating non-delayed allocated blocks > >> + * which belong to delayed allocated clusters (from fallocate, filemap, > >> + * DIO, or clusters allocated when delalloc has been disabled by > >> + * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(), > >> + * so release the quota reservations made for any previously delayed > >> + * allocated clusters. > >> + */ > >> + resv_used = rinfo.delonly_cluster + pending; > >> + if (resv_used) > >> + ext4_da_update_reserve_space(inode, resv_used, > >> + rinfo.delonly_block); > > > > I'm not sure I understand here. We are inserting extent into extent status > > tree. We are replacing resv_used clusters worth of space with delayed > > allocation reservation with normally allocated clusters so we need to > > release the reservation (mballoc already reduced freeclusters counter). > > That I understand. In normal case we should also claim quota because we are > > converting from reserved into allocated state. Now if we allocated blocks > > under this range (e.g. from fallocate()) without > > EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here > > instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is > > related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating > > blocks for this extent or not. > > Oh, this is really complicated due to the bigalloc feature, please let me > explain it more clearly by listing all related situations. > > There are 2 types of paths of allocating delayed/reserved cluster: > 1. Normal case, normally allocate delayed clusters from the write back path. > 2. Special case, allocate blocks under this delayed range, e.g. from > fallocate(). > > There are 4 situations below: > > A. bigalloc is disabled. This case is simple, after path 2, we don't need > to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we > set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is > detected. If the flag is set, we must be replacing a delayed extent and > rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal > to set EXT4_GET_BLOCKS_DELALLOC_RESERVE. Right. So fallocate() will call ext4_map_blocks() and ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED which you (due to patch 2 of this series) transform into EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc accounting through in ext4_ext_map_blocks() but this patch moved the update to ext4_es_insert_extent(). But there is one cornercase even here AFAICT: Suppose fallocate is called for range 0..16k, we have delalloc extent at 8k..16k. In this case ext4_map_blocks() at block 0 will not find the delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc without using delalloc reservation but then ext4_es_insert_extent() will still have rinfo.delonly_block > 0 so we claim the quota reservation instead of releasing it? > B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed > cluster: > B0.Allocating a whole delayed cluster, this case is the same to A. > > |< one cluster >| > ddddddd+ddddddd+ddddddd+ddddddd > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range I agree. In this case there's no difference. > B1.Allocating delayed blocks in a reserved cluster, this case is the same > to A, too. > > |< one cluster >| > hhhhhhh+hhhhhhh+ddddddd+ddddddd > ^^^^^^^ > allocating this range Yes, if the allocation starts within delalloc range, we will have EXT4_GET_BLOCKS_DELALLOC_RESERVE set and ndelonly_blocks will always be > 0. > B2.Allocating blocks which doesn't cover the delayed blocks in one reserved > cluster, > > |< one cluster >| > hhhhhhh+hhhhhhh+hhhhhhh+ddddddd > ^^^^^^^ > fallocating this range > > This case must from path 2, which means allocating blocks without > EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must > be 0 since we are not replacing any delayed extents, so > rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED > detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is > not set. So I think we could use rinfo.delonly_block to identify this > case. Well, this is similar to the non-bigalloc case I was asking about above. Why the allocated unwritten extent cannot extend past the start of delalloc extent? I didn't find anything that would disallow that... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR