Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024/8/9 2:36, Jan Kara wrote:
> On Thu 08-08-24 19:18:30, Zhang Yi wrote:
>> On 2024/8/8 1:41, Jan Kara wrote:
>>> On Fri 02-08-24 19:51:16, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>>>
>>>> Now that we update data reserved space for delalloc after allocating
>>>> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
>>>> enabled, we also need to query the extents_status tree to calculate the
>>>> exact reserved clusters. This is complicated now and it appears that
>>>> it's better to do this job in ext4_es_insert_extent(), because
>>>> __es_remove_extent() have already count delalloc blocks when removing
>>>> delalloc extents and __revise_pending() return new adding pending count,
>>>> we could update the reserved blocks easily in ext4_es_insert_extent().
>>>>
>>>> Thers is one special case needs to concern is the quota claiming, when
>>>> bigalloc is enabled, if the delayed cluster allocation has been raced
>>>> by another no-delayed allocation(e.g. from fallocate) which doesn't
>>>> cover the delayed blocks:
>>>>
>>>>   |<       one cluster       >|
>>>>   hhhhhhhhhhhhhhhhhhhdddddddddd
>>>>   ^            ^
>>>>   |<          >| < fallocate this range, don't claim quota again
>>>>
>>>> We can't claim quota as usual because the fallocate has already claimed
>>>> it in ext4_mb_new_blocks(), we could notice this case through the
>>>> removed delalloc blocks count.
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>> ...
>>>> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>>>>  			__free_pending(pr);
>>>>  			pr = NULL;
>>>>  		}
>>>> +		pending = err3;
>>>>  	}
>>>>  error:
>>>>  	write_unlock(&EXT4_I(inode)->i_es_lock);
>>>> +	/*
>>>> +	 * Reduce the reserved cluster count to reflect successful deferred
>>>> +	 * allocation of delayed allocated clusters or direct allocation of
>>>> +	 * clusters discovered to be delayed allocated.  Once allocated, a
>>>> +	 * cluster is not included in the reserved count.
>>>> +	 *
>>>> +	 * When bigalloc is enabled, allocating non-delayed allocated blocks
>>>> +	 * which belong to delayed allocated clusters (from fallocate, filemap,
>>>> +	 * DIO, or clusters allocated when delalloc has been disabled by
>>>> +	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
>>>> +	 * so release the quota reservations made for any previously delayed
>>>> +	 * allocated clusters.
>>>> +	 */
>>>> +	resv_used = rinfo.delonly_cluster + pending;
>>>> +	if (resv_used)
>>>> +		ext4_da_update_reserve_space(inode, resv_used,
>>>> +					     rinfo.delonly_block);
>>>
>>> I'm not sure I understand here. We are inserting extent into extent status
>>> tree. We are replacing resv_used clusters worth of space with delayed
>>> allocation reservation with normally allocated clusters so we need to
>>> release the reservation (mballoc already reduced freeclusters counter).
>>> That I understand. In normal case we should also claim quota because we are
>>> converting from reserved into allocated state. Now if we allocated blocks
>>> under this range (e.g. from fallocate()) without
>>> EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
>>> instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
>>> related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
>>> blocks for this extent or not.
>>
>> Oh, this is really complicated due to the bigalloc feature, please let me
>> explain it more clearly by listing all related situations.
>>
>> There are 2 types of paths of allocating delayed/reserved cluster:
>> 1. Normal case, normally allocate delayed clusters from the write back path.
>> 2. Special case, allocate blocks under this delayed range, e.g. from
>>    fallocate().
>>
>> There are 4 situations below:
>>
>> A. bigalloc is disabled. This case is simple, after path 2, we don't need
>>    to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we
>>    set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is
>>    detected. If the flag is set, we must be replacing a delayed extent and
>>    rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal
>>    to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.
> 
> Right. So fallocate() will call ext4_map_blocks() and
> ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED
> which you (due to patch 2 of this series) transform into
> EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc
> accounting through in ext4_ext_map_blocks() but this patch moved the update
> to ext4_es_insert_extent(). But there is one cornercase even here AFAICT:
> 
> Suppose fallocate is called for range 0..16k, we have delalloc extent at
> 8k..16k. In this case ext4_map_blocks() at block 0 will not find the
> delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc
> without using delalloc reservation but then ext4_es_insert_extent() will
> still have rinfo.delonly_block > 0 so we claim the quota reservation
> instead of releasing it?
> 

After commit 6430dea07e85 ("ext4: correct the hole length returned by
ext4_map_blocks()"), the fallocate range 0-16K would be divided into two
rounds. When we first calling ext4_map_blocks() with 0-16K, the map range
will be corrected to 0-8k by ext4_ext_determine_insert_hole() and the
allocating range should not cover any delayed range. Then
ext4_alloc_file_blocks() will call ext4_map_blocks() again to allocate
8K-16K in the second round, in this round, we are allocating a real
delayed range. Please below graph for details,

ext4_alloc_file_blocks() //0-16K
 ext4_map_blocks()  //0-16K
  ext4_es_lookup_extent() //find nothing
   ext4_ext_map_blocks(0)
    ext4_ext_determine_insert_hole() //change map range to 0-8K
   ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE) //allocate blocks under hole
 ext4_map_blocks()  //8-16K
  ext4_es_lookup_extent() //find delayed extent
  ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE)
    //allocate blocks under a whole delayed range,
    //use rinfo.delonly_block > 0 is okay

Hence the allocating range can't mixed with delayed and non-delayed extent
at a time, and the rinfo.delonly_block > 0 should work.

>> B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed
>>    cluster:
>> B0.Allocating a whole delayed cluster, this case is the same to A.
>>
>>      |<         one cluster       >|
>>      ddddddd+ddddddd+ddddddd+ddddddd
>>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range
> 
> I agree. In this case there's no difference.
> 
>  
>> B1.Allocating delayed blocks in a reserved cluster, this case is the same
>>    to A, too.
>>
>>      |<         one cluster       >|
>>      hhhhhhh+hhhhhhh+ddddddd+ddddddd
>>                              ^^^^^^^
>>                              allocating this range
> 
> Yes, if the allocation starts within delalloc range, we will have
> EXT4_GET_BLOCKS_DELALLOC_RESERVE set and ndelonly_blocks will always be >
> 0.
> 
>> B2.Allocating blocks which doesn't cover the delayed blocks in one reserved
>>    cluster,
>>
>>      |<         one cluster       >|
>>      hhhhhhh+hhhhhhh+hhhhhhh+ddddddd
>>      ^^^^^^^
>>      fallocating this range
>>
>>   This case must from path 2, which means allocating blocks without
>>   EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must
>>   be 0 since we are not replacing any delayed extents, so
>>   rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED
>>   detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is
>>   not set. So I think we could use rinfo.delonly_block to identify this
>>   case.
> 
> Well, this is similar to the non-bigalloc case I was asking about above.
> Why the allocated unwritten extent cannot extend past the start of delalloc
> extent? I didn't find anything that would disallow that...
> 

The same to above, ext4_ext_determine_insert_hole() should work fine for
this case.

Thanks,
Yi.





[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux