> On 20 May 2020, at 22:34, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
>> On May 20, 2020, at 2:40 AM, Alex Zhuravlev <azhuravlev@xxxxxxxxxxxxx> wrote:
>>
>>> On 17 May 2020, at 10:55, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>>>
>>> The question is whether this situation affects only a few inode
>>> allocations for a short time after mount, or whether it persists for a
>>> long time?  I think that it _should_ be only a short time, because these
>>> other threads should all start prefetch on their preferred groups, so
>>> even if a few inodes have their blocks allocated in the "wrong" group,
>>> it shouldn't be a long-term problem, since the prefetched bitmaps will
>>> finish loading and allow the blocks to be allocated, or skipped if the
>>> group is fragmented.
>>
>> Yes, that's the idea - there is a short window while the buddy data is
>> being populated.  And for each "cluster" (not just a single group),
>> prefetching will be initiated by allocation.
>> It's possible that some number of inodes will get "bad" blocks right
>> after mount.
>> If you think this is a bad scenario, I can introduce a couple more things:
>> 1) the previously discussed prefetching thread
>> 2) let mballoc wait for the goal group to get ready - essentially one
>> more check in ext4_mb_good_group()
>
> IMHO, this is an acceptable "cache warmup" behavior, not really different
> from mballoc doing limited scanning for any other allocation.
> Since we already separate inode table blocks and data blocks into
> separate groups due to flex_bg, I don't think any group is "better" than
> another, so long as the allocations avoid worst-case fragmentation
> (i.e. a series of one-block allocations).

I tend to agree, but I refreshed the patch to enable waiting for the goal
group (one more check).  Extra waiting for one group during warmup should
be fine, IMO.

Thanks, Alex