> On 20 May 2020, at 22:34, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
>> On May 20, 2020, at 2:40 AM, Alex Zhuravlev <azhuravlev@xxxxxxxxxxxxx> wrote:
>>
>>> On 17 May 2020, at 10:55, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>>>
>>> The question is whether this situation affects only a few inode
>>> allocations for a short time after mount, or whether it persists for a
>>> long time?  I think that it _should_ be only a short time, because these
>>> other threads should all start prefetch on their preferred groups, so
>>> even if a few inodes have their blocks allocated in the "wrong" group,
>>> it shouldn't be a long-term problem, since the prefetched bitmaps will
>>> finish loading and allow the blocks to be allocated, or skipped if the
>>> group is fragmented.
>>
>> Yes, that's the idea - there is a short window while the buddy data is
>> being populated.  And for each "cluster" (not just a single group),
>> prefetching will be initiated by allocation.
>> It's possible that some number of inodes will get "bad" blocks right
>> after mount.
>> If you think this is a bad scenario, I can introduce a couple more things:
>> 1) the previously discussed prefetching thread
>> 2) let mballoc wait for the goal group to get ready - essentially one
>> more check in ext4_mb_good_group()
>
> IMHO, this is an acceptable "cache warmup" behavior, not really different
> from mballoc doing limited scanning for any other allocation.
> Since we already separate inode table blocks and data blocks into
> separate groups due to flex_bg, I don't think any group is "better" than
> another, so long as the allocations avoid worst-case fragmentation
> (i.e. a series of one-block allocations).

I tend to agree, but I refreshed the patch to enable waiting for the goal
group (one more check).  Extra waiting for one group during warmup should
be fine, IMO.

Thanks, Alex