On Tue 30-05-23 18:03:49, Ojaswin Mujoo wrote:
> CR1_5 aims to optimize allocations which can't be satisfied in CR1. The
> fact that we couldn't find a group in CR1 suggests that it would be
> difficult to find a continuous extent to completely satisfy our
> allocations. So before falling to the slower CR2, in CR1.5 we
> proactively trim the preallocations so we can find a group with
> (free / fragments) big enough. This speeds up our allocation at the
> cost of slightly reduced preallocation.
>
> The patch also adds a new sysfs tunable:
>
> * /sys/fs/ext4/<partition>/mb_cr1_5_max_trim_order
>
> This controls how much CR1.5 can trim a request before falling to CR2.
> For example, for a request of order 7 and max trim order 2, CR1.5 can
> trim this up to order 5.
>
> Suggested-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
>
> ext4 squash

Why is this here?

> +/*
> + * We couldn't find a group in CR1 so try to find the highest free fragment
> + * order we have and proactively trim the goal request length to that order to
> + * find a suitable group faster.
> + *
> + * This optimizes allocation speed at the cost of slightly reduced
> + * preallocations. However, we make sure that we don't trim the request too
> + * much and fall to CR2 in that case.
> + */
> +static void ext4_mb_choose_next_group_cr1_5(struct ext4_allocation_context *ac,
> +		enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups)
> +{
> +	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
> +	struct ext4_group_info *grp = NULL;
> +	int i, order, min_order;
> +	unsigned long num_stripe_clusters = 0;
> +
> +	if (unlikely(ac->ac_flags & EXT4_MB_CR1_5_OPTIMIZED)) {
> +		if (sbi->s_mb_stats)
> +			atomic_inc(&sbi->s_bal_cr1_5_bad_suggestions);
> +	}
> +
> +	/*
> +	 * mb_avg_fragment_size_order() returns order in a way that makes
> +	 * retrieving back the length using (1 << order) inaccurate. Hence, use
> +	 * fls() instead since we need to know the actual length while modifying
> +	 * goal length.
> +	 */
> +	order = fls(ac->ac_g_ex.fe_len);
> +	min_order = order - sbi->s_mb_cr1_5_max_trim_order;
> +	if (min_order < 0)
> +		min_order = 0;
> +
> +	if (1 << min_order < ac->ac_o_ex.fe_len)
> +		min_order = fls(ac->ac_o_ex.fe_len) + 1;
> +
> +	if (sbi->s_stripe > 0) {
> +		/*
> +		 * We are assuming that stripe size is always a multiple of
> +		 * cluster ratio otherwise __ext4_fill_super exits early.
> +		 */
> +		num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
> +		if (1 << min_order < num_stripe_clusters)
> +			min_order = fls(num_stripe_clusters);
> +	}
> +
> +	for (i = order; i >= min_order; i--) {
> +		int frag_order;
> +		/*
> +		 * Scale down goal len to make sure we find something
> +		 * in the free fragments list. Basically, reduce
> +		 * preallocations.
> +		 */
> +		ac->ac_g_ex.fe_len = 1 << i;

I smell some off-by-one issues here. Note that fls(1) == 1, so
(1 << fls(n)) > n. Hence this loop will actually *grow* the goal allocation
length. Also, I'm not sure why you have the +1 in
min_order = fls(ac->ac_o_ex.fe_len) + 1.

> +
> +		if (num_stripe_clusters > 0) {
> +			/*
> +			 * Try to round up the adjusted goal to stripe size
                                                        ^^^ goal length?
> +			 * (in cluster units) multiple for efficiency.
> +			 *
> +			 * XXX: Is s->stripe always a power of 2?
> +			 * In that case we can use the faster round_up() variant.
> +			 */

I don't think s->stripe has to be a power of 2. E.g. when you have three
data disks in a RAID config.

Otherwise the patch looks good to me.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
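
For reference, the arithmetic behind both comments above can be checked in
userspace. This is only a minimal sketch, not kernel code: fls_sketch() is a
stand-in written to match the kernel's fls() semantics (1-based index of the
highest set bit), order_floor(), roundup_any() and roundup_pow2() are
hypothetical helpers, and the stripe of 48 clusters is an assumed
illustration of a non-power-of-2 stripe.

```c
#include <assert.h>

/*
 * Userspace stand-in for the kernel's fls(): returns the 1-based position
 * of the highest set bit, or 0 when x == 0. So fls_sketch(1) == 1.
 */
static int fls_sketch(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Hypothetical helper: largest order with (1 << order) <= len, for len > 0. */
static int order_floor(unsigned int len)
{
	return fls_sketch(len) - 1;
}

/* Round len up to a multiple of stripe; valid for any stripe > 0. */
static unsigned int roundup_any(unsigned int len, unsigned int stripe)
{
	return ((len + stripe - 1) / stripe) * stripe;
}

/*
 * Mask-based rounding in the spirit of the kernel's round_up().
 * Only correct when stripe is a power of 2.
 */
static unsigned int roundup_pow2(unsigned int len, unsigned int stripe)
{
	return (len + stripe - 1) & ~(stripe - 1);
}

static void check_examples(void)
{
	/* fls(1) == 1, so 1 << fls(n) always exceeds n. */
	assert(fls_sketch(1) == 1);
	assert((1u << fls_sketch(100)) == 128);	/* grows a length of 100 */
	assert((1u << order_floor(100)) == 64);	/* fls(n) - 1 does not grow it */

	/* Assumed non-power-of-2 stripe of 48 clusters. */
	assert(roundup_any(60, 48) == 96);	/* a true multiple of 48 */
	assert(roundup_pow2(60, 48) == 64);	/* not a multiple of 48 */
}
```

Calling check_examples() from a small main() exercises all four helpers. The
last assertion shows why the mask-based variant cannot be used unless the
stripe is known to be a power of 2.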