Re: [RFC 11/11] ext4: Add allocation criteria 1.5 (CR1_5)

On Thu, Mar 09, 2023 at 04:06:49PM +0100, Jan Kara wrote:
> On Fri 27-01-23 18:07:38, Ojaswin Mujoo wrote:
> > CR1_5 aims to optimize allocations which can't be satisfied in CR1. The
> > fact that we couldn't find a group in CR1 suggests that it would be
> > difficult to find a continuous extent to completely satisfy our
> > allocations. So before falling to the slower CR2, in CR1.5 we
> > proactively trim the preallocations so we can find a group with
> > (free / fragments) big enough.  This speeds up our allocation at the
> > cost of slightly reduced preallocation.
> > 
> > The patch also adds a new sysfs tunable:
> > 
> > * /sys/fs/ext4/<partition>/mb_cr1_5_max_trim_order
> > 
> > This controls how much CR1.5 can trim a request before falling to CR2.
> > For example, for a request of order 7 and max trim order 2, CR1.5 can
> > trim this up to order 5.
> > 
> > Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
> > Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> 
> The idea looks good. Couple of questions below...
> 
> > +/*
> > + * We couldn't find a group in CR1 so try to find the highest free fragment
> > + * order we have and proactively trim the goal request length to that order to
> > + * find a suitable group faster.
> > + *
> > + * This optimizes allocation speed at the cost of slightly reduced
> > + * preallocations. However, we make sure that we don't trim the request too
> > + * much and fall to CR2 in that case.
> > + */
> > +static void ext4_mb_choose_next_group_cr1_5(struct ext4_allocation_context *ac,
> > +		enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups)
> > +{
> > +	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
> > +	struct ext4_group_info *grp = NULL;
> > +	int i, order, min_order;
> > +
> > +	if (unlikely(ac->ac_flags & EXT4_MB_CR1_5_OPTIMIZED)) {
> > +		if (sbi->s_mb_stats)
> > +			atomic_inc(&sbi->s_bal_cr1_5_bad_suggestions);
> > +	}
> > +
> > +	/*
> > +	 * mb_avg_fragment_size_order() returns order in a way that makes
> > +	 * retrieving back the length using (1 << order) inaccurate. Hence, use
> > +	 * fls() instead since we need to know the actual length while modifying
> > +	 * goal length.
> > +	 */
> > +	order = fls(ac->ac_g_ex.fe_len);
> > +	min_order = order - sbi->s_mb_cr1_5_max_trim_order;
> 
> Given we still require the allocation contains at least originally
> requested blocks, is it ever the case that goal size would be 8 times
> larger than original alloc size? Otherwise the
> sbi->s_mb_cr1_5_max_trim_order logic seems a bit pointless...

Yes, that is possible. In ext4_mb_normalize_request(), for an original
request len < 8MB we actually determine the goal length based on the length
of the file (i_size) rather than the length of the original request. For
example:

	if (size <= 16 * 1024) {
		size = 16 * 1024;
	} else if (size <= 32 * 1024) {
		size = 32 * 1024;
	} else if (size <= 64 * 1024) {
		size = 64 * 1024;

and this goes all the way up to size = 8MB. So for a case where the file
is >8MB, even if the original len is 1 block (4KB), the goal len would
be 2048 blocks (8MB). That's why we decided to add a tunable, so that the
trim limit can be set according to the user's preference.
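
To make the arithmetic concrete, here is a minimal user-space sketch of the
rounding described above. It is not the kernel code: the real
ext4_mb_normalize_request() has extra alignment and size checks for the
larger buckets, which are left out here. The names sketch_goal_bytes() and
sketch_goal_blocks() are made up for illustration, and the sketch assumes
power-of-two buckets from 16KB to 8MB with anything larger capped at 8MB.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bucket range, taken from the sizes mentioned in this mail. */
#define SKETCH_MIN_GOAL (16ULL * 1024)
#define SKETCH_MAX_GOAL (8ULL * 1024 * 1024)

/*
 * Hypothetical helper: round a file size (bytes) up to the next
 * power-of-two bucket between 16KB and 8MB. Sizes above 8MB are
 * capped at 8MB. This is a simplification of the kernel logic.
 */
static uint64_t sketch_goal_bytes(uint64_t size)
{
	uint64_t bucket;

	for (bucket = SKETCH_MIN_GOAL; bucket < SKETCH_MAX_GOAL; bucket <<= 1)
		if (size <= bucket)
			return bucket;
	return SKETCH_MAX_GOAL;
}

/* Hypothetical helper: express the goal length in filesystem blocks. */
static uint64_t sketch_goal_blocks(uint64_t size, uint32_t block_size)
{
	assert(block_size > 0);
	return sketch_goal_bytes(size) / block_size;
}
```

With a 4KB block size, sketch_goal_blocks(100 << 20, 4096) gives 2048
blocks for a 100MB file, so a 1-block request can end up with a goal that
is 2048 times larger than what was originally asked for.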
> 
> > +	if (min_order < 0)
> > +		min_order = 0;
> 
> Perhaps add:
> 
> 	if (1 << min_order < ac->ac_o_ex.fe_len)
> 		min_order = fls(ac->ac_o_ex.fe_len) + 1;
> 
> and then you can drop the condition from the loop below...
That looks better, will do it. Thanks!
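
For reference, a rough user-space sketch of how the clamp would behave. This
only mirrors the snippet suggested above and is not the final patch.
sketch_fls() is a stand-in written here to follow the kernel fls()
convention (1-based position of the highest set bit, 0 for 0), and
sketch_min_order() with its parameters is a hypothetical helper.

```c
#include <assert.h>

/* Stand-in for the kernel's fls(): 1-based index of the highest set bit. */
static int sketch_fls(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Hypothetical helper: lowest order CR1.5 may trim to, given the goal
 * length, the originally requested length and the max trim order.
 */
static int sketch_min_order(unsigned int goal_len, unsigned int orig_len,
			    int max_trim_order)
{
	int order, min_order;

	/* Bounds assumed by this sketch so the shifts below stay defined. */
	assert(orig_len > 0 && orig_len <= goal_len);
	assert(goal_len < (1u << 30) && max_trim_order >= 0);

	order = sketch_fls(goal_len);
	min_order = order - max_trim_order;
	if (min_order < 0)
		min_order = 0;

	/* Never trim below the originally requested length. */
	if ((1u << min_order) < orig_len)
		min_order = sketch_fls(orig_len) + 1;

	return min_order;
}
```

For a goal of 2048 blocks and a max trim order of 3, an original length of 1
block leaves min_order at 9, while an original length of 600 blocks raises
it to 11. Since 1 << fls(n) is always greater than n for n > 0, the "+ 1"
makes the bound more conservative than strictly required.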
> 
> > +
> > +	for (i = order; i >= min_order; i--) {
> > +		if (ac->ac_o_ex.fe_len <= (1 << i)) {
> > +			/*
> > +			 * Scale down goal len to make sure we find something
> > +			 * in the free fragments list. Basically, reduce
> > +			 * preallocations.
> > +			 */
> > +			ac->ac_g_ex.fe_len = 1 << i;
> 
> When scaling down the size with sbi->s_stripe > 1, it would be better to
> choose multiple of sbi->s_stripe and not power of two. But our stripe
> support is fairly weak anyway (e.g. initial goal size does not reflect it
> at all AFAICT) so probably we don't care here either.
Oh right, I missed that. I'll make the change, as it does no harm to have
it here.
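
Something along these lines is what I have in mind, as a user-space sketch
only. sketch_trim_to_stripe() is a hypothetical helper, and rounding down
(rather than up) to a stripe multiple is an assumption made here so that the
trimmed goal never grows; the thread does not settle that detail.

```c
#include <assert.h>

/*
 * Hypothetical helper: given a goal length already trimmed to a power of
 * two, prefer the largest multiple of the stripe size that does not
 * exceed it. Fall back to the power-of-two length when there is no
 * stripe, or when aligning would drop below the original request.
 */
static unsigned int sketch_trim_to_stripe(unsigned int trimmed_len,
					  unsigned int stripe,
					  unsigned int orig_len)
{
	unsigned int aligned;

	assert(trimmed_len >= orig_len);

	if (stripe <= 1)
		return trimmed_len;

	/* Round down to a multiple of the stripe size. */
	aligned = trimmed_len - (trimmed_len % stripe);
	if (aligned == 0 || aligned < orig_len)
		return trimmed_len;

	return aligned;
}
```

For example, with a stripe of 96 blocks a trimmed goal of 512 blocks would
become 480 blocks, while a goal of 64 blocks (smaller than one stripe) would
be left as it is.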

Thanks for the review!

regards,
ojaswin
> 
> > +		} else {
> > +			break;
> > +		}
> > +
> > +		grp = ext4_mb_find_good_group_avg_frag_lists(ac,
> > +							     mb_avg_fragment_size_order(ac->ac_sb,
> > +							     ac->ac_g_ex.fe_len));
> > +		if (grp)
> > +			break;
> > +	}
> > +
> > +	if (grp) {
> > +		*group = grp->bb_group;
> > +		ac->ac_flags |= EXT4_MB_CR1_5_OPTIMIZED;
> > +	} else {
> > +		/* Reset goal length to original goal length before falling into CR2 */
> > +		ac->ac_g_ex.fe_len = ac->ac_orig_goal_len;
> >  		*new_cr = CR2;
> >  	}
> >  }
> 
> 								Honza
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR


