> On Jul 17, 2020, at 9:53 AM, Theodore Ts'o <tytso@xxxxxxx> wrote: > > From: Alex Zhuravlev <azhuravlev@xxxxxxxxxxxxx> > > cr=0 is supposed to be an optimization to save CPU cycles, but if > buddy data (in memory) is not initialized then all this makes no sense > as we have to do sync IO taking a lot of cycles. also, at cr=0 > mballoc doesn't store any avaibale chunk. cr=1 also skips groups using (typo) "available", but "doesn't store any available chunk" is still a bit confusing. Maybe "doesn't *choose* any available chunk"? > heuristic based on avg. fragment size. it's more useful to skip such > groups and switch to cr=2 where groups will be scanned for available > chunks. > > using sparse image and dm-slow virtual device of 120TB was > simulated. then the image was formatted and filled using debugfs to > mark ~85% of available space as busy. mount process w/o the patch > couldn't complete in half an hour (according to vmstat it would take > ~10-11 hours). With the patch applied mount took ~20 seconds. > > Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988 > Signed-off-by: Alex Zhuravlev <bzzz@xxxxxxxxxxxxx> > Reviewed-by: Andreas Dilger <adilger@xxxxxxxxxxxxx> Purely cosmetic fix below, but still fine otherwise. > --- > fs/ext4/mballoc.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index 8a1e6e03c088..172994349bf6 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -2195,7 +2195,18 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac, > > /* We only do this if the grp has never been initialized */ > if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) { > - ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS); > + struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, > + NULL); > + int ret; (style) this might be nicer to read like: struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, NULL); int ret; > + /* cr=0/1 is a very optimistic search to find large > + * good chunks almost for free. if buddy data is > + * not ready, then this optimization makes no sense */ > + if (cr < 2 && > + !(ext4_has_group_desc_csum(sb) && > + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) > + return 0; > + ret = ext4_mb_init_group(sb, group, GFP_NOFS); > if (ret) > return ret; > } > -- > 2.24.1 > Cheers, Andreas
Attachment:
signature.asc
Description: Message signed with OpenPGP