On Mon, 2023-08-14 at 23:04 -0600, Andreas Dilger wrote:
> On Aug 11, 2023, at 12:19 AM, Li Dongyang <dongyangli@xxxxxxx> wrote:
> > 
> > Currently the flag indicating block group has done fstrim is not
> > persistent, and trim status will be lost after remount, as
> > a result fstrim can not skip the already trimmed groups, which
> > could be slow on very large devices.
> > 
> > This patch introduces a new block group flag EXT4_BG_TRIMMED,
> > we need 1 extra block group descriptor write after trimming each
> > block group.
> > When clearing the flag, the block group descriptor is journalled
> > already so no extra overhead.
> > 
> > Add a new super block flag EXT2_FLAGS_TRACK_TRIM, to indicate if
> > we should honour EXT4_BG_TRIMMED when doing fstrim.
> > The new super block flag can be turned on/off via tune2fs.
> 
> Dongyang,
> I think this is not *quite* correct in the case where the TRACK_TRIM flag
> is not set. I agree we want the BG_TRIMMED flag to always be cleared in
> that case when blocks are freed in a group (this has no added cost, and
> will maintain correctness even if the feature is disabled).
> 
> However, it doesn't look like the patch will skip *writing* the flag if
> the TRACK_TRIM flag is unset, which would also add needless overhead in
> that case. I think it is OK to set the flag in memory to maintain the
> same behavior as today, and writing it to disk is fine (it will be ignored
> anyway), but it shouldn't trigger an extra transaction.

I agree with skipping the flag write when TRACK_TRIM is not set.
However, IMHO we should not keep essentially the same flag in memory
as well if we are making the BG_TRIMMED flag persistent.
> 
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 21b903fe546e..80283be01363 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -6995,10 +6993,19 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> >  		   ext4_grpblk_t minblocks, bool set_trimmed)
> >  {
> >  	struct ext4_buddy e4b;
> > +	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> > +	struct ext4_group_desc *gdp;
> > +	struct buffer_head *gd_bh;
> >  	int ret;
> >  
> >  	trace_ext4_trim_all_free(sb, group, start, max);
> >  
> > +	gdp = ext4_get_group_desc(sb, group, &gd_bh);
> > +	if (!gdp) {
> > +		ret = -EIO;
> > +		return ret;
> > +	}
> > +
> >  	ret = ext4_mb_load_buddy(sb, group, &e4b);
> >  	if (ret) {
> >  		ext4_warning(sb, "Error %d loading buddy information for %u",
> > @@ -7008,11 +7015,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> >  
> >  	ext4_lock_group(sb, group);
> >  
> > -	if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
> > +	if (!(es->s_flags & cpu_to_le16(EXT2_FLAGS_TRACK_TRIM) &&
> > +	      gdp->bg_flags & cpu_to_le16(EXT4_BG_TRIMMED)) ||
> >  	    minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
> 
> I think this should still *send* the TRIM request if BG_TRIMMED is not
> set, regardless of whether TRACK_TRIM is set or not, it should just
> not save the flag to disk below.

If BG_TRIMMED is not set, the TRIM request is already sent regardless
of TRACK_TRIM.
Checking TRACK_TRIM here also lets us use it as a switch: with it
cleared, fstrim trims every group whether or not the group has
BG_TRIMMED set.

> 
> >  		ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
> > -		if (ret >= 0 && set_trimmed)
> > -			EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
> 
> This should clear the "set_trimmed" flag if there was an error, so the
> flag is not set below.

We check ret > 0 below, so this should be fine here.
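
To make the trim condition in the hunk above easier to check, here is a
minimal userspace sketch of the same boolean shape. The names and flag
values (demo_should_trim, DEMO_FLAGS_TRACK_TRIM, DEMO_BG_TRIMMED) are
hypothetical stand-ins and not the kernel definitions, and endianness
conversion is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in flag values; the real ones come from the patch. */
#define DEMO_FLAGS_TRACK_TRIM 0x1u
#define DEMO_BG_TRIMMED       0x1u

/*
 * Same shape as the patched condition:
 *   !(TRACK_TRIM && BG_TRIMMED) || minblocks < last_trim_minblks
 * A group is skipped only when tracking is enabled, the group is marked
 * trimmed, and the requested minblocks is not smaller than last time.
 */
static bool demo_should_trim(uint32_t sb_flags, uint16_t bg_flags,
                             uint32_t minblocks, uint32_t last_trim_minblks)
{
    bool track = (sb_flags & DEMO_FLAGS_TRACK_TRIM) != 0;
    bool trimmed = (bg_flags & DEMO_BG_TRIMMED) != 0;

    return !(track && trimmed) || minblocks < last_trim_minblks;
}
```

With tracking off, or with BG_TRIMMED clear, the function returns true
for any minblocks, which matches the "switch" behaviour described above.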
> 
> >  	} else {
> >  		ret = 0;
> >  	}
> > @@ -7020,6 +7026,34 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> >  	ext4_unlock_group(sb, group);
> >  	ext4_mb_unload_buddy(&e4b);
> >  
> > +	if (ret > 0 && set_trimmed) {
> 
> Here, this should check the TRACK_TRIM flag and not force the GDT write
> if the feature is disabled. *Not* writing the flag to disk is fine, at
> worst it means that another TRIM would be sent in case of a crash, which
> is what happened before this patch. Only the BG_TRIMMED flag should be
> set in the group descriptor in that case, based on the flag saved above.

Got it, will update the patch.

Thanks
Dongyang

> 
> Cheers, Andreas
> 
> > +		int err;
> > +		handle_t *handle;
> > +
> > +		handle = ext4_journal_start_sb(sb, EXT4_HT_FS_TRIM, 1);
> > +		if (IS_ERR(handle)) {
> > +			ret = PTR_ERR(handle);
> > +			goto out_return;
> > +		}
> > +		err = ext4_journal_get_write_access(handle, sb, gd_bh,
> > +						    EXT4_JTR_NONE);
> > +		if (err) {
> > +			ret = err;
> > +			goto out_journal;
> > +		}
> > +		ext4_lock_group(sb, group);
> > +		gdp->bg_flags |= cpu_to_le16(EXT4_BG_TRIMMED);
> > +		ext4_group_desc_csum_set(sb, group, gdp);
> > +		ext4_unlock_group(sb, group);
> > +		err = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> > +		if (err)
> > +			ret = err;
> > +out_journal:
> > +		err = ext4_journal_stop(handle);
> > +		if (err)
> > +			ret = err;
> > +	}
> > +out_return:
> >  	ext4_debug("trimmed %d blocks in the group %d\n",
> >  		ret, group);
> >  
> > -- 
> > 2.41.0
> > 
> 
> Cheers, Andreas
> 
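
As a rough illustration of the requested change, the decision to start
the extra journal transaction could be gated as sketched below. This is
only one possible shape under stated assumptions, not the updated patch:
sketch_should_persist_trimmed and SKETCH_TRACK_TRIM are hypothetical
names, and the flag value is a placeholder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock TRACK_TRIM flag bit. */
#define SKETCH_TRACK_TRIM 0x1u

/*
 * Decide whether the group descriptor write (and its journal handle)
 * is worth starting. It is skipped when nothing was trimmed, when the
 * trim failed (ret < 0), when the caller did not ask for the flag, or
 * when the tracking feature is disabled in the superblock.
 */
static bool sketch_should_persist_trimmed(int ret, bool set_trimmed,
                                          uint32_t sb_flags)
{
    bool track = (sb_flags & SKETCH_TRACK_TRIM) != 0;

    return ret > 0 && set_trimmed && track;
}
```

With the feature disabled this never asks for a transaction, so the
worst case after a crash is one repeated TRIM, as noted in the review.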