Re: [PATCH 1/2] ext4: introduce EXT4_BG_TRIMMED to optimize fstrim

On Mon, 2023-08-14 at 23:04 -0600, Andreas Dilger wrote:
> On Aug 11, 2023, at 12:19 AM, Li Dongyang <dongyangli@xxxxxxx> wrote:
> > 
> > Currently the flag indicating block group has done fstrim is not
> > persistent, and trim status will be lost after remount, as
> > a result fstrim can not skip the already trimmed groups, which
> > could be slow on very large devices.
> > 
> > This patch introduces a new block group flag EXT4_BG_TRIMMED,
> > we need 1 extra block group descriptor write after trimming each
> > block group.
> > When clearing the flag, the block group descriptor is journalled
> > already so no extra overhead.
> > 
> > Add a new super block flag EXT2_FLAGS_TRACK_TRIM, to indicate if
> > we should honour EXT4_BG_TRIMMED when doing fstrim.
> > The new super block flag can be turned on/off via tune2fs.
> 
> Dongyang,
> I think this is not *quite* correct in the case where the TRACK_TRIM flag
> is not set.  I agree we want the BG_TRIMMED flag to always be cleared in
> that case when blocks are freed in a group (this has no added cost, and
> will maintain correctness even if the feature is disabled).
> 
> However, it doesn't look like the patch will skip *writing* the flag if
> the TRACK_TRIM flag is unset, which would also add needless overhead in
> that case.  I think it is OK to set the flag in memory to maintain the
> same behavior as today, and writing it to disk is fine (it will be ignored
> anyway), but it shouldn't trigger an extra transaction.
I agree that we should skip writing the flag when TRACK_TRIM is not set.
IMHO we should not keep essentially the same flag in memory as well if
we are making the BG_TRIMMED flag persistent.
> 
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 21b903fe546e..80283be01363 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -6995,10 +6993,19 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> >                    ext4_grpblk_t minblocks, bool set_trimmed)
> > {
> >         struct ext4_buddy e4b;
> > +       struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> > +       struct ext4_group_desc *gdp;
> > +       struct buffer_head *gd_bh;
> >         int ret;
> > 
> >         trace_ext4_trim_all_free(sb, group, start, max);
> > 
> > +       gdp = ext4_get_group_desc(sb, group, &gd_bh);
> > +       if (!gdp) {
> > +               ret = -EIO;
> > +               return ret;
> > +       }
> > +
> >         ret = ext4_mb_load_buddy(sb, group, &e4b);
> >         if (ret) {
> >                 ext4_warning(sb, "Error %d loading buddy information for %u",
> > @@ -7008,11 +7015,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> > 
> >         ext4_lock_group(sb, group);
> > 
> > -       if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
> > +       if (!(es->s_flags & cpu_to_le16(EXT2_FLAGS_TRACK_TRIM) &&
> > +             gdp->bg_flags & cpu_to_le16(EXT4_BG_TRIMMED)) ||
> >             minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
> 
> I think this should still *send* the TRIM request if BG_TRIMMED is not
> set, regardless of whether TRACK_TRIM is set or not, it should just
> not save the flag to disk below.
If BG_TRIMMED is not set, the TRIM request is already sent regardless of
TRACK_TRIM.
Checking TRACK_TRIM here also lets us use it as a switch: with
TRACK_TRIM unset, fstrim trims every group whether or not it has
BG_TRIMMED set.
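To make the condition easier to reason about, here is a minimal userspace
sketch of the same predicate. The helper name and the SKETCH_* bit values
are made up for illustration (the real bit assignments come from the patch
and e2fsprogs), and the little-endian conversions done in the kernel code
are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in bit values, for illustration only. */
#define SKETCH_FLAGS_TRACK_TRIM 0x0001u
#define SKETCH_BG_TRIMMED       0x0001u

/*
 * Mirrors the condition in ext4_trim_all_free() from the patch:
 * a group is skipped only when tracking is enabled AND the group is
 * already marked trimmed AND the requested minblocks is not smaller
 * than the one used by the last trim.
 */
bool sketch_should_trim(uint16_t sb_flags, uint16_t bg_flags,
                        int minblocks, int last_trim_minblks)
{
	bool track = (sb_flags & SKETCH_FLAGS_TRACK_TRIM) != 0;
	bool trimmed = (bg_flags & SKETCH_BG_TRIMMED) != 0;

	return !(track && trimmed) || minblocks < last_trim_minblks;
}
```

With TRACK_TRIM clear the function returns true for every group, which is
the "force fstrim everything" behaviour described above.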
> 
> >                 ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
> > -               if (ret >= 0 && set_trimmed)
> > -                       EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
> 
> This should clear the "set_trimmed" flag if there was an error, so the
> flag is not set below.
We check ret > 0 below, so an error here will not set the flag.
> 
> >         } else {
> >                 ret = 0;
> >         }
> > @@ -7020,6 +7026,34 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> >         ext4_unlock_group(sb, group);
> >         ext4_mb_unload_buddy(&e4b);
> > 
> > +       if (ret > 0 && set_trimmed) {
> 
> Here, this should check the TRACK_TRIM flag and not force the GDT write
> if the feature is disabled.  *Not* writing the flag to disk is fine, at
> worst it means that another TRIM would be sent in case of a crash, which
> is what happened before this patch.  Only the BG_TRIMMED flag should be
> set in the group descriptor in that case, based on the flag saved above.
Got it, will update the patch.
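Roughly, the gate for the journalled descriptor write could then look like
the sketch below. This is only an illustration of the check discussed
above, not the updated patch; the helper name is hypothetical, and
track_trim stands for "EXT2_FLAGS_TRACK_TRIM is set in the superblock".

```c
#include <stdbool.h>

/*
 * ret:         blocks trimmed in this group, or a negative errno
 * set_trimmed: the caller asked for the group to be marked as trimmed
 * track_trim:  the superblock has trim tracking enabled
 *
 * Only when all three hold is it worth starting a transaction to
 * persist BG_TRIMMED in the group descriptor.
 */
bool sketch_should_persist_trimmed(int ret, bool set_trimmed,
                                   bool track_trim)
{
	return ret > 0 && set_trimmed && track_trim;
}
```

With tracking disabled no transaction is started, so the worst case after
a crash is one more TRIM for the group, same as before the patch.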

Thanks
Dongyang
> 
> Cheers, Andreas
> 
> > +               int err;
> > +               handle_t *handle;
> > +
> > +               handle = ext4_journal_start_sb(sb, EXT4_HT_FS_TRIM, 1);
> > +               if (IS_ERR(handle)) {
> > +                       ret = PTR_ERR(handle);
> > +                       goto out_return;
> > +               }
> > +               err = ext4_journal_get_write_access(handle, sb, gd_bh,
> > +                                                   EXT4_JTR_NONE);
> > +               if (err) {
> > +                       ret = err;
> > +                       goto out_journal;
> > +               }
> > +               ext4_lock_group(sb, group);
> > +               gdp->bg_flags |= cpu_to_le16(EXT4_BG_TRIMMED);
> > +               ext4_group_desc_csum_set(sb, group, gdp);
> > +               ext4_unlock_group(sb, group);
> > +               err = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> > +               if (err)
> > +                       ret = err;
> > +out_journal:
> > +               err = ext4_journal_stop(handle);
> > +               if (err)
> > +                       ret = err;
> > +       }
> > +out_return:
> >         ext4_debug("trimmed %d blocks in the group %d\n",
> >                 ret, group);
> > 
> > --
> > 2.41.0
> > 
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 




