Re: [PATCH-v9 0/3] add support for lazytime mount option

Theodore Ts'o <tytso@xxxxxxx> writes:

> This is an updated version of what had originally been an
> ext4-specific patch which significantly improves performance by lazily
> writing timestamp updates (and in particular, mtime updates) to disk.
> The in-memory timestamps are always correct, but they are only written
> to disk when required for correctness.
>
> This provides a huge performance boost for ext4 due to how it handles
> journalling, but it's valuable for all file systems running on flash
> storage or drive-managed SMR disks by reducing the metadata write
> load.  So upon request, I've moved the functionality to the VFS layer.
> Once the /sbin/mount program adds support for MS_LAZYTIME, all file
> systems should be able to benefit from this optimization.
>
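Until /sbin/mount passes the option through, a program can request it directly with mount(2). Below is a minimal C sketch. It assumes Linux headers where MS_LAZYTIME is (1 << 25), the value this series adds to include/uapi/linux/fs.h, with an #ifndef fallback for older libc headers. lazytime_remount_flags() and remount_lazytime() are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/* Fallback for libc headers that predate the flag; assumed to match the kernel UAPI value. */
#ifndef MS_LAZYTIME
#define MS_LAZYTIME (1UL << 25)
#endif

/* Hypothetical helper: the mountflags for a remount that turns lazytime on.
 * A remount should restate the mount's existing flags (e.g. MS_RDONLY),
 * so they are passed in and preserved. */
unsigned long lazytime_remount_flags(unsigned long current_flags)
{
    return current_flags | MS_REMOUNT | MS_LAZYTIME;
}

/* Hypothetical helper: roughly what "mount -o remount,lazytime <target>" asks
 * the kernel for. Needs CAP_SYS_ADMIN; returns 0 on success, -1 on error. */
int remount_lazytime(const char *target, unsigned long current_flags)
{
    assert(target != NULL);
    if (mount(NULL, target, NULL, lazytime_remount_flags(current_flags), NULL) != 0) {
        perror("mount");
        return -1;
    }
    return 0;
}
```

The mount(2) documentation asks that a remount restate the flags of the original mount apart from those being changed. That is why the sketch takes current_flags: passing 0 for a read-only mount, for example, would ask for it to become read-write.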
> There is still an ext4-specific optimization, which may be applicable
> for other file systems which store more than one inode in a block, but
> it will require file system specific code.  It is purely optional,
> however.
FYI: I'm writing xfstests for this feature.

Here is the list of planned tests:
0) Consistency tests: check that mtime is written out correctly when
   other fields of the same inode are updated.
   A) for each ino_field in "i_size xattr owner perm i_generation":
         update $ino_field; umount/mount; check mtime

1) Integrity tests: check that umount, sync and fsync force the mtime
   update to disk. The umount case is quite obvious.
   A) mtime_update; umount/mount; check mtime
   B) mtime_update; sync; hwfailure-simulation; umount/mount; check mtime
   C) mtime_update; fsync; hwfailure-simulation; umount/mount; check mtime
2) Check that the mtime delay actually works. This is a statistical method
   which operates on many files.
   A) mtime_update; hwfailure-simulation; umount/mount; check mtime
   B) mtime_update; wait-lazytime-period; hwfailure-simulation; umount/mount; check mtime
For (B) we need a knob in sysfs to make the lazytime expiration time
tunable (1-2 minutes).
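The "check mtime" step is shared by all three groups. Below is a minimal C sketch. It assumes the test records the expected mtimes with stat(2) right after mtime_update and compares them after the umount/mount cycle. record_mtimes() and count_mtime_mismatches() are hypothetical names, not existing xfstests helpers. Groups 0) and 1) expect a count of zero. For 2A), a nonzero count is presumably the sign that the updates were held back, while after the wait in 2B) the count is expected to drop back to zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Record the current mtime of each file before the umount/mount cycle.
 * Returns 0 on success, -1 if any stat() fails. */
int record_mtimes(const char *const paths[], size_t n, struct timespec out[])
{
    struct stat st;

    assert(paths != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (stat(paths[i], &st) != 0)
            return -1;
        out[i] = st.st_mtim;
    }
    return 0;
}

/* "check mtime": count files whose mtime no longer matches the recorded value
 * after the cycle. A file that cannot be stat()ed counts as a mismatch. */
size_t count_mtime_mismatches(const char *const paths[], size_t n,
                              const struct timespec expected[])
{
    struct stat st;
    size_t mismatches = 0;

    assert(paths != NULL && expected != NULL);
    for (size_t i = 0; i < n; i++) {
        if (stat(paths[i], &st) != 0 ||
            st.st_mtim.tv_sec != expected[i].tv_sec ||
            st.st_mtim.tv_nsec != expected[i].tv_nsec)
            mismatches++;
    }
    return mismatches;
}
```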

>
> For people interested in seeing how timestamp updates are held back, the
> following example commands to enable the tracepoints debugging may be
> helpful:
>
>   mount -o remount,lazytime /
>   cd /sys/kernel/debug/tracing
>   echo 1 > events/writeback/writeback_lazytime/enable
>   echo 1 > events/writeback/writeback_lazytime_iput/enable
>   echo "state & 2048" > events/writeback/writeback_dirty_inode_enqueue/filter
>   echo 1 > events/writeback/writeback_dirty_inode_enqueue/enable
>   echo 1 > events/ext4/ext4_other_inode_update_time/enable
>   cat trace_pipe
>
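The filter "state & 2048" tests one bit of inode->i_state. Assuming the values below match include/linux/fs.h as of this series, that bit is I_DIRTY_TIME. The values are copied here rather than included, and may differ in other kernel versions. The sketch below decodes the state values printed by the writeback tracepoints. lazytime_only() and decode_dirty_state() are hypothetical helpers. The masks follow the changelog: I_DIRTY_INODE covers the inode-level dirty bits, and I_DIRTY_ALL is I_DIRTY plus I_DIRTY_TIME.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copied (assumed) from include/linux/fs.h as of this series; i_state is not a stable ABI. */
#define I_DIRTY_SYNC         (1UL << 0)
#define I_DIRTY_DATASYNC     (1UL << 1)
#define I_DIRTY_PAGES        (1UL << 2)
#define I_DIRTY_TIME         (1UL << 11)  /* 2048, the bit tested by the trace filter */
#define I_DIRTY_TIME_EXPIRED (1UL << 12)

#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY       (I_DIRTY_INODE | I_DIRTY_PAGES)
#define I_DIRTY_ALL   (I_DIRTY | I_DIRTY_TIME)

/* True when only the timestamps are dirty, i.e. lazytime is holding the inode back. */
int lazytime_only(unsigned long state)
{
    return (state & I_DIRTY_TIME) != 0 && (state & I_DIRTY_INODE) == 0;
}

/* Write the names of the dirty bits set in state into buf as "A|B|...". */
void decode_dirty_state(unsigned long state, char *buf, size_t len)
{
    static const struct { unsigned long bit; const char *name; } names[] = {
        { I_DIRTY_SYNC, "I_DIRTY_SYNC" },
        { I_DIRTY_DATASYNC, "I_DIRTY_DATASYNC" },
        { I_DIRTY_PAGES, "I_DIRTY_PAGES" },
        { I_DIRTY_TIME, "I_DIRTY_TIME" },
        { I_DIRTY_TIME_EXPIRED, "I_DIRTY_TIME_EXPIRED" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if ((state & names[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s", used > 0 ? "|" : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* buf is full; what was written stays NUL-terminated */
        used += (size_t)n;
    }
}
```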
> You can also see how many lazytime inodes are in memory by looking in
> /sys/kernel/debug/bdi/<bdi>/stats
>
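To track that number from a test rather than by eye, the stats file can be parsed as "name: value" lines. Below is a minimal C sketch that works on text already read from the file. It assumes the lazytime count appears on a line labelled b_dirty_time; check mm/backing-dev.c in your tree for the exact label. bdi_stat_value() is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find a line of the form "<key>: <number>" in text and store the number in *out.
 * Returns 0 on success, -1 if the key is absent or not followed by a number. */
int bdi_stat_value(const char *text, const char *key, unsigned long *out)
{
    const char *line = text;
    size_t klen;

    assert(text != NULL && key != NULL && out != NULL);
    klen = strlen(key);
    while (line != NULL && *line != '\0') {
        const char *p = line;

        while (*p == ' ' || *p == '\t')   /* tolerate leading whitespace */
            p++;
        if (strncmp(p, key, klen) == 0 && p[klen] == ':') {
            char *end;
            unsigned long v = strtoul(p + klen + 1, &end, 10);

            if (end == p + klen + 1)
                return -1;                /* no digits after the colon */
            *out = v;
            return 0;
        }
        line = strchr(line, '\n');        /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```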
> Changes since -v8:
>   - in ext4_update_other_inodes_time() clear I_DIRTY_TIME_EXPIRED as
>     well as I_DIRTY_TIME
>   - Fixed a bug which broke writeback in some cases (introduced in -v7)
>
> Changes since -v7:
>    - Fix comment typos
>    - Clear the I_DIRTY_TIME flag if I_DIRTY_INODE gets added in
>      __mark_inode_dirty()
>    - Fix a bug accidentally introduced in -v7 which broke lazytime altogether 
>
> Changes since -v6:
>    - Add a new tracepoint writeback_dirty_inode_enqueue
>    - Move generic handling of update_time() to generic_update_time(),
>      so filesystems can more easily hook or modify update_time()
>    - The file system's dirty_inode() will now always get called with
>      I_DIRTY_TIME when the inode time is updated.   (I_DIRTY_SYNC will
>      also be set if the inode should be updated right away.)   This allows
>      file systems such as XFS to update their on-disk copy of the inode if
>      I_DIRTY_TIME is set.
>
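Taken together with the i_version rule in the -v2 notes, the -v6 behaviour comes down to a small decision about which flags ->dirty_inode() sees after a timestamp update. Below is a user-space model of that description, not the kernel's generic_update_time(). The flag values are copied assumptions, and the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied (assumed) values; see include/linux/fs.h for the real definitions. */
#define I_DIRTY_SYNC (1UL << 0)
#define I_DIRTY_TIME (1UL << 11)

/* The tracepoint filter "state & 2048" relies on this value. */
static_assert(I_DIRTY_TIME == 2048, "I_DIRTY_TIME should match the tracepoint filter");

/* Model of the flags ->dirty_inode() receives after a timestamp update:
 * I_DIRTY_TIME is always set; I_DIRTY_SYNC is added when the inode must be
 * written right away, i.e. when the mount lacks lazytime or when the update
 * also bumped i_version (as NFSv4 requires). */
unsigned long time_update_dirty_flags(bool lazytime_mount, bool iversion_bumped)
{
    unsigned long flags = I_DIRTY_TIME;

    if (!lazytime_mount || iversion_bumped)
        flags |= I_DIRTY_SYNC;
    return flags;
}
```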
> Changes since -v5:
>    - Tweak move_expired_inodes to handle sync() and syncfs(), and drop
>      flush_sb_dirty_time().
>    - Move logic for handling the b_dirty_time list into
>      __mark_inode_dirty().
>    - Move I_DIRTY back to its original definition, and use I_DIRTY_ALL
>      for I_DIRTY plus I_DIRTY_TIME.
>    - Fold some patches together to make the first patch easier to
>      review (and modify/update).
>    - Use the pre-existing writeback tracepoints instead of creating new
>      fs tracepoints.
>
> Changes since -v4:
>    - Fix ext4 optimization so it does not need to increment (and more
>      problematically, decrement) the inode reference count
>    - Per Christoph's suggestion, drop support for btrfs and xfs for now,
>      due to issues with how btrfs and xfs handle dirty inode tracking.  We
>      can add btrfs and xfs support back later or at the end of this series
>      if we want to revisit this decision.
>    - Miscellaneous cleanups
>
> Changes since -v3:
>    - inodes with I_DIRTY_TIME set are placed on a new bdi list,
>         b_dirty_time.  This allows filesystem-level syncs to more
>         easily iterate over those inodes that need to have their
>         timestamps written to disk.
>    - dirty timestamps will be written out asynchronously on the final
>         iput, instead of when the inode gets evicted.
>    - separate the definition of the new function
>         find_active_inode_nowait() to a separate patch
>    - create separate flag masks: I_DIRTY_WB and I_DIRTY_INODE, which
>        indicate whether the inode needs to be on the write back lists,
>        or whether the inode itself is dirty, while I_DIRTY means any one
>        of the inode dirty flags is set.  This simplifies the fs
>        writeback logic which needs to test for different combinations of
>        the inode dirty flags in different places.
>
> Changes since -v2:
>    - If update_time() updates i_version, it will not use lazytime (i.e.,
>        the inode will be marked dirty so the change will be persisted to
>        disk sooner rather than later).  Yes, this eliminates the
>        benefits of lazytime if the user is exporting the file system via
>        NFSv4.  Sad, but NFS's requirements seem to mandate this.
>    - Fix time wrapping bug 49 days after the system boots (on a system
>         with a 32-bit jiffies).   Use get_monotonic_boottime() instead.
>    - Clean up type warning in include/trace/events/ext4.h
>    - Added explicit parentheses for stylistic reasons
>    - Added an is_readonly() inode operations method so btrfs doesn't
>        have to duplicate code in update_time().
>
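The "49 days" figure follows from the width of jiffies. A 32-bit counter incremented HZ times per second wraps every 2^32 / HZ seconds, which at HZ=1000 is about 4,294,967 seconds, or roughly 49.7 days. Below is a small C check of that arithmetic. wrap_days() is a hypothetical helper, and since HZ is a kernel build option, 1000 is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Days until a 32-bit counter that ticks hz times per second wraps around. */
double wrap_days(unsigned int hz)
{
    assert(hz > 0);
    return ((double)UINT32_MAX + 1.0) / hz / (24.0 * 60 * 60);
}
```

At HZ=250 the same counter lasts about 198.8 days. The boot-time clock read by get_monotonic_boottime() keeps seconds in a timespec instead, so it does not wrap on that time scale.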
> Changes since -v1:
>    - Added explanatory comments in update_time() regarding i_ts_dirty_days
>    - Fix type used for days_since_boot
>    - Improve SMP scalability in update_time and ext4_update_other_inodes_time
>    - Added tracepoints to help test and characterize how often and under
>          what circumstances inodes have their timestamps lazily updated
>
> Theodore Ts'o (3):
>   vfs: add support for a lazytime mount option
>   vfs: add find_inode_nowait() function
>   ext4: add optimization for the lazytime mount option
>
>  fs/ext4/inode.c                  |  70 +++++++++++++++++++++++++-
>  fs/ext4/super.c                  |  10 ++++
>  fs/fs-writeback.c                |  62 +++++++++++++++++++----
>  fs/gfs2/file.c                   |   4 +-
>  fs/inode.c                       | 106 +++++++++++++++++++++++++++++++++------
>  fs/jfs/file.c                    |   2 +-
>  fs/libfs.c                       |   2 +-
>  fs/proc_namespace.c              |   1 +
>  fs/sync.c                        |   8 +++
>  include/linux/backing-dev.h      |   1 +
>  include/linux/fs.h               |  10 ++++
>  include/trace/events/ext4.h      |  30 +++++++++++
>  include/trace/events/writeback.h |  60 +++++++++++++++++++++-
>  include/uapi/linux/fs.h          |   4 +-
>  mm/backing-dev.c                 |  10 +++-
>  15 files changed, 343 insertions(+), 37 deletions(-)
>
> -- 
> 2.1.0
>
