On Fri 28-11-14 09:55:19, Sedat Dilek wrote:
> On Fri, Nov 28, 2014 at 7:00 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> > This is an updated version of what had originally been an
> > ext4-specific patch which significantly improves performance by lazily
> > writing timestamp updates (and in particular, mtime updates) to disk.
> > The in-memory timestamps are always correct, but they are only written
> > to disk when required for correctness.
> >
> > This provides a huge performance boost for ext4 due to how it handles
> > journalling, but it's valuable for all file systems running on flash
> > storage or drive-managed SMR disks by reducing the metadata write
> > load.  So upon request, I've moved the functionality to the VFS layer.
> > Once the /sbin/mount program adds support for MS_LAZYTIME, all file
> > systems should be able to benefit from this optimization.
> >
> > There is still an ext4-specific optimization, which may be applicable
> > for other file systems which store more than one inode in a block, but
> > it will require file system specific code.  It is purely optional,
> > however.
> >
> > Please note the changes to update_time() and the new write_time() inode
> > operations functions, which impact btrfs and xfs.  The changes are
> > fairly simple, but I would appreciate confirmation from the btrfs and
> > xfs teams that I got things right.   Thanks!!
> >
>
> Some questions... on how to test this...
>
> [ Base ]
> Is this patchset on top of ext4-next (ext4.git#dev)? Might someone
> test on top of Linux v3.18-rc6 with pulled in ext4.git#dev2?
>
> [ Userland ]
> Do I need an updated userland (/sbin/mount)? IOW, adding "lazytime" to
> my ext4-line(s) in /etc/fstab is enough?
>
> [ Benchmarks ]
> Do you have numbers - how big/fast is the benefit? On a desktop machine?

Actually, a benefit you may notice on a laptop is that the disk will wake up
less often.
When I was looking for the causes of disk wakeups on a desktop machine,
some of them were mtime updates of unix socket inodes. These patches will
make them go away.

								Honza

>
> Thanks in advance.
>
> - Sedat -
>
> > Changes since -v4:
> >    - Fix ext4 optimization so it does not need to increment (and more
> >      problematically, decrement) the inode reference count
> >    - Per Christoph's suggestion, drop support for btrfs and xfs for now,
> >      due to issues with how btrfs and xfs handle dirty inode tracking.
> >      We can add btrfs and xfs support back later or at the end of this
> >      series if we want to revisit this decision.
> >    - Miscellaneous cleanups
> >
> > Changes since -v3:
> >    - inodes with I_DIRTY_TIME set are placed on a new bdi list,
> >      b_dirty_time.  This allows filesystem-level syncs to more
> >      easily iterate over those inodes that need to have their
> >      timestamps written to disk.
> >    - dirty timestamps will be written out asynchronously on the final
> >      iput, instead of when the inode gets evicted.
> >    - separate the definition of the new function
> >      find_active_inode_nowait() to a separate patch
> >    - create separate flag masks: I_DIRTY_WB and I_DIRTY_INODE, which
> >      indicate whether the inode needs to be on the write back lists,
> >      or whether the inode itself is dirty, while I_DIRTY means any one
> >      of the inode dirty flags is set.  This simplifies the fs
> >      writeback logic which needs to test for different combinations of
> >      the inode dirty flags in different places.
> >
> > Changes since -v2:
> >    - If update_time() updates i_version, it will not use lazytime (i.e.,
> >      the inode will be marked dirty so the change will be persisted on to
> >      disk sooner rather than later).  Yes, this eliminates the
> >      benefits of lazytime if the user is exporting the file system via
> >      NFSv4.  Sad, but NFS's requirements seem to mandate this.
> >    - Fix time wrapping bug 49 days after the system boots (on a system
> >      with 32-bit jiffies).  Use get_monotonic_boottime() instead.
> >    - Clean up type warning in include/tracing/ext4.h
> >    - Added explicit parentheses for stylistic reasons
> >    - Added an is_readonly() inode operations method so btrfs doesn't
> >      have to duplicate code in update_time().
> >
> > Changes since -v1:
> >    - Added explanatory comments in update_time() regarding i_ts_dirty_days
> >    - Fix type used for days_since_boot
> >    - Improve SMP scalability in update_time and ext4_update_other_inodes_time
> >    - Added tracepoints to help test and characterize how often and under
> >      what circumstances inodes have their timestamps lazily updated
> >
> > Theodore Ts'o (5):
> >   vfs: add support for a lazytime mount option
> >   vfs: don't let the dirty time inodes get more than a day stale
> >   vfs: add lazytime tracepoints for better debugging
> >   vfs: add find_inode_nowait() function
> >   ext4: add optimization for the lazytime mount option
> >
> >  fs/ext4/inode.c             |  66 +++++++++++++++++++++++--
> >  fs/ext4/super.c             |   9 ++++
> >  fs/fs-writeback.c           |  66 ++++++++++++++++++++++---
> >  fs/inode.c                  | 116 +++++++++++++++++++++++++++++++++++++++++---
> >  fs/libfs.c                  |   2 +-
> >  fs/logfs/readwrite.c        |   2 +-
> >  fs/nfsd/vfs.c               |   2 +-
> >  fs/pipe.c                   |   2 +-
> >  fs/proc_namespace.c         |   1 +
> >  fs/sync.c                   |   8 +++
> >  fs/ufs/truncate.c           |   2 +-
> >  include/linux/backing-dev.h |   1 +
> >  include/linux/fs.h          |  17 ++++++-
> >  include/trace/events/ext4.h |  30 ++++++++++++
> >  include/trace/events/fs.h   |  56 +++++++++++++++++++++
> >  include/uapi/linux/fs.h     |   1 +
> >  mm/backing-dev.c            |  10 +++-
> >  17 files changed, 367 insertions(+), 24 deletions(-)
> >  create mode 100644 include/trace/events/fs.h
> >
> > --
> > 2.1.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
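
As a side note on the "49 days" in the -v2 changelog: that figure is what a 32-bit tick counter gives if it advances 1000 times per second. The sketch below only illustrates that arithmetic. It assumes HZ=1000 (the cover letter does not state the tick rate), ignores any initial offset the counter may start from, and the function names are made up for illustration; it is not the kernel code.

```python
SECONDS_PER_DAY = 86_400


def wrap_period_days(hz: int, counter_bits: int = 32) -> float:
    """Days until a free-running tick counter of this width wraps to zero."""
    return (2 ** counter_bits) / hz / SECONDS_PER_DAY


def days_seen_by_counter(true_ticks: int, hz: int, counter_bits: int = 32) -> int:
    """Whole days since boot as computed from the truncated counter."""
    truncated = true_ticks % (2 ** counter_bits)
    return truncated // (hz * SECONDS_PER_DAY)


HZ = 1000  # assumed tick rate; not stated in the cover letter

print(round(wrap_period_days(HZ), 2))                       # 49.71
print(days_seen_by_counter(49 * SECONDS_PER_DAY * HZ, HZ))  # 49
print(days_seen_by_counter(50 * SECONDS_PER_DAY * HZ, HZ))  # 0, counter wrapped
```

A day count derived from such a counter falls back to zero once the counter wraps, which is the kind of problem a monotonic boot-time clock avoids.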