[RFCv3 0/4] ext4: Introducing ilock wrapper APIs & fixing i_rwsem scalability problem in DIO mixed-rw

These ilock patches help improve the current inode lock scalability
problem in ext4 for mixed DIO read/write workloads. The problem was first
reported by Joseph [1]. These patches are based on the upstream discussion
with Jan Kara & Joseph [2].

The problem is that for DIO overwrites we start with an exclusive lock
and only later downgrade it to a shared lock. This causes a scalability
problem for mixed DIO read/write workloads: if a DIO write arrives while
DIO reads are in flight, the write (which starts with the exclusive inode
lock) has to wait until the shared lock is released, which only happens
when the DIO reads complete. The reverse holds as well: DIO reads have to
wait while a write holds the exclusive lock.
This can be easily observed with perf-tools trace analysis [3].
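
To make the contention concrete, here is a minimal userspace sketch in C.
It is a toy, single-threaded model of a reader/writer lock, not the
kernel's i_rwsem implementation; all toy_* names are hypothetical and
exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a reader/writer semaphore such as i_rwsem.
 * It only tracks who holds the lock; it never sleeps or queues waiters. */
struct toy_rwsem {
    int  readers;  /* number of shared holders */
    bool writer;   /* true while held exclusively */
};

/* Shared acquisition succeeds unless an exclusive holder exists. */
static bool toy_try_shared(struct toy_rwsem *sem)
{
    if (sem->writer)
        return false;
    sem->readers++;
    return true;
}

/* Exclusive acquisition succeeds only when nobody holds the lock. */
static bool toy_try_excl(struct toy_rwsem *sem)
{
    if (sem->writer || sem->readers > 0)
        return false;
    sem->writer = true;
    return true;
}

/* Drop one shared hold. */
static void toy_unlock_shared(struct toy_rwsem *sem)
{
    assert(sem->readers > 0);
    sem->readers--;
}

/* Exclusive -> shared: the holder keeps the lock but lets readers in. */
static void toy_downgrade(struct toy_rwsem *sem)
{
    assert(sem->writer);
    sem->writer = false;
    sem->readers = 1;
}
```

With one DIO read holding the lock shared, toy_try_excl() fails (a write
that starts exclusive would have to wait), while toy_try_shared()
succeeds (a write that starts shared can proceed alongside the read).
Likewise, after a successful toy_try_excl(), toy_try_shared() fails until
toy_downgrade() is called.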

This patch series (Patch-4) fixes that situation even without the
dioread_nolock mount option. This is in line with the discussion with
Jan [4]. More details are in the commit messages of patches 3 & 4.

These patches are based on top of Ted's ext4 master tree.

Patch description
=================
Patch-1: Fixes ext4_dax_read/write inode locking sequence for IOCB_NOWAIT

Patch-2: Introduces ext4_ilock/unlock APIs for use in the following patches.
These are mainly wrapper functions for inode_lock/unlock.

Patch-3: Starts with a shared iolock for DIO instead of an exclusive iolock.
This patch fixes the reported scalability problem, but only for the
dioread_nolock mount option.

Patch-4: Does away with the dioread_nolock mount option condition in the
check for shared locking. We still take the exclusive lock for data=journal
mode, non-extent mode, or non-regular files. The commit message of this
patch describes in detail why the exclusive lock is not needed even without
dioread_nolock.
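
As a rough illustration of Patches 2-4, the sketch below models a lock
wrapper that takes a mode argument, plus the decision of which mode a DIO
overwrite starts with. This is plain userspace C, not the code in the
patches: struct toy_inode, the toy_* functions, and the patch4_applied
parameter are all hypothetical. It also assumes that with Patch-3 alone
the same data=journal / non-extent / non-regular exclusions apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_ilock_mode { TOY_ILOCK_SHARED, TOY_ILOCK_EXCL };

/* Hypothetical stand-in for the inode state the decision depends on. */
struct toy_inode {
    bool regular_file;    /* false for non-regular files */
    bool extent_mapped;   /* false for non-extent mode */
    bool data_journal;    /* true for data=journal */
    bool dioread_nolock;  /* true if mounted with dioread_nolock */
    int  shared_holders;  /* toy lock state: shared holders */
    bool excl_held;       /* toy lock state: exclusive holder */
};

/* One entry point for both modes, in the spirit of a wrapper around
 * inode_lock()/inode_lock_shared(). Only bookkeeping, no real locking. */
static void toy_ilock(struct toy_inode *inode, enum toy_ilock_mode mode)
{
    assert(inode != NULL);
    if (mode == TOY_ILOCK_EXCL)
        inode->excl_held = true;
    else
        inode->shared_holders++;
}

/* Counterpart of toy_ilock(); the caller passes the mode it locked with. */
static void toy_iunlock(struct toy_inode *inode, enum toy_ilock_mode mode)
{
    assert(inode != NULL);
    if (mode == TOY_ILOCK_EXCL)
        inode->excl_held = false;
    else
        inode->shared_holders--;
}

/* Which lock mode a DIO overwrite starts with in this model. */
static enum toy_ilock_mode
toy_dio_overwrite_lock_mode(const struct toy_inode *inode, bool patch4_applied)
{
    /* Cases that still take the exclusive lock. */
    if (!inode->regular_file || !inode->extent_mapped || inode->data_journal)
        return TOY_ILOCK_EXCL;
    /* Patch-3 only: shared locking requires dioread_nolock. */
    if (!patch4_applied && !inode->dioread_nolock)
        return TOY_ILOCK_EXCL;
    return TOY_ILOCK_SHARED;
}
```

For a regular, extent-mapped file on a default mount, the model returns
TOY_ILOCK_EXCL with patch4_applied == false and TOY_ILOCK_SHARED with
patch4_applied == true. Setting data_journal returns TOY_ILOCK_EXCL in
both cases.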

Git tree
========
https://github.com/riteshharjani/linux/tree/ext4-ilock-RFC-v3

Testing
=======
Completed xfstests -g auto with default mkfs & mount options.
No new failures beyond those already known without these patches.


Performance results
===================
Collected performance numbers for a DIO sync mixed random read/write
workload on ext4 against the number of threads, to check for scalability.
The performance histograms shown below give the percentage change in
performance with this ilock patchset as compared to the vanilla kernel.
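
For reference, a percentage change of this kind is normally computed as
below. This is a sketch assuming the usual definition; how the numbers
are computed by the tooling is not spelled out in this mail, and
pct_change() is a hypothetical helper name.

```c
#include <assert.h>

/* Percentage change of `patched` relative to `vanilla`.
 * Both values are in the same unit (e.g. bandwidth or IOPS). */
static double pct_change(double vanilla, double patched)
{
    assert(vanilla > 0.0);  /* the baseline must be a positive number */
    return (patched - vanilla) / vanilla * 100.0;
}
```

With purely made-up inputs, pct_change(200.0, 300.0) gives 50.0: a
patched result 1.5x the baseline is a 50% improvement. By the same
definition, a 140% improvement means the patched result is 2.4x the
baseline.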


FIO command:
fio -name=DIO-mixed-randrw -filename=./testfile -direct=1 -iodepth=1 -thread \
-rw=randrw -ioengine=psync -bs=$bs -size=10G -numjobs=$thread \
-group_reporting=1 -runtime=120

Used the fioperf tool [5] to collect these performance scores.

The histograms below show the performance benefit (%) of this ilock
patchset w.r.t. the vanilla kernel for the mixed randrw workload (4K block
size). Note that the percentage benefit increases with the number of
threads, so this patchset helps achieve good scalability for this
workload. It gives up to ~140% perf improvement for the 24-thread mixed
randrw workload with 4K block size.
The performance difference can be even higher with high-speed storage
devices, since bandwidth without the patches seems to flatten due to the
lock contention problem with multiple threads.
[Absolute perf delta can be seen at [6]]


		Performance benefit (%) data randrw (read)-4K
		    (default mount options)
  160 +-+------+-------+--------+--------+-------+--------+-------+------+-+
      |        +       +        +        +       +        +       +        |
  140 +-+                                                          **    +-+
      |                                                    **      **      |
  120 +-+                                         **       **      **    +-+
      |                                           **       **      **      |
  100 +-+                                **       **       **      **    +-+
      |                                  **       **       **      **      |
   80 +-+                                **       **       **      **    +-+
      |                                  **       **       **      **      |
      |                          **      **       **       **      **      |
   60 +-+                        **      **       **       **      **    +-+
      |                          **      **       **       **      **      |
   40 +-+                        **      **       **       **      **    +-+
      |                 **       **      **       **       **      **      |
   20 +-+               **       **      **       **       **      **    +-+
      |                 **       **      **       **       **      **      |
    0 +-+       **      **       **      **       **       **      **    +-+
      |        +       +        +        +       +        +       +        |
  -20 +-+------+-------+--------+--------+-------+--------+-------+------+-+
               1       2        4        8      12       16      24
	       		Threads


		Performance benefit (%) data randrw (write)-4K
		     (default mount options)
  160 +-+------+-------+--------+--------+-------+--------+-------+------+-+
      |        +       +        +        +       +        +       +        |
  140 +-+                                                          **    +-+
      |                                                    **      **      |
  120 +-+                                         **       **      **    +-+
      |                                           **       **      **      |
  100 +-+                                **       **       **      **    +-+
      |                                  **       **       **      **      |
   80 +-+                                **       **       **      **    +-+
      |                                  **       **       **      **      |
      |                          **      **       **       **      **      |
   60 +-+                        **      **       **       **      **    +-+
      |                          **      **       **       **      **      |
   40 +-+                        **      **       **       **      **    +-+
      |                 **       **      **       **       **      **      |
   20 +-+               **       **      **       **       **      **    +-+
      |                 **       **      **       **       **      **      |
    0 +-+       **      **       **      **       **       **      **    +-+
      |        +       +        +        +       +        +       +        |
  -20 +-+------+-------+--------+--------+-------+--------+-------+------+-+
               1       2        4        8      12       16      24
			Threads

Previous versions
=================
v2: https://www.spinics.net/lists/kernel/msg3262531.html
v1: https://patchwork.ozlabs.org/cover/1163286/

References
==========
[1]: https://lore.kernel.org/linux-ext4/1566871552-60946-4-git-send-email-joseph.qi@xxxxxxxxxxxxxxxxx/
[2]: https://lore.kernel.org/linux-ext4/20190910215720.GA7561@xxxxxxxxxxxxxx/
[3]: https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/ext4/perf.report
[4]: https://patchwork.ozlabs.org/cover/1163286/
[5]: https://github.com/riteshharjani/fioperf
[6]: https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/ext4/diff_ilock_v3_default_dio_randrw_4K.txt

-ritesh


Ritesh Harjani (4):
  ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT
  ext4: Add ext4_ilock & ext4_iunlock API
  ext4: start with shared iolock in case of DIO instead of excl. iolock
  ext4: Move to shared iolock even without dioread_nolock mount opt

 fs/ext4/ext4.h    |  33 ++++++
 fs/ext4/extents.c |  16 +--
 fs/ext4/file.c    | 252 +++++++++++++++++++++++++++++++++-------------
 fs/ext4/inode.c   |   4 +-
 fs/ext4/ioctl.c   |  16 +--
 fs/ext4/super.c   |  12 +--
 fs/ext4/xattr.c   |  17 ++--
 7 files changed, 246 insertions(+), 104 deletions(-)

-- 
2.21.0



