Re: Expense of read_iter


 



I'm working with Mingkai on optimizations for Ext4-dax.
We think that optimizing the read-iter method cannot achieve the
same performance as a dedicated read method for Ext4-dax.
We ran Mikulas's benchmark on Ext4-dax. The overall time and perf
results are listed below:

Overall time for 2^26 4 KB reads:

Method       Time
read         26.782s
read-iter    36.477s
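
To put the two totals on a per-call basis, the short Python sketch below divides each by the 2^26 reads. It assumes the reported times cover only the read loop; the variable names are illustrative and not part of the benchmark. It gives roughly 399 ns per read for the read method and roughly 544 ns for read-iter.

```python
# Overall times copied from the table above, in seconds.
READS = 2 ** 26  # number of 4 KB reads issued by the benchmark
timings = {"read": 26.782, "read-iter": 36.477}


def ns_per_read(total_seconds: float, reads: int = READS) -> float:
    """Average time per read call in nanoseconds."""
    return total_seconds / reads * 1e9


per_call = {method: ns_per_read(t) for method, t in timings.items()}
ratio = timings["read"] / timings["read-iter"]

for method, ns in per_call.items():
    print(f"{method:10s} {ns:7.2f} ns per 4 KB read")
print(f"read / read-iter = {ratio:.2%}")
```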

Perf result, using the read_iter method:

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles'
# Event count (approx.): 13379476464
#
# Overhead  Command  Shared Object     Symbol                                 
# ........  .......  ................  .......................................
#
    20.09%  pread    [kernel.vmlinux]  [k] copy_user_generic_string
     6.58%  pread    [kernel.vmlinux]  [k] iomap_apply
     6.01%  pread    [kernel.vmlinux]  [k] syscall_return_via_sysret
     4.85%  pread    libc-2.31.so      [.] __libc_pread
     3.61%  pread    [kernel.vmlinux]  [k] entry_SYSCALL_64_after_hwframe
     3.25%  pread    [kernel.vmlinux]  [k] _raw_read_lock
     2.80%  pread    [kernel.vmlinux]  [k] entry_SYSCALL_64
     2.71%  pread    [ext4]            [k] ext4_es_lookup_extent
     2.71%  pread    [kernel.vmlinux]  [k] __fsnotify_parent
     2.63%  pread    [kernel.vmlinux]  [k] __srcu_read_unlock
     2.55%  pread    [kernel.vmlinux]  [k] new_sync_read
     2.39%  pread    [ext4]            [k] ext4_iomap_begin
     2.38%  pread    [kernel.vmlinux]  [k] vfs_read
     2.30%  pread    [kernel.vmlinux]  [k] dax_iomap_actor
     2.30%  pread    [kernel.vmlinux]  [k] __srcu_read_lock
     2.14%  pread    [ext4]            [k] ext4_inode_block_valid
     1.97%  pread    [kernel.vmlinux]  [k] _copy_mc_to_iter
     1.97%  pread    [ext4]            [k] ext4_map_blocks
     1.89%  pread    [kernel.vmlinux]  [k] down_read
     1.89%  pread    [kernel.vmlinux]  [k] up_read
     1.65%  pread    [ext4]            [k] ext4_file_read_iter
     1.48%  pread    [kernel.vmlinux]  [k] dax_iomap_rw
     1.48%  pread    [jbd2]            [k] jbd2_transaction_committed
     1.15%  pread    [nd_pmem]         [k] __pmem_direct_access
     1.15%  pread    [kernel.vmlinux]  [k] ksys_pread64
     1.15%  pread    [kernel.vmlinux]  [k] __fget_light
     1.15%  pread    [ext4]            [k] ext4_set_iomap
     1.07%  pread    [kernel.vmlinux]  [k] atime_needs_update
     0.82%  pread    pread             [.] main
     0.82%  pread    [kernel.vmlinux]  [k] do_syscall_64
     0.74%  pread    [kernel.vmlinux]  [k] entry_SYSCALL_64_safe_stack
     0.66%  pread    [kernel.vmlinux]  [k] __x86_indirect_thunk_rax
     0.66%  pread    [nd_pmem]         [k] 0x00000000000001d0
     0.59%  pread    [kernel.vmlinux]  [k] dax_direct_access
     0.58%  pread    [nd_pmem]         [k] 0x00000000000001de
     0.58%  pread    [kernel.vmlinux]  [k] bdev_dax_pgoff
     0.49%  pread    [kernel.vmlinux]  [k] syscall_enter_from_user_mode
     0.49%  pread    [kernel.vmlinux]  [k] exit_to_user_mode_prepare
     0.49%  pread    [kernel.vmlinux]  [k] syscall_exit_to_user_mode
     0.41%  pread    [kernel.vmlinux]  [k] syscall_exit_to_user_mode_prepare
     0.33%  pread    [nd_pmem]         [k] 0x0000000000001083
     0.33%  pread    [kernel.vmlinux]  [k] dax_get_private
     0.33%  pread    [kernel.vmlinux]  [k] timestamp_truncate
     0.33%  pread    [kernel.vmlinux]  [k] percpu_counter_add_batch
     0.33%  pread    [kernel.vmlinux]  [k] copyout_mc
     0.33%  pread    [ext4]            [k] __check_block_validity.constprop.80
     0.33%  pread    [kernel.vmlinux]  [k] touch_atime
     0.25%  pread    [nd_pmem]         [k] 0x000000000000107f
     0.25%  pread    [kernel.vmlinux]  [k] rw_verify_area
     0.25%  pread    [ext4]            [k] ext4_iomap_end
     0.25%  pread    [kernel.vmlinux]  [k] _cond_resched
     0.25%  pread    [kernel.vmlinux]  [k] rcu_all_qs
     0.16%  pread    [kernel.vmlinux]  [k] __fdget
     0.16%  pread    [kernel.vmlinux]  [k] ktime_get_coarse_real_ts64
     0.16%  pread    [kernel.vmlinux]  [k] iov_iter_init
     0.16%  pread    [kernel.vmlinux]  [k] current_time
     0.16%  pread    [nd_pmem]         [k] 0x0000000000001075
     0.16%  pread    [ext4]            [k] ext4_inode_datasync_dirty
     0.16%  pread    [kernel.vmlinux]  [k] copy_mc_to_user
     0.08%  pread    pread             [.] pread@plt
     0.08%  pread    [kernel.vmlinux]  [k] __x86_indirect_thunk_r11
     0.08%  pread    [kernel.vmlinux]  [k] security_file_permission
     0.08%  pread    [kernel.vmlinux]  [k] dax_read_unlock
     0.08%  pread    [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
     0.08%  pread    [nd_pmem]         [k] 0x000000000000108f
     0.08%  pread    [nd_pmem]         [k] 0x0000000000001095
     0.08%  pread    [kernel.vmlinux]  [k] rcu_read_unlock_strict
     0.00%  pread    [kernel.vmlinux]  [k] native_write_msr


#
# (Tip: Show current config key-value pairs: perf config --list)
#

Perf result, using the read method we added for Ext4-dax:

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles'
# Event count (approx.): 13364755903
#
# Overhead  Command  Shared Object     Symbol                                 
# ........  .......  ................  .......................................
#
    28.65%  pread    [kernel.vmlinux]  [k] copy_user_generic_string
     7.99%  pread    [ext4]            [k] ext4_dax_read
     6.50%  pread    [kernel.vmlinux]  [k] syscall_return_via_sysret
     5.43%  pread    libc-2.31.so      [.] __libc_pread
     4.45%  pread    [kernel.vmlinux]  [k] entry_SYSCALL_64
     4.20%  pread    [kernel.vmlinux]  [k] down_read
     3.38%  pread    [kernel.vmlinux]  [k] _raw_read_lock
     3.13%  pread    [ext4]            [k] ext4_es_lookup_extent
     3.05%  pread    [kernel.vmlinux]  [k] __srcu_read_lock
     2.72%  pread    [kernel.vmlinux]  [k] __fsnotify_parent
     2.55%  pread    [kernel.vmlinux]  [k] __srcu_read_unlock
     2.47%  pread    [kernel.vmlinux]  [k] vfs_read
     2.31%  pread    [kernel.vmlinux]  [k] entry_SYSCALL_64_after_hwframe
     1.89%  pread    [kernel.vmlinux]  [k] up_read
     1.73%  pread    [ext4]            [k] ext4_map_blocks
     1.65%  pread    pread             [.] main
     1.56%  pread    [kernel.vmlinux]  [k] __fget_light
     1.48%  pread    [ext4]            [k] ext4_inode_block_valid
     1.34%  pread    [kernel.vmlinux]  [k] ksys_pread64
     1.23%  pread    [kernel.vmlinux]  [k] entry_SYSCALL_64_safe_stack
     1.08%  pread    [kernel.vmlinux]  [k] syscall_exit_to_user_mode
     1.07%  pread    [nd_pmem]         [k] __pmem_direct_access
     0.99%  pread    [kernel.vmlinux]  [k] atime_needs_update
     0.91%  pread    [kernel.vmlinux]  [k] security_file_permission
     0.91%  pread    [kernel.vmlinux]  [k] syscall_enter_from_user_mode
     0.66%  pread    [kernel.vmlinux]  [k] timestamp_truncate
     0.58%  pread    [kernel.vmlinux]  [k] ktime_get_coarse_real_ts64
     0.49%  pread    pread             [.] pread@plt
     0.41%  pread    [kernel.vmlinux]  [k] current_time
     0.41%  pread    [kernel.vmlinux]  [k] dax_direct_access
     0.41%  pread    [kernel.vmlinux]  [k] do_syscall_64
     0.41%  pread    [kernel.vmlinux]  [k] exit_to_user_mode_prepare
     0.41%  pread    [kernel.vmlinux]  [k] percpu_counter_add_batch
     0.33%  pread    [kernel.vmlinux]  [k] touch_atime
     0.33%  pread    [ext4]            [k] __check_block_validity.constprop.80
     0.33%  pread    [kernel.vmlinux]  [k] copy_mc_to_user
     0.25%  pread    [kernel.vmlinux]  [k] dax_get_private
     0.25%  pread    [kernel.vmlinux]  [k] rcu_all_qs
     0.25%  pread    [nd_pmem]         [k] 0x0000000000001095
     0.16%  pread    [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
     0.16%  pread    [kernel.vmlinux]  [k] syscall_exit_to_user_mode_prepare
     0.16%  pread    [nd_pmem]         [k] 0x0000000000001083
     0.16%  pread    [kernel.vmlinux]  [k] rw_verify_area
     0.16%  pread    [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
     0.16%  pread    [kernel.vmlinux]  [k] __fdget
     0.16%  pread    [kernel.vmlinux]  [k] dax_read_lock
     0.16%  pread    [kernel.vmlinux]  [k] __x86_indirect_thunk_rax
     0.08%  pread    [kernel.vmlinux]  [k] rcu_read_unlock_strict
     0.08%  pread    [kernel.vmlinux]  [k] dax_read_unlock
     0.08%  pread    [kernel.vmlinux]  [k] update_irq_load_avg
     0.08%  pread    [nd_pmem]         [k] 0x000000000000109d
     0.08%  pread    [nd_pmem]         [k] 0x000000000000107a
     0.08%  pread    [kernel.vmlinux]  [k] __x64_sys_pread64
     0.00%  pread    [kernel.vmlinux]  [k] native_write_msr


#
# (Tip: Sample related events with: perf record -e '{cycles,instructions}:S')
#

Note that the overall time of the read method is 73.42% of that of the
read-iter method. If we sum the percentages of the read-iter-specific
functions in the first trace (ext4_file_read_iter, iomap_apply,
dax_iomap_actor, _copy_mc_to_iter, ext4_iomap_begin,
jbd2_transaction_committed, new_sync_read, dax_iomap_rw, ext4_set_iomap,
ext4_iomap_end and iov_iter_init), we get 21.96% (20.81% without
ext4_set_iomap). In the second trace, ext4_dax_read, which replaces all of
these functions, consumes only 7.99%.
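
The sum can be rechecked from the first trace with a few lines of Python. This is only a sketch: the percentages are copied by hand from the read_iter perf report above. With all eleven listed functions the total comes to 21.96%; leaving out ext4_set_iomap gives 20.81%.

```python
# Overhead percentages copied from the read_iter perf report (first trace).
read_iter_specific = {
    "ext4_file_read_iter": 1.65,
    "iomap_apply": 6.58,
    "dax_iomap_actor": 2.30,
    "_copy_mc_to_iter": 1.97,
    "ext4_iomap_begin": 2.39,
    "jbd2_transaction_committed": 1.48,
    "new_sync_read": 2.55,
    "dax_iomap_rw": 1.48,
    "ext4_set_iomap": 1.15,
    "ext4_iomap_end": 0.25,
    "iov_iter_init": 0.16,
}

total = sum(read_iter_specific.values())
without_set_iomap = total - read_iter_specific["ext4_set_iomap"]
ext4_dax_read = 7.99  # from the second trace

print(f"read_iter-specific total: {total:.2f}%")
print(f"without ext4_set_iomap:   {without_set_iomap:.2f}%")
print(f"ext4_dax_read:            {ext4_dax_read:.2f}%")
```

Each percentage is relative to the cycle samples of its own run, so the figures from the two traces describe shares of two different totals.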

The overhead mainly consists of two parts. The first is constructing
struct iov_iter and iterating over it (i.e., new_sync_read, _copy_mc_to_iter
and iov_iter_init). The second is the DAX I/O mechanism provided by the VFS
(i.e., dax_iomap_rw, iomap_apply and ext4_iomap_begin).
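
As a rough illustration of how the two parts compare, the Python sketch below adds up the first-trace percentages of the functions named for each part. It covers only those six functions, and it assumes the new_sync entry refers to new_sync_read as it appears in the trace; the group labels are illustrative.

```python
# First-trace percentages for the functions named in each part.
parts = {
    "iov_iter setup and iteration": {
        "new_sync_read": 2.55,  # assumed to be the "new_sync" entry
        "_copy_mc_to_iter": 1.97,
        "iov_iter_init": 0.16,
    },
    "DAX I/O path": {
        "dax_iomap_rw": 1.48,
        "iomap_apply": 6.58,
        "ext4_iomap_begin": 2.39,
    },
}

# Sum the overhead of each group separately.
part_totals = {name: sum(funcs.values()) for name, funcs in parts.items()}

for name, pct in part_totals.items():
    print(f"{name:30s} {pct:5.2f}%")
```

For these named functions, the iterator part adds up to 4.68% and the DAX I/O part to 10.45%.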

There are two possible approaches to optimizing this: 1) implementing the
read method without the complexity of iterators and dax_iomap_rw;
2) optimizing both the iterators and how dax_iomap_rw works. Since
dax_iomap_rw requires ext4_iomap_begin, which in turn involves the iomap
structure and other machinery (e.g., journaling status locks in Ext4), we
think implementing the read method would be easier.

Thanks,
Zhongwei



