On 3/30/23 11:11 AM, Linus Torvalds wrote:
> On Thu, Mar 30, 2023 at 9:47 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> Sadly, in absolute numbers, comparing read(2) and readv(2),
>> the latter takes 2.11x as long in the stock kernel, and 2.01x as long
>> with the patches. So while single segment is better now than before,
>> it's still waaaay slower than having to copy in a single iovec. Testing
>> was run with all security mitigations off.
>
> What does the profile say? Is it all in import_iovec() or what?
>
> I do note that we have some completely horrid "helper" functions in
> the iter paths: things like "do_iter_readv_writev()" supposedly being
> a common function, but then it ends up doing some small setup and
> just doing a conditional on the "type" after all, so when it isn't
> inlined, you have those things that don't predict well at all.
>
> And the iter interfaces don't have just that iterator, they have the
> whole kiocb overhead too. All in the name of being generic. Most file
> descriptors don't even support the simpler ".read" interface, because
> they want the whole thing with IOCB_DIRECT flags etc.
>
> So to some degree it's unfair to compare read-vs-read_iter. The latter
> has all that disgusting support for O_DIRECT and friends, and testing
> with /dev/null just doesn't show that part.

Oh I agree, and particularly for the "read from /dev/zero" case it's not
very interesting, as it does two different things there as well. It was
just more of a "gah it's potentially this bad" outburst than anything
else; the numbers I did care about were readv before and after patches,
not read vs readv.

That said, there might be things to improve here. But that's a task for
another time. perf diff of a read vs readv run below.

# Event 'cycles'
#
# Baseline  Delta Abs  Shared Object         Symbol
# ........  .........  ....................  .....................................
#
              +40.56%  [kernel.vmlinux]      [k] iov_iter_zero
              +12.59%  [kernel.vmlinux]      [k] copy_user_enhanced_fast_string
    21.56%    -11.10%  [kernel.vmlinux]      [k] entry_SYSCALL_64
               +7.67%  [kernel.vmlinux]      [k] _copy_from_user
               +7.40%  libc.so.6             [.] __GI___readv
     3.76%     -2.22%  [kernel.vmlinux]      [k] __fsnotify_parent
               +2.13%  [kernel.vmlinux]      [k] do_iter_read
               +2.02%  [kernel.vmlinux]      [k] do_iter_readv_writev
               +1.89%  [kernel.vmlinux]      [k] __import_iovec
               +1.59%  [kernel.vmlinux]      [k] do_readv
     3.15%     -1.43%  [kernel.vmlinux]      [k] __fget_light
               +1.42%  [kernel.vmlinux]      [k] vfs_readv
               +1.32%  [kernel.vmlinux]      [k] read_iter_zero
     2.39%     -1.30%  [kernel.vmlinux]      [k] syscall_exit_to_user_mode
     1.89%     -1.17%  [kernel.vmlinux]      [k] exit_to_user_mode_prepare
     2.01%     -1.10%  [kernel.vmlinux]      [k] do_syscall_64
     2.04%     -1.06%  [kernel.vmlinux]      [k] __fdget_pos
     1.93%     -0.99%  [kernel.vmlinux]      [k] syscall_enter_from_user_mode
               +0.81%  [kernel.vmlinux]      [k] __get_task_ioprio
     1.03%     -0.56%  [kernel.vmlinux]      [k] fpregs_assert_state_consistent
     0.85%     -0.49%  [kernel.vmlinux]      [k] syscall_exit_to_user_mode_prepare
               +0.45%  [kernel.vmlinux]      [k] import_iovec
               +0.20%  [kernel.vmlinux]      [k] kfree
               +0.18%  [kernel.vmlinux]      [k] __x64_sys_readv
               +0.06%  read-zero             [.] readv@plt
               +0.01%  [kernel.vmlinux]      [k] filemap_map_pages
               +0.01%  ld-linux-x86-64.so.2  [.] check_match
     0.00%     +0.00%  [kernel.vmlinux]      [k] memset_erms
     0.00%     -0.00%  [kernel.vmlinux]      [k] perf_iterate_ctx
               +0.00%  [kernel.vmlinux]      [k] xfs_iunlock
     0.49%     -0.00%  read-zero             [.] main
               +0.00%  [kernel.vmlinux]      [k] arch_get_unmapped_area_topdown
               +0.00%  [kernel.vmlinux]      [k] pcpu_alloc
               +0.00%  [kernel.vmlinux]      [k] perf_event_exec
               +0.00%  taskset               [.] 0x0000000000002ebd
     0.00%     +0.00%  [kernel.vmlinux]      [k] perf_ibs_handle_irq
     0.00%     -0.00%  [kernel.vmlinux]      [k] perf_ibs_start
    32.88%             [kernel.vmlinux]      [k] read_zero
    15.22%             libc.so.6             [.] read
     6.27%             [kernel.vmlinux]      [k] vfs_read
     2.60%             [kernel.vmlinux]      [k] ksys_read
     1.02%             [kernel.vmlinux]      [k] __cond_resched
     0.41%             [kernel.vmlinux]      [k] rcu_all_qs
     0.35%             [kernel.vmlinux]      [k] __x64_sys_read
     0.12%             read-zero             [.] read@plt
     0.01%             ld-linux-x86-64.so.2  [.] _dl_load_cache_lookup
     0.01%             ld-linux-x86-64.so.2  [.] _dl_check_map_versions
     0.00%             [kernel.vmlinux]      [k] _find_next_or_bit
     0.00%             [kernel.vmlinux]      [k] perf_event_update_userpage
     0.00%             [kernel.vmlinux]      [k] native_sched_clock

--
Jens Axboe