On Sat, 2019-04-13 at 13:55 +0300, David Woodhouse wrote:
> Let's switch to using iperf. You can limit the sending bandwidth with
> that. If we send more than the receive side can handle, it actually
> ends up receiving less than its peak capacity.

So, while iperf is running at the optimum output, let's see what perf says:

  sudo perf record -g --pid=`pidof lt-openconnect`

  Children      Self  Command         Shared Object            Symbol
+   42.15%     0.30%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64_after_hwframe
+   41.92%     0.42%  lt-openconnect  [kernel.vmlinux]         [k] do_syscall_64
+   36.89%     0.55%  lt-openconnect  libpthread-2.28.so       [.] __libc_send
+   32.96%    32.87%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3
+   30.31%     0.16%  lt-openconnect  [kernel.vmlinux]         [k] __x64_sys_sendto
+   30.14%     0.43%  lt-openconnect  [kernel.vmlinux]         [k] __sys_sendto
+   28.78%     0.04%  lt-openconnect  [kernel.vmlinux]         [k] sock_sendmsg
+   27.95%     1.17%  lt-openconnect  [kernel.vmlinux]         [k] udp_sendmsg
+   17.77%     0.08%  lt-openconnect  [kernel.vmlinux]         [k] udp_send_skb.isra.50
+   17.60%     0.01%  lt-openconnect  [kernel.vmlinux]         [k] ip_send_skb
+   16.34%     0.26%  lt-openconnect  [kernel.vmlinux]         [k] ip_output
+   15.10%     0.48%  lt-openconnect  [kernel.vmlinux]         [k] ip_finish_output2
+   14.57%     0.44%  lt-openconnect  [kernel.vmlinux]         [k] __dev_queue_xmit
+   13.31%     0.10%  lt-openconnect  [kernel.vmlinux]         [k] sch_direct_xmit
+   10.35%     0.21%  lt-openconnect  libpthread-2.28.so       [.] __libc_read
+    8.76%     0.23%  lt-openconnect  [kernel.vmlinux]         [k] dev_hard_start_xmit
+    7.78%     0.18%  lt-openconnect  [kernel.vmlinux]         [k] ip_make_skb
+    6.82%     0.11%  lt-openconnect  [kernel.vmlinux]         [k] ksys_read
+    6.25%     0.18%  lt-openconnect  [kernel.vmlinux]         [k] vfs_read
+    6.24%     6.21%  lt-openconnect  libopenconnect.so.5.5.0  [.] sha1_block_data_order_ssse3
+    5.70%     0.82%  lt-openconnect  [kernel.vmlinux]         [k] __ip_append_data.isra.52
+    5.56%     0.00%  lt-openconnect  [unknown]                [k] 0000000000000000
+    5.54%     5.54%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64
+    5.33%     0.15%  lt-openconnect  [kernel.vmlinux]         [k] __vfs_read
+    5.15%     0.52%  lt-openconnect  [tun]                    [k] tun_chr_read_iter
+    5.13%     2.84%  lt-openconnect  [ena]                    [k] ena_start_xmit

.. and without the '-g':

  Overhead  Command         Shared Object            Symbol
    32.94%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3
     5.77%  lt-openconnect  libopenconnect.so.5.5.0  [.] sha1_block_data_order_ssse3
     4.70%  lt-openconnect  [kernel.vmlinux]         [k] _raw_spin_lock
     3.44%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64
     2.86%  lt-openconnect  [kernel.vmlinux]         [k] syscall_return_via_sysret
     2.81%  lt-openconnect  [kernel.vmlinux]         [k] copy_user_enhanced_fast_string
     2.66%  lt-openconnect  [kernel.vmlinux]         [k] irq_entries_start
     1.86%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_encrypt
     1.44%  lt-openconnect  [kernel.vmlinux]         [k] pvclock_clocksource_read
     1.39%  lt-openconnect  [kernel.vmlinux]         [k] native_apic_msr_eoi_write
     1.34%  lt-openconnect  [ena]                    [k] ena_io_poll
     1.33%  lt-openconnect  [ena]                    [k] ena_start_xmit
     1.12%  lt-openconnect  [kernel.vmlinux]         [k] __fget_light
     1.02%  lt-openconnect  [kernel.vmlinux]         [k] common_interrupt
     0.88%  lt-openconnect  [kernel.vmlinux]         [k] interrupt_entry
     0.73%  lt-openconnect  [kernel.vmlinux]         [k] packet_rcv
     0.71%  lt-openconnect  [tun]                    [k] tun_do_read
     0.66%  lt-openconnect  [kernel.vmlinux]         [k] udp_sendmsg
     0.66%  lt-openconnect  [kernel.vmlinux]         [k] __slab_free
     0.61%  lt-openconnect  [kernel.vmlinux]         [k] ipt_do_table
     0.61%  lt-openconnect  [kernel.vmlinux]         [k] ipv4_mtu
     0.60%  lt-openconnect  [kernel.vmlinux]         [k] sock_wfree
     0.58%  lt-openconnect  [kernel.vmlinux]         [k] kfree
     0.58%  lt-openconnect  [tun]                    [k] tun_chr_read_iter

Expanding (a rerun of) the first one to see where all that syscall time is, it's mostly on the UDP send side:

  Children      Self  Command         Shared Object            Symbol
-   38.15%     0.29%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64_after_hwframe
   - 37.86% entry_SYSCALL_64_after_hwframe
      - do_syscall_64
         - 28.92% __x64_sys_sendto
            - 28.68% __sys_sendto
               - 27.05% sock_sendmsg
                  - 26.44% udp_sendmsg
                     - 17.62% udp_send_skb.isra.50
                        - 17.46% ip_send_skb
                           - 15.92% ip_output
                              - 14.29% ip_finish_output2
                                 - 12.85% __dev_queue_xmit
                                    - 11.31% sch_direct_xmit
                                       + 5.92% dev_hard_start_xmit
                                         4.36% _raw_spin_lock
                                       + 0.84% validate_xmit_skb_list
                                       + 0.76% __local_bh_enable_ip
                                0.73% ip_finish_output
                                0.55% nf_hook_slow
                           + 1.54% ip_local_out
                     + 7.46% ip_make_skb
                       0.77% sk_dst_check
               + 0.54% security_socket_sendmsg
            + 1.05% sockfd_lookup_light
         - 7.51% ksys_read
            + 6.76% vfs_read
            + 0.63% __fdget_pos
         + 0.55% common_interrupt
+   37.92%     0.38%  lt-openconnect  [kernel.vmlinux]         [k] do_syscall_64
+   37.66%    32.36%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3

So setting up zerocopy for the tun device with virtio-user might not win us much. Maybe MSG_ZEROCOPY for the UDP socket could help? I'm not quite sure where in the above the copy from userspace is actually happening.

But these are my results, not yours. And frankly, I'm not worried about the performance on *my* system; 1800Mb/s will do me quite nicely for now, thank you very much. Let's see what you get on your side for the comparable traces. Start the 'record' right after starting iperf in a different terminal, then stop it just before iperf is about to finish, ~10 seconds later.

(Oops, I see sha1_block_data_order_ssse3 in the trace. It's not 'detecting' AVX support. Fix that and my ESP microbenchmark is now at 2775Mb/s, although the overall perf traces look similar.)
_______________________________________________
openconnect-devel mailing list
openconnect-devel@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/openconnect-devel