Re: [EXTERNAL] Re: What throughput is reasonable?

On Sat, 2019-04-13 at 13:55 +0300, David Woodhouse wrote:
> 
> Let's switch to using iperf. You can limit the sending bandwidth with
> that. If we send more than the receive side can handle, it actually
> ends up receiving less than its peak capacity.

So, while iperf is running at the optimum output, let's see what perf
says:

sudo perf record -g --pid=`pidof lt-openconnect`

  Children      Self  Command         Shared Object            Symbol
+   42.15%     0.30%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64_after_hwframe
+   41.92%     0.42%  lt-openconnect  [kernel.vmlinux]         [k] do_syscall_64
+   36.89%     0.55%  lt-openconnect  libpthread-2.28.so       [.] __libc_send
+   32.96%    32.87%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3
+   30.31%     0.16%  lt-openconnect  [kernel.vmlinux]         [k] __x64_sys_sendto
+   30.14%     0.43%  lt-openconnect  [kernel.vmlinux]         [k] __sys_sendto
+   28.78%     0.04%  lt-openconnect  [kernel.vmlinux]         [k] sock_sendmsg
+   27.95%     1.17%  lt-openconnect  [kernel.vmlinux]         [k] udp_sendmsg
+   17.77%     0.08%  lt-openconnect  [kernel.vmlinux]         [k] udp_send_skb.isra.50
+   17.60%     0.01%  lt-openconnect  [kernel.vmlinux]         [k] ip_send_skb
+   16.34%     0.26%  lt-openconnect  [kernel.vmlinux]         [k] ip_output
+   15.10%     0.48%  lt-openconnect  [kernel.vmlinux]         [k] ip_finish_output2
+   14.57%     0.44%  lt-openconnect  [kernel.vmlinux]         [k] __dev_queue_xmit
+   13.31%     0.10%  lt-openconnect  [kernel.vmlinux]         [k] sch_direct_xmit
+   10.35%     0.21%  lt-openconnect  libpthread-2.28.so       [.] __libc_read
+    8.76%     0.23%  lt-openconnect  [kernel.vmlinux]         [k] dev_hard_start_xmit
+    7.78%     0.18%  lt-openconnect  [kernel.vmlinux]         [k] ip_make_skb
+    6.82%     0.11%  lt-openconnect  [kernel.vmlinux]         [k] ksys_read
+    6.25%     0.18%  lt-openconnect  [kernel.vmlinux]         [k] vfs_read
+    6.24%     6.21%  lt-openconnect  libopenconnect.so.5.5.0  [.] sha1_block_data_order_ssse3
+    5.70%     0.82%  lt-openconnect  [kernel.vmlinux]         [k] __ip_append_data.isra.52
+    5.56%     0.00%  lt-openconnect  [unknown]                [k] 0000000000000000
+    5.54%     5.54%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64
+    5.33%     0.15%  lt-openconnect  [kernel.vmlinux]         [k] __vfs_read
+    5.15%     0.52%  lt-openconnect  [tun]                    [k] tun_chr_read_iter
+    5.13%     2.84%  lt-openconnect  [ena]                    [k] ena_start_xmit

.. and without the '-g':

Overhead  Command         Shared Object            Symbol
  32.94%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3
   5.77%  lt-openconnect  libopenconnect.so.5.5.0  [.] sha1_block_data_order_ssse3
   4.70%  lt-openconnect  [kernel.vmlinux]         [k] _raw_spin_lock
   3.44%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64
   2.86%  lt-openconnect  [kernel.vmlinux]         [k] syscall_return_via_sysret
   2.81%  lt-openconnect  [kernel.vmlinux]         [k] copy_user_enhanced_fast_string
   2.66%  lt-openconnect  [kernel.vmlinux]         [k] irq_entries_start
   1.86%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_encrypt
   1.44%  lt-openconnect  [kernel.vmlinux]         [k] pvclock_clocksource_read
   1.39%  lt-openconnect  [kernel.vmlinux]         [k] native_apic_msr_eoi_write
   1.34%  lt-openconnect  [ena]                    [k] ena_io_poll
   1.33%  lt-openconnect  [ena]                    [k] ena_start_xmit
   1.12%  lt-openconnect  [kernel.vmlinux]         [k] __fget_light
   1.02%  lt-openconnect  [kernel.vmlinux]         [k] common_interrupt
   0.88%  lt-openconnect  [kernel.vmlinux]         [k] interrupt_entry
   0.73%  lt-openconnect  [kernel.vmlinux]         [k] packet_rcv
   0.71%  lt-openconnect  [tun]                    [k] tun_do_read
   0.66%  lt-openconnect  [kernel.vmlinux]         [k] udp_sendmsg
   0.66%  lt-openconnect  [kernel.vmlinux]         [k] __slab_free
   0.61%  lt-openconnect  [kernel.vmlinux]         [k] ipt_do_table
   0.61%  lt-openconnect  [kernel.vmlinux]         [k] ipv4_mtu
   0.60%  lt-openconnect  [kernel.vmlinux]         [k] sock_wfree
   0.58%  lt-openconnect  [kernel.vmlinux]         [k] kfree
   0.58%  lt-openconnect  [tun]                    [k] tun_chr_read_iter


Expanding (a rerun of) the first one to see where all that syscall time
is, it's mostly on the UDP send side:

  Children      Self  Command         Shared Object            Symbol
-   38.15%     0.29%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64_after_hwframe
     37.86% entry_SYSCALL_64_after_hwframe
      - do_syscall_64
         - 28.92% __x64_sys_sendto
            - 28.68% __sys_sendto
               - 27.05% sock_sendmsg
                  - 26.44% udp_sendmsg
                     - 17.62% udp_send_skb.isra.50
                        - 17.46% ip_send_skb
                           - 15.92% ip_output
                              - 14.29% ip_finish_output2
                                 - 12.85% __dev_queue_xmit
                                    - 11.31% sch_direct_xmit
                                       + 5.92% dev_hard_start_xmit
                                         4.36% _raw_spin_lock
                                       + 0.84% validate_xmit_skb_list
                                 + 0.76% __local_bh_enable_ip
                                0.73% ip_finish_output
                                0.55% nf_hook_slow
                           + 1.54% ip_local_out
                     + 7.46% ip_make_skb
                       0.77% sk_dst_check
                  + 0.54% security_socket_sendmsg
               + 1.05% sockfd_lookup_light
         - 7.51% ksys_read
            + 6.76% vfs_read
            + 0.63% __fdget_pos
         + 0.55% common_interrupt
+   37.92%     0.38%  lt-openconnect  [kernel.vmlinux]         [k] do_syscall_64
+   37.66%    32.36%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3
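
As a rough check on "mostly on the UDP send side", the percentages in the
expanded trace can be turned into shares of the syscall time. This is only
a sketch: the helper name 'share' is made up for illustration, and it uses
the 37.86% attributed to entry_SYSCALL_64_after_hwframe as the total.

```shell
# Share of the total syscall time spent under one callee, as a percentage.
# The figures are the percentages shown in the expanded trace.
share() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", part / total * 100 }'
}

share 28.92 37.86   # __x64_sys_sendto relative to all syscall time
share 7.51 37.86    # ksys_read relative to all syscall time
```

By these numbers the sendto() path accounts for roughly three quarters of
the syscall time, and the read from the tun device for about a fifth.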


So setting up zerocopy for the tun device with virtio-user might not
win us much. Maybe MSG_ZEROCOPY for the UDP socket could help? I'm not
quite sure where in the above the copy from userspace is actually
happening.

But these are my results, not yours. And frankly, I'm not worried about
the performance on *my* system. 1800Mb/s will do me quite nicely for
now, thank you very much.

Let's see what you get on your side for the comparable traces. Start
the 'record' right after starting iperf in a different terminal, then
stop it just before iperf is about to finish, ~10 seconds later.
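
One way to get the timing right without stopping the recording by hand is
to let perf run for a fixed duration. The following is a sketch, not a
tested recipe for your setup: the function name 'perf_record_for' and its
defaults are made up for illustration, and it assumes the process is
called lt-openconnect as on my system.

```shell
# Record call graphs from a running process for a fixed number of seconds.
# Usage: perf_record_for <process-name> [seconds] [output-file]
perf_record_for() {
    local name=$1 secs=${2:-10} out=${3:-perf.data} pid
    # -s: return a single PID even if several processes match
    pid=$(pidof -s "$name") || { echo "no such process: $name" >&2; return 1; }
    # perf stops recording when the 'sleep' command exits.
    sudo perf record -g -o "$out" --pid="$pid" -- sleep "$secs"
}
```

Start iperf in the other terminal, then run something like
'perf_record_for lt-openconnect 8' so that the recording ends before iperf
does, and view the result with 'sudo perf report -i perf.data'.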

(Oops, I see sha1_block_data_order_ssse3 in the trace: it's not
'detecting' AVX support. With that fixed, my ESP microbenchmark is now
at 2775Mb/s, although the overall perf traces look similar.)
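
If the *_ssse3 variants show up in your traces too, it is worth checking
whether your CPU advertises AVX at all before expecting the same gain.
A minimal sketch, assuming x86 Linux with /proc/cpuinfo; the helper name
'cpu_has_flag' is made up for illustration. This only shows what the
kernel reports, not what the crypto code ends up selecting.

```shell
# Report whether a CPU feature flag appears in a cpuinfo-style file.
# Usage: cpu_has_flag <flag> [cpuinfo-path]
cpu_has_flag() {
    # -w matches whole words only, so 'avx' does not match 'avx2'.
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

if cpu_has_flag avx; then
    echo "CPU advertises AVX"
else
    echo "CPU does not advertise AVX (or this is not x86 Linux)"
fi
```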

