RE: Is SCTP throughput really this low compared to TCP?

This branch of the email thread is now defunct: as per Daniel Borkmann, the offending code was narrowed down to the following patch:

git revert ef2820a735f74ea60335f8ba3801b844f0cb184d

With this patch reverted, SCTP-over-IPv4 performance on 3.14.0 is back to normal.

See the other branch of this thread for details.



-----Original Message-----
From: Butler, Peter 
Sent: April-14-14 11:50 AM
To: Vlad Yasevich; Daniel Borkmann
Cc: linux-sctp@xxxxxxxxxxxxxxx
Subject: RE: Is SCTP throughput really this low compared to TCP?

Here are some profiling numbers.  Note that these were obtained with operf/opreport (OProfile).  Only the top 30 or so symbols are shown here.

An identical load test was performed on 3.4.2 and 3.14.

3.4.2:

CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  cumulative samples  %        cumulative %     linenr info                 image name               symbol name
130002   130002         4.9435   4.9435    copy_user_64.S:240          vmlinux                  copy_user_generic_string
125955   255957         4.7896   9.7330    memcpy_64.S:59              vmlinux                  memcpy
71026    326983         2.7008  12.4339    spinlock.c:136              vmlinux                  _raw_spin_lock
57138    384121         2.1727  14.6066    slub.c:409                  vmlinux                  cmpxchg_double_slab
56559    440680         2.1507  16.7573    slub.c:2208                 vmlinux                  __slab_alloc
53058    493738         2.0176  18.7749    ixgbe_main.c:2952           ixgbe.ko                 ixgbe_poll
51541    545279         1.9599  20.7348    slub.c:2439                 vmlinux                  __slab_free
49916    595195         1.8981  22.6329    ip_tables.c:294             vmlinux                  ipt_do_table
42406    637601         1.6125  24.2455    ixgbe_main.c:7824           ixgbe.ko                 ixgbe_xmit_frame_ring
40929    678530         1.5564  25.8018    slub.c:3463                 vmlinux                  kfree
40349    718879         1.5343  27.3361    core.c:132                  vmlinux                  nf_iterate
35521    754400         1.3507  28.6869    output.c:347                sctp.ko                  sctp_packet_transmit
34071    788471         1.2956  29.9825    outqueue.c:1342             sctp.ko                  sctp_check_transmitted
33962    822433         1.2914  31.2739    slub.c:2601                 vmlinux                  kmem_cache_free
33450    855883         1.2720  32.5459    outqueue.c:735              sctp.ko                  sctp_outq_flush
33005    888888         1.2551  33.8009    skbuff.c:172                vmlinux                  __alloc_skb
29231    918119         1.1115  34.9125    socket.c:1565               sctp.ko                  sctp_sendmsg
27950    946069         1.0628  35.9753    (no location information)   libc-2.14.90.so          __memmove_ssse3_back
26718    972787         1.0160  36.9913    (no location information)   nf_conntrack.ko          nf_conntrack_in
26589    999376         1.0111  38.0023    slub.c:4049                 vmlinux                  __kmalloc_node_track_caller
26449    1025825        1.0058  39.0081    slub.c:2375                 vmlinux                  kmem_cache_alloc
26211    1052036        0.9967  40.0048    sm_sideeffect.c:1074        sctp.ko                  sctp_do_sm
25527    1077563        0.9707  40.9755    slub.c:2404                 vmlinux                  kmem_cache_alloc_node
23970    1101533        0.9115  41.8870    (no location information)   libc-2.14.90.so          _int_free
23266    1124799        0.8847  42.7717    memset_64.S:62              vmlinux                  memset
22976    1147775        0.8737  43.6454    (no location information)   nf_conntrack.ko          hash_conntrack_raw
21855    1169630        0.8311  44.4764    chunk.c:175                 sctp.ko                  sctp_datamsg_from_user
21730    1191360        0.8263  45.3027    list_debug.c:24             vmlinux                  __list_add
21252    1212612        0.8081  46.1109    dev.c:3151                  vmlinux                  __netif_receive_skb
20742    1233354        0.7887  46.8996    (no location information)   libc-2.14.90.so          _int_malloc
19955    1253309        0.7588  47.6584    input.c:130                 sctp.ko                  sctp_rcv



3.14:

CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  cumulative samples  %        cumulative %     linenr info                 image name               symbol name
168021   168021         6.1446   6.1446    copy_user_64.S:183          vmlinux-3.14.0           copy_user_generic_string
85199    253220         3.1158   9.2604    memcpy_64.S:59              vmlinux-3.14.0           memcpy
80133    333353         2.9305  12.1909    spinlock.c:174              vmlinux-3.14.0           _raw_spin_lock_bh
74086    407439         2.7094  14.9003    spinlock.c:150              vmlinux-3.14.0           _raw_spin_lock
51878    459317         1.8972  16.7975    ixgbe_main.c:6930           ixgbe.ko                 ixgbe_xmit_frame_ring
49354    508671         1.8049  18.6024    slub.c:2538                 vmlinux-3.14.0           __slab_free
39103    547774         1.4300  20.0324    outqueue.c:706              sctp.ko                  sctp_outq_flush
37775    585549         1.3815  21.4139    outqueue.c:1304             sctp.ko                  sctp_check_transmitted
37514    623063         1.3719  22.7858    output.c:380                sctp.ko                  sctp_packet_transmit
37320    660383         1.3648  24.1506    slub.c:2700                 vmlinux-3.14.0           kmem_cache_free
36147    696530         1.3219  25.4725    ip_tables.c:294             vmlinux-3.14.0           ipt_do_table
35494    732024         1.2980  26.7705    sm_sideeffect.c:1100        sctp.ko                  sctp_do_sm
35452    767476         1.2965  28.0670    core.c:135                  vmlinux-3.14.0           nf_iterate
34697    802173         1.2689  29.3359    slub.c:2281                 vmlinux-3.14.0           __slab_alloc
33890    836063         1.2394  30.5753    slub.c:415                  vmlinux-3.14.0           cmpxchg_double_slab
33566    869629         1.2275  31.8028    (no location information)   libc-2.14.90.so          _int_free
33228    902857         1.2152  33.0180    socket.c:1590               sctp.ko                  sctp_sendmsg
32774    935631         1.1986  34.2166    slub.c:3381                 vmlinux-3.14.0           kfree
30359    965990         1.1102  35.3268    (no location information)   libc-2.14.90.so          __memmove_ssse3_back
28905    994895         1.0571  36.3839    list_debug.c:25             vmlinux-3.14.0           __list_add
25888    1020783        0.9467  37.3306    skbuff.c:199                vmlinux-3.14.0           __alloc_skb
25490    1046273        0.9322  38.2628    fib_trie.c:1399             vmlinux-3.14.0           fib_table_lookup
25232    1071505        0.9227  39.1856    nf_conntrack_core.c:376     nf_conntrack.ko          __nf_conntrack_find_get
24114    1095619        0.8819  40.0674    chunk.c:168                 sctp.ko                  sctp_datamsg_from_user
24067    1119686        0.8801  40.9476    ixgbe_main.c:2020           ixgbe.ko                 ixgbe_clean_rx_irq
23972    1143658        0.8767  41.8242    (no location information)   libc-2.14.90.so          _int_malloc
22117    1165775        0.8088  42.6331    ip_output.c:215             vmlinux-3.14.0           ip_finish_output
22037    1187812        0.8059  43.4390    slub.c:3854                 vmlinux-3.14.0           __kmalloc_node_track_caller
21847    1209659        0.7990  44.2379    dev.c:2546                  vmlinux-3.14.0           dev_hard_start_xmit
21564    1231223        0.7886  45.0265    slub.c:2481                 vmlinux-3.14.0           kmem_cache_alloc
20659    1251882        0.7555  45.7820    socket.c:2049               sctp.ko                  sctp_recvmsg
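
To see which symbols gained or lost share between the two kernels, the two listings above can be compared symbol by symbol. The sketch below is a hypothetical helper (not part of OProfile) that assumes the exact opreport column layout shown above; only a few rows from each listing are embedded to keep it short.

```python
# Hypothetical helper (not part of OProfile): compare the per-symbol share
# of samples between two opreport listings in the column layout shown above.

def parse_opreport(text):
    """Map symbol name -> % of samples for each data row of an opreport listing."""
    shares = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer sample count; skip headers and blanks.
        if not fields or not fields[0].isdigit():
            continue
        # The % column is always the third field and the symbol name always the
        # last, even when the linenr column reads "(no location information)".
        shares[fields[-1]] = float(fields[2])
    return shares


def compare(old_text, new_text):
    """Return (symbol, old %, new %, delta) rows, largest absolute change first."""
    old, new = parse_opreport(old_text), parse_opreport(new_text)
    rows = []
    for sym in old.keys() | new.keys():
        # A symbol absent from one top-N listing is counted as 0.0 there, which
        # overstates the change: it may merely have fallen below the cutoff.
        a, b = old.get(sym, 0.0), new.get(sym, 0.0)
        rows.append((sym, a, b, round(b - a, 4)))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows


# A few rows copied from each listing above.
OLD_3_4_2 = """
130002   130002         4.9435   4.9435    copy_user_64.S:240          vmlinux                  copy_user_generic_string
125955   255957         4.7896   9.7330    memcpy_64.S:59              vmlinux                  memcpy
71026    326983         2.7008  12.4339    spinlock.c:136              vmlinux                  _raw_spin_lock
"""

NEW_3_14 = """
168021   168021         6.1446   6.1446    copy_user_64.S:183          vmlinux-3.14.0           copy_user_generic_string
85199    253220         3.1158   9.2604    memcpy_64.S:59              vmlinux-3.14.0           memcpy
80133    333353         2.9305  12.1909    spinlock.c:174              vmlinux-3.14.0           _raw_spin_lock_bh
74086    407439         2.7094  14.9003    spinlock.c:150              vmlinux-3.14.0           _raw_spin_lock
"""

if __name__ == "__main__":
    for sym, a, b, delta in compare(OLD_3_4_2, NEW_3_14):
        print(f"{sym:28s} {a:7.4f} -> {b:7.4f}  ({delta:+.4f})")
```

On these excerpts, `_raw_spin_lock_bh` (absent from the 3.4.2 listing) tops the list, followed by `memcpy` and `copy_user_generic_string`. Each percentage is a share of that run's own sample total, so a larger share does not by itself mean more absolute time.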







-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@xxxxxxxxx]
Sent: April-11-14 11:28 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@xxxxxxxxxxxxxxx
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes, this is indeed ixgbe, and I do have SCTP checksum offloading
> enabled.  (For what it's worth, the checksum offload gives about a 20%
> throughput gain, but this is of course already included in the
> numbers I posted to this thread, as I've been using the CRC offload
> all along.)
> 
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association, i.e. on both endpoint nodes), using 1452-byte messages instead of 1000-byte messages.  With this new setup, TCP performance drops significantly, as expected, while SCTP performance is boosted, and the playing field is somewhat more 'level'.  (Note that I could not use 1464-byte messages as suggested by Vlad: anything above 1452 bytes cut SCTP performance in half, so I must have hit the fragmentation limit at this slightly smaller message size.  MTU is 1500.)
> 
> So, comparing "apples to apples" now, TCP only outperforms SCTP by approximately 40-70% over the range of network latencies I tested with (0.2 ms, 10 ms, 20 ms and 50 ms).  40-70% is still significant, but nowhere near the 200% advantage (i.e. 3 times the throughput) TCP showed before.
> 
> Does this value (i.e. 40-70%) sound reasonable? 

This still looks high.  Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in SCTP?

My guess is that a lot of it is going to be in memcpy(), but I am curious.

> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
> 
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
> 

That's interesting.  I'll have to look and see what might have changed here.

-vlad
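
The 1452-byte ceiling reported above is consistent with the standard per-packet overhead for SCTP over IPv4 on a 1500-byte MTU: a 20-byte IPv4 header (no options), the 12-byte SCTP common header and a 16-byte DATA chunk header, per RFC 4960. The sketch below works through that arithmetic and the packet counts behind the MTU-utilization argument. It is a simplified model, assuming no IP options or IPv6 extension headers and nothing (such as a SACK) bundled alongside the DATA chunks; the helper names are illustrative.

```python
# Per-packet overhead for SCTP, assuming no IP options or IPv6 extension
# headers and nothing (e.g. a SACK) bundled alongside the DATA chunks.
IPV4_HEADER = 20          # IPv4 header without options
IPV6_HEADER = 40          # fixed IPv6 header
SCTP_COMMON_HEADER = 12   # RFC 4960, section 3.1
DATA_CHUNK_HEADER = 16    # RFC 4960, section 3.3.1


def pad4(n):
    """Chunks are padded with zero bytes to a multiple of 4 on the wire."""
    return (n + 3) & ~3


def max_unfragmented_payload(mtu, ip_header=IPV4_HEADER):
    """Largest user message that still fits in one DATA chunk in one packet."""
    room = mtu - ip_header - SCTP_COMMON_HEADER
    # The padded chunk must fit, so round the available room down to 4 bytes.
    return (room // 4) * 4 - DATA_CHUNK_HEADER


def messages_per_packet(msg_size, mtu, ip_header=IPV4_HEADER):
    """DATA chunks of msg_size bytes that fit in one packet (0 = must fragment)."""
    room = mtu - ip_header - SCTP_COMMON_HEADER
    return room // pad4(DATA_CHUNK_HEADER + msg_size)


def packets_needed(total_bytes, msg_size, mtu, ip_header=IPV4_HEADER):
    """Packets required to send total_bytes as unfragmented msg_size messages."""
    per_packet = messages_per_packet(msg_size, mtu, ip_header)
    if per_packet == 0:
        raise ValueError(f"{msg_size}-byte messages must be fragmented at MTU {mtu}")
    messages = -(-total_bytes // msg_size)   # ceiling division
    return -(-messages // per_packet)


if __name__ == "__main__":
    print(max_unfragmented_payload(1500))               # 1452 (IPv4)
    print(max_unfragmented_payload(1500, IPV6_HEADER))  # 1432 (IPv6)
    for size in (1000, 1452, 1464):
        print(size, messages_per_packet(size, 1500))    # 1, 1, 0
    print(packets_needed(1_452_000, 1000, 1500))        # 1452
    print(packets_needed(1_452_000, 1452, 1500))        # 1000
```

With 1000-byte messages only one DATA chunk fits per packet (two would need 2032 bytes of chunk space against 1468 available), so moving 1,452,000 bytes takes 1452 packets instead of 1000 with 1452-byte messages, while a 1464-byte message no longer fits and has to be fragmented.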

> 
> 
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@xxxxxxxxxx]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@xxxxxxxxxxxxxxx
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> Hi Peter,
> 
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10 Gb Ethernet backplane and am finding that, at best, its throughput is about a third of that of TCP.  Is this figure generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios was tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB) to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3 times that of SCTP in the best case (from SCTP's perspective), and much higher still in the worst case.
>>
>> To do more of an apples-to-apples comparison, you need to disable
>> TSO/GSO on the sending node.
>>
>> The reason is that even if you limit buffer sizes, TCP will still try
>> to do TSO on the transmit side, coalescing your 1000-byte
>> messages into something much larger and thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries, which
>> results in sub-optimal MTU utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464-byte messages for SCTP on a NIC with
>> a 1500-byte MTU.
>>
>> I would be interested to see the results.  There could very well be issues.
> 
> Agreed.
> 
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
> 
>> -vlad
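
The socket buffer sweep in the original report interacts with the simulated RTTs: a sender that can have at most one buffer's worth of unacknowledged data in flight is limited to roughly buffer / RTT, and keeping a link full needs about bandwidth times RTT bytes in flight (the bandwidth-delay product). The sketch below computes both for the 10 Gb/s backplane and the 20, 40 and 100 ms RTTs described above. It is a simple window model that ignores protocol headers, congestion control and the kernel's own accounting of buffer space, so its figures are upper bounds; the 4 MB buffer is a hypothetical value standing in for "several MB".

```python
# Simple window model: at most one buffer's worth of unacknowledged data in
# flight per round trip. Ignores headers, congestion control and kernel
# buffer accounting, so results are upper bounds.

def bdp_bytes(link_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the link busy."""
    return link_bps * rtt_s / 8


def window_limited_bps(buffer_bytes, rtt_s):
    """Throughput ceiling when one buffer's worth of data is sent per RTT."""
    return buffer_bytes * 8 / rtt_s


LINK_BPS = 10e9           # 10 Gb/s backplane from the original report
ILLUSTRATIVE_BUF = 4e6    # hypothetical 4 MB buffer ("several MB" in the report)

if __name__ == "__main__":
    for rtt_ms in (20, 40, 100):
        rtt_s = rtt_ms / 1000
        print(f"RTT {rtt_ms:3d} ms: BDP {bdp_bytes(LINK_BPS, rtt_s) / 1e6:6.1f} MB, "
              f"ceiling with 4 MB buffer "
              f"{window_limited_bps(ILLUSTRATIVE_BUF, rtt_s) / 1e6:7.1f} Mb/s")
```

At 100 ms RTT this gives a BDP of about 125 MB and a ceiling of about 320 Mb/s for a 4 MB buffer, so under this model the buffer size, not the 10 Gb/s link, bounds a single connection or association at the higher RTTs.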




