Re: Infiniband special ops?

On 22/01/2021 00:33, Steven Tardy wrote:
On Thu, Jan 21, 2021 at 6:34 PM lejeczek via CentOS <centos@xxxxxxxxxx> wrote:

    Hi guys.

    Hoping some net experts may stumble upon this message.
    I have an IPoIB direct host-to-host connection and:

    -> $ ethtool ib1
    Settings for ib1:
         Supported ports: [  ]
         Supported link modes:   Not reported
         Supported pause frame use: No
         Supports auto-negotiation: No
         Supported FEC modes: Not reported
         Advertised link modes:  Not reported
         Advertised pause frame use: No
         Advertised auto-negotiation: No
         Advertised FEC modes: Not reported
         Speed: 40000Mb/s
         Duplex: Full
         Auto-negotiation: on
         Port: Other
         PHYAD: 255
         Transceiver: internal
         Link detected: yes

    and that's the same at both ends, on both hosts, yet:

    > $ iperf3 -c 10.5.5.97
    Connecting to host 10.5.5.97, port 5201
    [  5] local 10.5.5.49 port 56874 connected to 10.5.5.97 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  1.36 GBytes  11.6 Gbits/sec    0   2.50 MBytes
    [  5]   1.00-2.00   sec  1.87 GBytes  16.0 Gbits/sec    0   2.50 MBytes
    [  5]   2.00-3.00   sec  1.84 GBytes  15.8 Gbits/sec    0   2.50 MBytes
    [  5]   3.00-4.00   sec  1.83 GBytes  15.7 Gbits/sec    0   2.50 MBytes
    [  5]   4.00-5.00   sec  1.61 GBytes  13.9 Gbits/sec    0   2.50 MBytes
    [  5]   5.00-6.00   sec  1.60 GBytes  13.8 Gbits/sec    0   2.50 MBytes
    [  5]   6.00-7.00   sec  1.56 GBytes  13.4 Gbits/sec    0   2.50 MBytes
    [  5]   7.00-8.00   sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
    [  5]   8.00-9.00   sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
    [  5]   9.00-10.00  sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec    0          sender
    [  5]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec               receiver

    It's a rather oldish platform hosting the link: PCIe is
    only 2.0, but at x8 that is roughly 32 Gbit/s usable
    (8 lanes x 5 GT/s with 8b/10b encoding), so it should be
    able to carry far more than ~13 Gbit/s.
    The InfiniBand card is a Mellanox ConnectX-3.
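    A quick way to double-check that the slot actually
    negotiated PCIe 2.0 x8 (the device address here is just
    an example; take the real one from lspci's listing of
    the ConnectX-3):

    -> $ lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'
    # LnkSta should report "Speed 5GT/s, Width x8"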

    Any thoughts on how to track down the bottleneck, or any
    thoughts at all, would be appreciated.



Care to capture a few seconds of a *sender*-side .pcap?
Often a too-small TCP receive window, packet loss, or a long round-trip time is to blame.
All of these would be evident in the packet capture.
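Something like this on the sending host while iperf3 runs (interface name and peer address taken from the output above; the snap length keeps the file small since only headers matter here):

$ timeout 5 tcpdump -i ib1 -s 128 -w sender.pcap host 10.5.5.97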

If you run multiple streams with the `-P 8` flag, does that increase the throughput?
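For example (the -t 30 is only there to get a longer sample):

$ iperf3 -c 10.5.5.97 -P 8 -t 30

If the aggregate goes up noticeably, a per-connection limit (window size, or one saturated core) is the likely culprit rather than the link itself.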

Google's calculator says these endpoints behave as if they were ~1.5 ms apart. (TCP keeps at most one congestion window in flight per round trip, so RTT ~= window / throughput.)

(2.5 megabytes) / (13 Gbps) = 1.53846154 milliseconds
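And turning that around: if the link really behaves as if the RTT were ~1.5 ms, filling 40 Gbit/s would need about 40 Gbit/s x 1.5 ms = 7.5 MB in flight. A sketch of raising the TCP window ceilings so cwnd can grow past 2.5 MB (the values are illustrative, not a tuned recommendation):

$ sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
$ sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"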



Seems the platform overall might simply not be up to it: the bitrate drops even further when the CPUs are fully loaded.
(I'll keep investigating.)
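In case it helps anyone following along: one way to see whether a single core is the ceiling is to watch per-CPU load during the test and to pin iperf3 to chosen cores with its -A flag (core numbers below are just examples):

$ mpstat -P ALL 1               # per-CPU utilisation while the test runs
$ iperf3 -c 10.5.5.97 -A 2,2    # pin client to core 2, server to core 2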

What I'm trying next is to have both ports (it's a dual-port card) "teamed" by NetworkManager, with the runner set to broadcast. I'm leaving out "p-key", which NM sets to "default" (and which works with a "regular" IPoIB connection). RHEL's networking guide says "...create a team from two or more Wired or InfiniBand connections...".

Yet when I try to stand up such a team, the master starts but both slaves fail with:
"...
<info>  [1611588576.8887] device (ib1): Activation: starting connection 'team1055-slave-ib1' (900d5073-366c-4a40-8c32-ac42c76f9c2e)
<info>  [1611588576.8889] device (ib1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
<info>  [1611588576.8973] device (ib1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
<info>  [1611588576.9199] device (ib1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
<warn>  [1611588576.9262] device (ib1): Activation: connection 'team1055-slave-ib1' could not be enslaved
<info>  [1611588576.9272] device (ib1): state change: ip-config -> failed (reason 'unknown', sys-iface-state: 'managed')
<info>  [1611588576.9280] device (ib1): released from master device nm-team
<info>  [1611589045.6268] device (ib1): carrier: link connected
..."

Any suggestions also appreciated.
thanks, L
