Re: lacp bonding | working as expected..?

Consider trying some variation in source and destination IP addresses and port numbers - unless you force it, iperf3 at least tends to pick only even port numbers for the ephemeral source port, which leads to all traffic being balanced to one link.
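If you have a reasonably recent iperf3 available (your output looks like iperf2), you can also pin the client source ports yourself so the two flows are guaranteed to differ, rather than depending on whatever ephemeral ports get picked. Rough, untested sketch; the 40001/40002 source ports are just example values:

# on ceph10: one listener per test port
iperf3 -s -p 5001 &
iperf3 -s -p 5000 &

# on ceph9: force one odd and one even client source port via --cport
iperf3 -c a.b.c.10 -p 5001 --cport 40001 -t 10 &
iperf3 -c a.b.c.10 -p 5000 --cport 40002 -t 10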

In your example where you see only one link being used, both transfers use an even source port, and the second one is shown connecting to a.b.c.205 but ends up connected with a.b.c.10; is that a search-and-replace issue?

Client connecting to a.b.c.10, TCP port 5001
[  3] local a.b.c.9 port 37940 connected with a.b.c.10 port 5001
Client connecting to a.b.c.205, TCP port 5000
[  3] local a.b.c.9 port 48806 connected with a.b.c.10 port 5000

In your "got lucky" example, the second connect is also to a.b.c.10.

    -- jacob


On 06/21/2018 02:54 PM, mj wrote:
Hi,

I'm trying out bonding to improve ceph performance on our cluster (currently in a test setup using 1G NICs instead of 10G).

Setup like this on the ProCurve 5412 chassis:

Procurve chassis(config)# show trunk

 Load Balancing Method:  L4-based

  Port | Name                     Type      | Group  Type
  ---- + ------------------------ --------- + ------ --------
  D1   | Link to ceph9  - 1       10GbE-T   | Trk1   LACP
  D2   | Link to ceph9  - 2       10GbE-T   | Trk1   LACP
  D3   | Link to ceph10 - 1       10GbE-T   | Trk2   LACP
  D4   | Link to ceph10 - 2       10GbE-T   | Trk2   LACP

and on the ceph side:

auto bond0
iface bond0 inet manual
    slaves eth1 eth2
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer3+4

auto vmbr0
iface vmbr0 inet static
    address  a.b.c.10
    netmask  255.255.255.0
    gateway  a.b.c.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
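
To confirm the policy and the LACP aggregation actually took effect on the hosts, the standard bonding driver status files can be checked (nothing Proxmox-specific here):

# reports "Transmit Hash Policy: layer3+4" plus the per-slave aggregator IDs;
# both slaves should show the same aggregator
cat /proc/net/bonding/bond0

# the same policy as seen through sysfs
cat /sys/class/net/bond0/bonding/xmit_hash_policy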

Then, some testing: on ceph10 I start two iperf listeners, each on a different port:

iperf -s -B a.b.c.10 -p 5001 &
iperf -s -B a.b.c.10 -p 5000 &

Then I launch two iperf clients on ceph9 to connect to my listeners, but to my surprise, MOST of the time only one link is used, for example:

Client connecting to a.b.c.10, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local a.b.c.9 port 37940 connected with a.b.c.10 port 5001
------------------------------------------------------------
Client connecting to a.b.c.205, TCP port 5000
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local a.b.c.9 port 48806 connected with a.b.c.10 port 5000
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   575 MBytes   482 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   554 MBytes   464 Mbits/sec

(and looking at ifconfig on the other side confirms that all traffic goes through the same port)
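
The per-slave counters can also be polled directly while a test runs (assuming iproute2 and watch are available), which makes the imbalance easy to see:

# on the receiving host: refresh the per-slave RX/TX byte counters every second
watch -n 1 'ip -s link show eth1; ip -s link show eth2'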

However, after trying multiple times, I noticed that every third or fourth time I get lucky and both links WILL be used:

Client connecting to a.b.c.10, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local a.b.c.9 port 37984 connected with a.b.c.10 port 5001
------------------------------------------------------------
Client connecting to a.b.c.10, TCP port 5000
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local a.b.c.9 port 48850 connected with a.b.c.10 port 5000
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   885 MBytes   742 Mbits/sec

My question is: is this level of "randomness" normal and expected, or is there something wrong with my config/settings? Are there ways to improve the way links are chosen?

Specifically: I selected the L4 load-balancing method on the switch, as I expected that it would help, and "bond_xmit_hash_policy layer3+4" is the policy I think I should be using, if I understand everything correctly...
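
From what I read, older versions of the kernel's bonding documentation describe the layer3+4 hash roughly as ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff)) modulo the number of slaves, and newer kernels fold in more bits via the flow dissector, so the sketch below is only an illustration of why the low bits of the port numbers matter with two slaves, not a prediction of which NIC a real flow lands on. The helper name and the port numbers are made up:

# Rough sketch of the classic layer3+4 hash from older bonding docs
# (illustrative only; modern kernels compute the hash differently).
l34_hash() {
    local sport=$1 dport=$2 sip_low16=$3 dip_low16=$4 nslaves=$5
    echo $(( ((sport ^ dport) ^ ((sip_low16 ^ dip_low16) & 0xffff)) % nslaves ))
}

# a.b.c.9 -> a.b.c.10: only the low 16 bits of each IP enter the hash and
# the shared c octet cancels in the XOR, so just the last octets are passed
# in; the bond has two slaves.
l34_hash 40000 5000 9 10 2   # hypothetical even source port
l34_hash 40002 5002 9 10 2   # same port parities -> same slave index
l34_hash 40001 5000 9 10 2   # odd source port -> the other slave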

I have eight 10Gb switch ports available, and we will be running four ceph/proxmox servers, each with dual 10Gb LACP-bonded links.

Ideas?

MJ
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




