Hi,
I'm trying out bonding to improve Ceph performance on our cluster
(currently a test setup, using 1G NICs instead of 10G).
The setup looks like this on the ProCurve 5412 chassis:
Procurve chassis(config)# show trunk

 Load Balancing Method:  L4-based

  Port | Name                      Type      | Group  Type
  ---- + ------------------------- --------- + ------ --------
  D1   | Link to ceph9 - 1         10GbE-T   | Trk1   LACP
  D2   | Link to ceph9 - 2         10GbE-T   | Trk1   LACP
  D3   | Link to ceph10 - 1        10GbE-T   | Trk2   LACP
  D4   | Link to ceph10 - 2        10GbE-T   | Trk2   LACP
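(For reference, the trunks were set up with roughly the following commands; I'm quoting from memory, and the trunk-load-balance syntax in particular may differ between firmware versions:)

Procurve chassis(config)# trunk D1-D2 trk1 lacp
Procurve chassis(config)# trunk D3-D4 trk2 lacp
Procurve chassis(config)# trunk-load-balance L4-based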
and on the ceph side:
auto bond0
iface bond0 inet manual
        slaves eth1 eth2
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer3+4

auto vmbr0
iface vmbr0 inet static
        address a.b.c.10
        netmask 255.255.255.0
        gateway a.b.c.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
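(Sanity check: the bonding driver's proc interface shows whether 802.3ad actually negotiated and whether both slaves came up, along these lines:)

cat /proc/net/bonding/bond0 | grep -E 'Bonding Mode|MII Status|Slave Interface'

If the switch-side negotiation worked, that should report "IEEE 802.3ad Dynamic link aggregation" and an MII Status of up for both eth1 and eth2.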
Then, some testing: On ceph10 I start two iperf listeners, each
listening on a different port, like:
iperf -s -B a.b.c.10 -p 5001 &
iperf -s -B a.b.c.10 -p 5000 &
Then I launch two iperf clients on ceph9 to connect to these
listeners, but to my surprise, most of the time only one link is used,
for example:
Client connecting to a.b.c.10, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local a.b.c.9 port 37940 connected with a.b.c.10 port 5001
------------------------------------------------------------
Client connecting to a.b.c.10, TCP port 5000
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local a.b.c.9 port 48806 connected with a.b.c.10 port 5000
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 575 MBytes 482 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 554 MBytes 464 Mbits/sec
(and looking at ifconfig on the other side confirms that all traffic
goes through the same port)
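(For completeness, the clients are launched roughly like this, and I compare the RX byte counters that ifconfig eth1 / ifconfig eth2 report on ceph10 before and after a run; -t 10 is just the iperf default spelled out:)

iperf -c a.b.c.10 -p 5000 -t 10 &
iperf -c a.b.c.10 -p 5001 -t 10 &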
However, trying multiple times, I noticed that every third or fourth
time I get lucky and both links ARE used:
Client connecting to a.b.c.10, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local a.b.c.9 port 37984 connected with a.b.c.10 port 5001
------------------------------------------------------------
Client connecting to a.b.c.10, TCP port 5000
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local a.b.c.9 port 48850 connected with a.b.c.10 port 5000
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 936 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 885 MBytes 742 Mbits/sec
My question is: is this level of "randomness" normal and expected, or
is there something wrong with my config/settings? Are there ways to
improve how the links are chosen?
Specifically: I selected the L4-based load balancing method on the
switch, as I expected that it would help. And "bond_xmit_hash_policy
layer3+4" is the policy I think I should be using, if I understand
everything correctly...
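(The driver side can at least be confirmed; the active policy shows up in the bonding proc file:)

cat /proc/net/bonding/bond0 | grep 'Transmit Hash Policy'

which should print "Transmit Hash Policy: layer3+4 (1)" if the option was picked up.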
I have eight 10GbE ports available, and we will be running four
Ceph/Proxmox servers, each with dual 10GbE LACP-bonded links.
Ideas?
MJ