A good friend explained it this way:
The problem you describe is a fairly well-known issue and there's really
not a good fix for it. Often, a switch will support multiple
hashing algorithms (L2, L2_L3, L2_L3_L4, L3_L4). All of them bind a flow
to a given port for egress. This means that if you have a single data
flow between two servers that are connected to the same switch, you are
limited to the speed of a single link.
I'm assuming in the case of the HP Procurve 2824 that the "SA/DA (Source Address/Destination Address) method of distributing traffic" is really marketing speak for L2 hashing.
If there's an option to do L3_L4 or L2_L3_L4, you might be slightly better off if there are multiple flows involved. In your case, it doesn't sound like there actually are multiple flows. If it really is only one flow, you'd need a 10GbE switch and interfaces to go faster.
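To illustrate why the choice of hash matters, here is a minimal sketch in Python. The two functions are hypothetical stand-ins, not the ProCurve's (or any vendor's) actual algorithm; the MAC addresses, IP addresses, and port numbers are made up. It only shows the general idea: an L2 hash sees the same SA/DA pair for every frame between two hosts, while an L3_L4 hash can separate flows that differ in their TCP/UDP ports.

```python
import ipaddress


def l2_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Hypothetical L2 hash: XOR the last octet of each MAC, modulo link count."""
    sa = int(src_mac.split(":")[-1], 16)
    da = int(dst_mac.split(":")[-1], 16)
    return (sa ^ da) % n_links


def l3_l4_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
               n_links: int) -> int:
    """Hypothetical L3_L4 hash: XOR the IP addresses and ports, modulo link count."""
    ip_bits = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return (ip_bits ^ src_port ^ dst_port) % n_links


# Made-up addresses for Host A and Host B on a 2-link trunk.
HOST_A_MAC, HOST_B_MAC = "00:11:22:33:44:01", "00:11:22:33:44:02"
HOST_A_IP, HOST_B_IP = "10.0.0.1", "10.0.0.2"

# Every frame from A to B hashes to the same link under L2.
l2_choices = {l2_link(HOST_A_MAC, HOST_B_MAC, 2) for _ in range(1000)}

# Two TCP connections with different source ports can land on different links.
l3_l4_choices = {
    l3_l4_link(HOST_A_IP, HOST_B_IP, src_port, 873, 2)
    for src_port in (40000, 40001)
}

print("L2 links used:   ", sorted(l2_choices))
print("L3_L4 links used:", sorted(l3_l4_choices))
```

Note that a single connection still hashes to one link under either scheme, which is why a single flow stays at the speed of one link.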
FYI, Brocade has supposedly implemented frame-spraying (true round-robin) on their latest switches. They do this by the use of custom ASICs derived from their fibre channel switch lines, which have had frame-spraying for some time.
In frame-spraying (assuming a 4-port port-channel/loadshare), frame A goes to port 1, frame B goes to port 2, frame C goes to port 3, frame D goes to port 4, frame E goes to port 1, frame F goes to port 2, etc.
This method supposedly gives a fairly good traffic distribution even with small numbers of flows. There are still corner cases where it wouldn't work well. It also doesn't fix any problems that can arise if the sending system doesn't implement frame-spraying (which it probably won't).
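The round-robin assignment described above can be sketched in a few lines of Python. This is only an illustration of the frame-to-port pattern, not Brocade's implementation; the function name `spray` and the frame labels are made up for the example.

```python
from itertools import cycle


def spray(frames, n_ports):
    """Assign each frame to the next port in strict rotation (ports numbered from 1)."""
    ports = cycle(range(1, n_ports + 1))
    return [(frame, next(ports)) for frame in frames]


# Frames A..F over a 4-port port-channel.
assignment = spray("ABCDEF", 4)
for frame, port in assignment:
    print(f"frame {frame} -> port {port}")
```

Because the port is chosen per frame rather than per flow, even a single flow is spread over all four ports.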
HTH,
Eric Pretorious
Truckee, CA
From: Eric <epretorious@xxxxxxxxx>
To: "linux-cluster@xxxxxxxxxx" <linux-cluster@xxxxxxxxxx>
Sent: Wednesday, June 6, 2012 9:12 PM
Subject: [Linux-cluster] Bonding Interfaces: Active Load Balancing & LACP
I'm currently using the HP Procurve 2824 24-port Gigabit Ethernet switch for a backside network for synchronizing file systems between the nodes in the group. Each host has 4 Gigabit NICs and the goal is to bond two of the Gigabit NICs together to create a 2 Gbps link from any host to any other host, but what I'm finding is that the bonded links are only capable of 1 Gbps from any host to any other host. Is it possible to create a multi-Gigabit link between two hosts (without having to upgrade to 10G) using a switch that "uses the SA/DA (Source Address/Destination Address) method of distributing traffic across the trunked links"?

The problem, at least as far as I can tell, comes down to the limitation of ARP resolution (in the host) and mac-address tables (in the switch):

When configured to use Active Load Balancing, the kernel driver leaves each interface's MAC address unchanged. In this scenario, when Host A sends traffic to Host B, the kernel uses the MAC address of only one of Host B's NICs as the DA. When the packet arrives at the switch, the switch consults the mac-address table for the DA and then sends the packet to the interface connected to the NIC with MAC address equal to DA. Thus packets from Host A to Host B will only leave the switch through one interface - the interface connected to the NIC with MAC address equal to DA. This has the effect of limiting the throughput from Host A to Host B to the speed of the one interface connected to the NIC with MAC address equal to DA.

When configured to use IEEE 802.3ad (LACP), the kernel driver assigns the same MAC address to all of the host's interfaces. In this scenario, when Host A sends traffic to Host B, the kernel uses Host B's shared MAC address as the DA. When the packet arrives at the switch, the switch creates a hash based on the SA/DA pair, consults the mac-address table for the DA, and assigns the flow (i.e., traffic from Host A to Host B) to one of the interfaces connected to Host B. Thus packets from Host A to Host B will only leave the switch through one interface - the interface determined by the SA/DA hash. This has the effect of limiting the throughput from Host A to Host B to the speed of the one interface determined by the hashing method. However, if the flow (from Host A to Host B's shared MAC address) were to be distributed across the different interfaces in a round-robin fashion (as the packets were leaving the switch), the throughput between the hosts would equal the aggregate of the links (IIUC).
Is this a limitation of the Procurve's implementation of LACP? Do other switches use different methods of distributing traffic across the trunked links? Is there another method of aggregating the links between the two hosts (e.g., multipathing)?
TIA,
Eric Pretorious
Truckee, CA
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster