Re: Bonding Interfaces: Active Load Balancing & LACP

Eric <epretorious@xxxxxxxxx> · Tue, 12 Jun 2012 20:08:58 -0700 (PDT)

A good friend explained it this way:

The problem you describe is a fairly well-known issue and there's really
not a good fix for it. Often, a switch will support multiple hashing
algorithms (L2, L2_L3, L2_L3_L4, L3_L4). All of them pin a given flow to
a single port for egress. This means that if you have a single data flow
between two servers that are connected to the same switch, you are
limited to the speed of a single link in the trunk.
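
To make the pinning concrete, here is a minimal Python sketch of L2 hashing. The hash (XOR of the two MAC addresses, modulo the number of links) and the MAC addresses are assumptions chosen for illustration; the ProCurve's actual algorithm is not documented in this thread.

```python
def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC address to an integer."""
    return int(mac.replace(":", ""), 16)


def l2_egress_port(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Pick a trunk member from the SA/DA pair alone.

    Assumed hash for illustration: XOR of the two addresses,
    modulo the number of links in the trunk.
    """
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % n_links


# Hypothetical addresses for Host A and Host B.
host_a = "00:11:22:33:44:0a"
host_b = "00:11:22:33:44:0b"

# Every frame between the same two hosts yields the same hash input,
# so all 1000 frames land on the same trunk member.
ports_used = {l2_egress_port(host_a, host_b, 2) for _ in range(1000)}
print(ports_used)
```

Whatever the real hash is, the input never changes for a given pair of hosts, so the output cannot change either.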

I'm assuming in the case of the HP Procurve 2824 that the "SA/DA (Source
Address/Destination Address) method of distributing traffic" is really
marketing speak for L2 hashing.

If there's an option to do L3_L4 or L2_L3_L4, you might be slightly
better off if there are multiple flows involved. In your case, it
doesn't sound like there actually are multiple flows. If it really is
only one flow, you'd need a 10GbE switch and interfaces to go faster.
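
A rough Python sketch of why L3_L4 hashing only helps when there are several flows. The CRC32-based hash, the IP addresses, and the port numbers are all assumptions for illustration, not any vendor's real implementation.

```python
import zlib


def l3_l4_egress_port(src_ip: str, dst_ip: str,
                      src_port: int, dst_port: int, n_links: int) -> int:
    """Pick a trunk member from IP addresses and TCP/UDP ports.

    Assumed hash for illustration: CRC32 of the 4-tuple,
    modulo the number of links in the trunk.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


# One flow: the 4-tuple never changes, so neither does the egress port.
single_flow = {
    l3_l4_egress_port("10.0.0.1", "10.0.0.2", 50000, 873, 2)
    for _ in range(1000)
}

# Many flows: each connection uses a different source port, so the
# hash input differs and the flows can land on different links.
many_flows = {
    l3_l4_egress_port("10.0.0.1", "10.0.0.2", 50000 + i, 873, 2)
    for i in range(64)
}

print("links used by one flow:  ", single_flow)
print("links used by many flows:", many_flows)
```

The single flow always reports one link. The many-flow case will typically report both, but each individual flow is still capped at the speed of the link it hashes to.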

FYI, Brocade has supposedly implemented frame-spraying (true
round-robin) on their latest switches. They do this by the use of custom
ASICs derived from their fibre channel switch lines, which have had
frame-spraying for some time.

In frame-spraying (assuming a 4-port port-channel/loadshare), frame A
goes to port 1, frame B goes to port 2, frame C goes to port 3, frame D
goes to port 4, frame E goes to port 1, frame F goes to port 2, etc.
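
The same sequence as a small Python sketch. The function name `spray` is hypothetical, and this only models the port assignment, not what Brocade's ASICs actually do.

```python
from itertools import cycle


def spray(frames, n_ports):
    """Assign each frame to the next port in strict rotation (1-based)."""
    ports = cycle(range(1, n_ports + 1))
    return [(frame, next(ports)) for frame in frames]


assignment = spray("ABCDEF", 4)
for frame, port in assignment:
    print(f"frame {frame} -> port {port}")
```

The port is chosen per frame rather than per flow, which is why a single flow can use every link in the trunk.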

This method supposedly gives a 
fairly good traffic distribution even with small numbers of flows.  
There are still corner cases where it wouldn't work well.  It also 
doesn't fix any problems that can arise if the sending system doesn't 
implement frame-spraying (which it probably won't).

HTH,
Eric Pretorious
Truckee, CA

  From: Eric <epretorious@xxxxxxxxx>
 To: "linux-cluster@xxxxxxxxxx" <linux-cluster@xxxxxxxxxx> 
 Sent: Wednesday, June 6, 2012 9:12 PM
 Subject: [Linux-cluster] Bonding Interfaces: Active Load Balancing & LACP

I'm currently using the HP Procurve 2824 24-port Gigabit Ethernet switch
for a backside network for synchronizing file systems between the nodes
in the group. Each host has 4 Gigabit NICs and the goal is to bond two
of the Gigabit NICs together to create a 2 Gbps link from any host to
any other host, but what I'm finding is that the bonded links are only
capable of 1 Gbps from any host to any other host. Is it possible to
create a multi-Gigabit link between two hosts (without having to upgrade
to 10G) using a switch that "uses the SA/DA (Source Address/Destination
Address) method of distributing traffic across the trunked links"?

 The problem, at least as far as I can tell, comes down to the 
limitation of ARP resolution (in the host) and mac-address tables (in 
the switch):

When configured to use Active Load Balancing, the kernel driver leaves
each interface's MAC address unchanged. In this scenario, when Host A
sends traffic to Host B, the kernel uses the MAC address of only one of
Host B's NICs as the DA. When the packet arrives at the switch, the
switch consults the mac-address table for the DA and then sends the
packet to the interface connected to the NIC with MAC address equal to
DA. Thus packets from Host A to Host B will only leave the switch
through one interface - the interface connected to the NIC with MAC
address equal to DA. This has the effect of limiting the throughput from
Host A to Host B to the speed of that one interface.

When configured to use IEEE 802.3ad (LACP), the kernel driver assigns
the same MAC address to all of the host's interfaces. In this scenario,
when Host A sends traffic to Host B, the kernel uses Host B's shared MAC
address as the DA. When the packet arrives at the switch, the switch
creates a hash based on the SA/DA pair, consults the mac-address table
for the DA, and assigns the flow (i.e., traffic from Host A to Host B)
to one of the interfaces connected to Host B. Thus packets from Host A
to Host B will only leave the switch through one interface - the
interface determined by the SA/DA hash. This has the effect of limiting
the throughput from Host A to Host B to the speed of the one interface
determined by the hashing method. However, if the flow (from Host A to
Host B's shared MAC address) were to be distributed across the different
interfaces in a round-robin fashion (as the packets were leaving the
switch) the throughput between the hosts would equal the aggregate of
the links (IIUC).

Is this a limitation of the Procurve's implementation of LACP? Do other
switches use different methods of distributing traffic across the
trunked links? Is there another method of aggregating the links between
the two hosts (e.g., multipathing)?

TIA,
Eric Pretorious
Truckee, CA

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
