Re: One "200Mbps" virtual link between 2 ethernet adaptators of 2 linux boxes.

Jay Vosburgh <fubar@xxxxxxxxxx> · Wed, 09 Feb 2005 21:58:46 -0800

Francois <fdelawarde@xxxxxxxxxxxxxxxxx> wrote:
>              -------
>             |   A   |
>             |       |
>              -------
>              ___|__
>             |switch|
>             |______|
> -------       |   |       -------
>|   B   |eth0---   ---eth0|   C   |
>|       |eth1---------eth1|       |
> -------                   -------
>
>Machine A: (192.168.1.10) PC used to configure B&C (the only one that has a
>screen)
>Machine B&C: Very simple bonding configuration:
>
>
>modprobe bonding mode=1
>ip addr add dev bond0 192.168.1.1/24 brd +    #for B and .2 for C
>ip link set bond0 up
>ip link set eth0 up
>ip link set eth1 up
>ifenslave bond0 eth0 eth1
>
>The bad thing is: B pinging C has 50% packet lost which would mean assuming
>that the round robin of the module works that a route from one of the
>interfaces doesn't reach C (pinging from A to 192.168.1.1 gives also 50%).
>Anyone has an idea on this matter?

	First, if you set up bonding this way, check to see if the
slaves have routes that supersede the route for the bonding master
device.  The slaves should not have any routes at all; all routing
decisions are made against the master device.  When bonding is set up by
hand, the slaves can end up with routes if they are up and active prior
to being enslaved.  It's not generally a problem when bonding is set up
at boot time.
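
	A minimal sketch of that check, assuming the slaves are named
eth0 and eth1 as in the diagram above.  The helper name slave_routes is
made up for illustration; it only filters "ip route show" output and
changes nothing.

```shell
# List routes that point at a slave device instead of the bond master.
# Reads `ip route show` output on stdin; slave names are the arguments.
slave_routes() {
    awk -v slaves="$*" '
        BEGIN { n = split(slaves, s, " "); for (i = 1; i <= n; i++) want[s[i]] = 1 }
        {
            # The device name is the field following the "dev" keyword.
            for (i = 1; i < NF; i++)
                if ($i == "dev" && ($(i + 1) in want)) { print; break }
        }
    '
}

# Typical use on B or C (needs iproute2):
#   ip route show | slave_routes eth0 eth1
# Any line printed is a candidate for removal, for example:
#   ip route del 192.168.1.0/24 dev eth0
```

	If the function prints nothing, only the master (bond0) carries
routes, which is the desired state.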

	Assuming for the moment that the routing is ok, I'm also curious
as to which link loses packets (the "eth0s with switch" or the "eth1s no
switch").  Looking at the /var/log/messages for information from the
bonding driver would also be useful; you might also look into enabling
some link monitoring (just in case).
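
	As a rough sketch of what to look at: the bonding driver reports
per-slave link state in /proc/net/bonding/bond0, and MII link monitoring
is enabled with the miimon module option (polling interval in
milliseconds).  The helper name slave_link_status and the 100 ms value
are illustrative choices, not something taken from the original setup.

```shell
# Print "<slave> <MII status>" for each slave, given text in the format of
# /proc/net/bonding/bond0 on stdin.  The bond's own "MII Status" line comes
# before the first "Slave Interface" line and is skipped.
slave_link_status() {
    awk '
        /^Slave Interface:/ { slave = $3 }
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    '
}

# Typical use:
#   slave_link_status < /proc/net/bonding/bond0
# Link monitoring is a module option, e.g. polling every 100 ms:
#   modprobe bonding mode=1 miimon=100
```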

	Lastly, getting a single TCP connection to see N interfaces'
worth of throughput is a surprisingly difficult problem.  This is a
topic that comes up fairly regularly on the bonding-devel list; below is
an article I posted last fall.  That article references a discussion
from a few years ago about round robin performance as it scales up to
4 adapters; that was done with 100 Mb/sec hardware,
but the same would apply to gigabit links.  As somebody else pointed
out, when round robin was originally implemented in bonding, state of
the art was 10 Mb/sec, one packet per interrupt, and reordering wasn't a
problem.  Today, with adapters that coalesce packets or drivers that
implement NAPI (which does the same thing), it's very difficult to
arrange for packets to all arrive in the proper order.

	My comments below about balance-alb not allowing a single TCP
connection to see more than one interface's worth of throughput also
apply to the other balance modes in bonding (other than round robin).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@xxxxxxxxxx

To: "Shlomi Yaakobovich" <Shlomi@xxxxxxxxxx>
cc: "Tim Mattox" <tmattox@xxxxxxxxx>,
    bonding-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [Bonding-devel] bonding and appletalk 
In-Reply-To: Message from "Shlomi Yaakobovich" <Shlomi@xxxxxxxxxx> 
   of "Tue, 05 Oct 2004 14:07:39 +0200." <F8B4823728281C429F53D71695A3AA1E012729BD@xxxxxxxxxxxxxxxxxxxx> 
X-Mailer: MH-E 7.4.3; nmh 1.0.4; GNU Emacs 21.3.1
Date: Tue, 05 Oct 2004 09:59:42 -0700
From: Jay Vosburgh <fubar@xxxxxxxxxx>

Shlomi Yaakobovich <Shlomi@xxxxxxxxxx> wrote:

>Thanks for the reply, the problem was indeed that the switch's 2 ports
>were not configured for load-sharing (it's an Extreme Networks 7i
>switch). I am giving up on using mode=0 for this type of connection for
>now, since it requires too much external support; mode=6 is easier to
>implement on a "normal" network.
>
>I suppose that mode=0 works a bit faster than mode=6; is there any
>benchmark on the difference ?  Do you guys have any idea what the
>performance effects are ?

	The summary: round-robin (mode 0) can provide a single TCP
connection with more than one interface's worth of throughput, but will
generally never let you reach the maximum throughput of the bond as a
whole, whereas balance-alb (mode 6) will never let a single TCP
connection (peer host, really) use more than one interface's worth of
throughput, but it can allow you to use the overall max throughput of
the bond (to multiple destinations).

	And, it depends on what you mean by "faster."  The round-robin
mode (mode 0) simply stripes all traffic across the interfaces,
regardless of where it's going to.  For the case of a unidirectional TCP
transfer, this will generally result in many, many packets received out
of order.  This in turn triggers TCP's congestion control algorithms
(out of order packets are interpreted as lost packets, or late packets).
This can be mitigated somewhat by adjusting tcp_reordering, but you're
not likely to see the full bandwidth utilized by one TCP connection.
This was discussed in depth on the list some time ago, see the archives
at:

http://sourceforge.net/mailarchive/forum.php?thread_id=1669977&forum_id=2094

	and look for messages titled "trunking performance."

	The tcp_reordering value, btw, can be changed via
/proc/sys/net/ipv4/tcp_reordering, or sysctl net.ipv4.tcp_reordering.
The maximum useful value is 127; the default is 3.
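
	A small sketch of both ways of touching the setting.  The
function names are made up for illustration, and reordering_cmd only
prints the command rather than running it, since changing the value
needs root.

```shell
# Print the current setting.  The optional path argument exists only so the
# function can be tried against an ordinary file.
show_reordering() {
    cat "${1:-/proc/sys/net/ipv4/tcp_reordering}"
}

# Print (without running) the sysctl command that would set a new value.
reordering_cmd() {
    echo "sysctl -w net.ipv4.tcp_reordering=$1"
}

# Typical use:
#   show_reordering
#   reordering_cmd 127     # then run the printed command as root
```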

	The balance-alb mode, on the other hand, will stripe traffic to
different hosts across different interfaces.  Traffic to the same host
will always use the same interface (generally; the sorting may be
shuffled from time to time).  A single connection will never see more
than one interface's worth of throughput, but no segments will ever be
delivered out of order, and multiple connections can utilize pretty much
the full bandwidth of the aggregation.  The 802.3ad mode operates the
same way in this regard.
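
	To make the difference concrete, here is a toy illustration of
which slave carries each packet of one flow.  It is not the driver's
code: rr_slave models round robin as "packet number modulo slave count",
and peer_slave is a hypothetical stand-in for any per-peer mode (it
picks a slave from the last octet of the peer address, which is an
assumption made purely so that one peer always maps to one slave).

```shell
# Round robin: packet number i leaves on slave (i mod N), whatever its peer.
rr_slave() {
    echo $(( $1 % $2 ))
}

# Stand-in for a per-peer mode.  ASSUMPTION: not the driver's algorithm.
# Uses the last octet of the peer address so one peer maps to one slave.
peer_slave() {
    echo $(( ${1##*.} % $2 ))
}

# Slaves used by six consecutive packets to 192.168.1.2 over two slaves.
rr_seq=""
peer_seq=""
for i in 0 1 2 3 4 5; do
    rr_seq="$rr_seq$(rr_slave "$i" 2)"
    peer_seq="$peer_seq$(peer_slave 192.168.1.2 2)"
done
echo "round robin: $rr_seq"
echo "per peer:    $peer_seq"
```

	The round robin sequence alternates between the two slaves, so
consecutive segments of one connection travel different paths and can
arrive out of order; the per-peer sequence stays on a single slave, so
ordering is preserved but the flow is capped at one link.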

	There is no mode that will allow a single connection to use more
than one interface's worth of bandwidth and guarantee ordered delivery
of packets.  An obvious means to accomplish that would be a round robin
mode with an added reassembly layer inside of bonding.  I've not done
any experiments on something like that, so I'm not sure if the added
overhead of the reassembly would offset the gains from guaranteed
delivery order, or whether the burstiness of such a system would still
interfere with TCP's congestion control.

	That said, if you're doing high-volume UDP traffic, with no
ordering requirements, round-robin will let you slam the full
aggregate's throughput to one host, but balance-alb won't.  Read the
archives; there's a lot of analysis there.

	-J
_______________________________________________
LARTC mailing list / LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/
