Grant Taylor <gtaylor@xxxxxxxxxxxxxxxxx> wrote:
>On 07/31/07 06:01, Ralf Gross wrote:
>> But I don't have an isolated network. Maybe I'm still too blind to see a
>> simple solution.

There really isn't a simple solution, since you're not doing something simple. It sounds simple to say you want to aggregate bandwidth from multiple interfaces for use by one TCP connection, but it's actually a pretty complicated problem to solve.

The diagram and description in the bonding documentation describing the isolated network are really meant for use in clusters, and are more historical than anything else these days. In the days of yore, it was fairly cost effective to connect several switches to several systems such that each system had one port into each switch (as opposed to buying a single, much larger, switch). With no packet coalescing or the like, balance-rr would tend to deliver packets in order to the end systems (one packet per interrupt), and a given connection could get pretty close to full striped throughput. This type of arrangement breaks down with modern network hardware, since there is no longer a one-to-one relationship between interrupts and packet arrival.

>The fact that you are trying to go across an aggregated link in the middle
>between the two buildings where you have no control is going to hinder you
>severely.

Yes. You're also running up against the fact that, traditionally, Etherchannel (and its equivalents) is meant to aggregate trunks, optimizing for maximum overall throughput across many connections. It's not really designed to let a single connection effectively use the combined bandwidth of multiple links.

>The only other nasty thing that comes to mind is to assign additional MAC
>/ IP sets to each system on their second interfaces.

Another similar Rube Goldberg sort of scheme I've set up in the past (in the lab, for bonding testing, not in a production environment, your mileage may vary, etc, etc) is to dedicate particular switch ports to particular vlans. So, e.g.,

    linux box  eth0 ---- port 1: vlan 99  SWITCH(ES)  port 2: vlan 99 ---- eth0  linux box
      bond0    eth1 ---- port 3: vlan 88  SWITCH(ES)  port 4: vlan 88 ---- eth1    bond0

This sort of arrangement requires setting the Cisco switch ports to be native to a particular vlan, e.g., "switchport mode access", "switchport access vlan 88". In theory, the intervening switches will simply pass the vlan traffic through and not decapsulate it until it reaches its final destination port. You might also have to fool with the inter-switch links to make sure they're trunking properly (so that they pass the vlan traffic).

The downside of this scheme is that the bond0 instances can only communicate with each other, unless one of the intermediate switches can route between the vlans and the regular network, or some other host is also attached to the vlans to act as a gateway to the rest of the network. My switches won't route, since they're switch-only models (2960/2970/3550) with no layer 3 capability, and I've never tried setting up a separate gateway host in such a configuration.

This also won't work if the intervening switches either (a) don't have higher capacity inter-switch links, or (b) don't spread the traffic across the ISLs any better than they do on a regular etherchannel. Basically, you want to take the switches out of the equation, so the load balance algorithm used by etherchannel doesn't disturb the even balance of the round robin transmission.
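For what it's worth, the Linux end of that scheme is just an ordinary balance-rr bond. A minimal sketch might look something like the following; the interface names, the address, and the miimon value are only examples, and most distros would put this in their own network configuration files rather than have you run the commands by hand:

    # load the bonding driver in round-robin mode with basic link monitoring
    modprobe bonding mode=balance-rr miimon=100

    # give the bond an address and bring it up (example subnet only)
    ip addr add 10.0.99.1/24 dev bond0
    ip link set bond0 up

    # enslave the two physical interfaces that go to the dedicated vlan ports
    ifenslave bond0 eth0 eth1

The box on the other end gets the same setup with a different address on bond0.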
There might be other ways to essentially tunnel from port 1 to port 2 and from port 3 to port 4 (in my diagram above), but that's really what you're looking to do.

Lastly, as long as I'm here, I can give my usual commentary about TCP packet reordering. The bonding balance-rr mode will generally deliver packets out of order to an aggregated destination (if you feed a balance-rr bond of N links at speed X into a single link with enough capacity to handle N * X bandwidth, you don't see this problem). This is ignoring any port assignment a switch might do. TCP's reaction to receiving segments out of order is typically to issue duplicate ACKs indicating a lost segment; by default, three segments arriving out of order are enough to trigger a fast retransmit. On Linux, this threshold can be adjusted via the net.ipv4.tcp_reordering sysctl. Crank it up to 127 or so and the reordering effect is minimized, although there are other congestion control effects.

The bottom line is that you won't ever see N * X bandwidth on a single TCP connection, and the return per added link falls off as the number of links in the aggregate increases. With four links, you're doing pretty well to get about 2.3 links' worth of throughput. If memory serves, with two links you top out around 1.5.

So the real question is: since you've got two links, how important is that extra 0.5 link's worth of transfer speed? Can you instead figure out a way to split your backup problem into pieces and run them concurrently? That can be a much easier problem to tackle, given that it's trivial to add extra IP addresses to the hosts on each end, and presumably your higher end Cisco gear will permit a load-balance algorithm other than a straight MAC address XOR. E.g., the 2960 I've got handy permits:

    slime(config)#port-channel load-balance ?
      dst-ip       Dst IP Addr
      dst-mac      Dst Mac Addr
      src-dst-ip   Src XOR Dst IP Addr
      src-dst-mac  Src XOR Dst Mac Addr
      src-ip       Src IP Addr
      src-mac      Src Mac Addr

so it's possible to get the IP address into the port selection math, and adding IP addresses is pretty straightforward (a quick sketch of the Linux side is at the end of this note).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@xxxxxxxxxx
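P.S. Roughly what the Linux-side pieces mentioned above look like; the interface name and address here are only examples, so substitute your own:

    # raise TCP's reordering tolerance so round-robin striping doesn't keep
    # triggering fast retransmit (127 is the ballpark value mentioned above)
    sysctl -w net.ipv4.tcp_reordering=127

    # add a second IP address to an interface so two concurrent backup streams
    # can hash onto different links (address and interface name are made up)
    ip addr add 10.0.1.2/24 dev eth0

On the switch side you'd then pick one of the IP-based options from the port-channel load-balance list above, so that the extra addresses actually influence link selection.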