Hello,
I am hoping that somebody here can help me troubleshoot getting "bonding
mode 4" (LACP, dynamic link aggregation, 802.3ad) working, or at least
point me to where I can get debugging information to solve the problem
myself. I'm not seeing anything particularly helpful in dmesg or in the
syslog.
Here is my situation. I have an extremely fast network storage system
capable of reading and writing at over 4 GB/s, so it can easily saturate
a single 10 GbE network adapter. I am therefore experimenting with ways
to aggregate the bandwidth of two or more 10 GbE ports connected to one
switch. Each of my clients can use about 320 MB/s in each direction
(simultaneous reading and writing) or 640 MB/s when only reading, and I
want to connect at least 4 clients to the switch.
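As a rough check on how many ports the bond needs, the figures above can be worked through in a few lines of shell. This sketch assumes about 1250 MB/s of usable bandwidth per 10 GbE port in each direction (10 Gbit/s divided by 8) and ignores Ethernet, IP and TCP overhead, so real requirements will be somewhat higher. The variable names are illustrative only.

```shell
#!/bin/sh
# Rough sizing of the bond from the per-client figures above.
# Assumption: one 10 GbE port carries at most 10000 Mbit/s / 8 = 1250 MB/s
# in each direction; protocol overhead is ignored.
PORT_MBS=1250
CLIENTS=4
READ_ONLY_MBS=640   # per client, reading only
MIXED_MBS=320       # per client, in each direction, reading and writing

# Integer ceiling division: ceil_div A B prints ceil(A / B).
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

read_total=$((CLIENTS * READ_ONLY_MBS))   # server -> clients only
mixed_total=$((CLIENTS * MIXED_MBS))      # same load in each direction

echo "read-only: ${read_total} MB/s out, needs $(ceil_div "$read_total" "$PORT_MBS") ports"
echo "mixed: ${mixed_total} MB/s each way, needs $(ceil_div "$mixed_total" "$PORT_MBS") ports"
```

With four clients this works out to 2560 MB/s outbound when all of them only read (three ports at line rate) and 1280 MB/s in each direction for mixed traffic (two ports), so a single 10 GbE link is not enough in either case.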
I am testing both Fujitsu and Blade Network switches, each with 24 x
10 GbE ports. On the server end, I have multiple Myricom 10 GbE cards,
as well as full-speed dual-port Myricom cards (not the slower failover
cards).
When I configure 2 ports on either switch for LACP and then set up
Linux to use bonding mode 4, I cannot pass any data from the Linux
server to the switch; I cannot even ping the switch or anything attached
to it. In the case of the Blade Network switch, the switch logs report
that the Linux end does not seem to support LACP.
I do not think the problem is with bonding itself. If I set up Linux
bonding in round-robin mode (balance-rr) or in mode=6 (balance-alb), and
leave the 10 GbE switches unconfigured, network traffic gets distributed
across both physical ports on the server. By the way, I am running the
vanilla 2.6.32.3 kernel from kernel.org on a Mandriva 2010 distribution.
I am setting up bonding as follows:
A) for mode 4:
    modprobe bonding mode=4 miimon=100
    ifenslave bond0 eth0
    ifenslave bond0 eth1
    ifconfig bond0 {IP Address} up
B) for the other modes, the same commands with mode=balance-rr or
mode=balance-alb in place of mode=4
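The same sequence can be wrapped in a small script so it is easy to rerun with different modes. This is a minimal sketch, not a tested procedure: the function names setup_bond and run, the DRY_RUN switch, and the example address 192.168.10.1 are hypothetical. Unlike the steps above, it brings bond0 up before enslaving the ports, following the manual ifenslave example in the kernel's bonding documentation.

```shell
#!/bin/sh
# Sketch of the manual bonding setup, wrapped in a function.
# Hypothetical names: setup_bond, run, DRY_RUN. Without DRY_RUN=1 this
# needs root, the bonding module and the ifenslave tool.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_bond MODE ADDRESS SLAVE...
# MODE may be a number (4) or a name (802.3ad, balance-rr, balance-alb).
setup_bond() {
    mode=$1
    addr=$2
    shift 2
    run modprobe bonding mode="$mode" miimon=100 || return 1
    # The manual example in the kernel's bonding documentation brings
    # the master up before enslaving ports to it.
    run ifconfig bond0 "$addr" up || return 1
    for slave in "$@"; do
        run ifenslave bond0 "$slave" || return 1
    done
}

# Example (prints the commands only):
#   DRY_RUN=1 setup_bond 4 192.168.10.1 eth0 eth1
```

Running DRY_RUN=1 setup_bond 4 192.168.10.1 eth0 eth1 prints the four commands without executing them; leaving DRY_RUN unset executes them, which requires root.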
I have also tried bridging the two or more 10 GbE ports (creating a br0
device). Bridging works, as does bonding in round-robin mode or mode 6.
The downside of bridging, however, is that I must create multiple VLANs
on the switch to force each workstation onto one physical connection to
the server or the other. Round-robin bonding gives me very high
throughput, but it is unreliable for a steady real-time data flow (this
is for a very high bandwidth video application). Mode 6 seems promising,
but I believe my best bet will be mode 4. However, I cannot get mode 4
to work with either switch.
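When comparing the modes, it may help to keep the driver's numbering straight: round robin is mode 0 (balance-rr), while mode 1 is active-backup, which carries traffic on only one slave at a time. The helper below (its name, bond_mode_name, is hypothetical) sketches the number-to-name mapping given in the kernel's bonding documentation; the bonding module's mode= parameter accepts either form.

```shell
#!/bin/sh
# Map a bonding mode number to the policy name used by the bonding driver.
# The mode= module parameter accepts either form (e.g. mode=4 or
# mode=802.3ad).
bond_mode_name() {
    case "$1" in
        0) echo balance-rr ;;     # round robin across all slaves
        1) echo active-backup ;;  # one active slave, the others on standby
        2) echo balance-xor ;;
        3) echo broadcast ;;
        4) echo 802.3ad ;;        # LACP; the switch ports must be configured too
        5) echo balance-tlb ;;
        6) echo balance-alb ;;    # adaptive load balancing
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

for m in 0 1 4 6; do
    echo "mode=$m -> $(bond_mode_name "$m")"
done
```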
Can somebody give me some hints here? Which logs are best to look at
to see what is going on? Is Linux not able to talk to a switch
configured for LACP? I'm fairly sure I have done this successfully
before with 1 Gig ports, but this is my first time trying with 10 Gig
ports. Help and advice would be appreciated. Thank you in advance.
Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html