Re: HA bonding using ARP monitoring

Pete Wright wrote:
Hi All,
I've been noticing an issue on a couple of boxen I have running CentOS 4.1. Here is the uname -a output:

Linux xxx 2.6.9-11.ELsmp #1 SMP Wed Jun 8 16:59:12 CDT 2005 x86_64 x86_64 x86_64 GNU/Linux


I am defining the bonding module options as follows in our modprobe.conf:

options bond0 mode=1 arp_interval=500 arp_ip_target=<gateway.ip>

We are bonding two devices, and basic networking and failover are working correctly. I can disable one link and packets pass through the second interface as expected. The problem at hand is that the slave device seems to be flapping. I have checked the switches this device connects to and do not see any errors there. The ports are not losing link. These are the messages in /var/log/messages:

<snip>
May 7 04:02:46 critblade204 kernel: bonding: bond0: backup interface eth1 is now down
May 7 04:02:46 critblade204 kernel: bonding: bond0: backup interface eth1 is now down
May 7 04:02:47 critblade204 kernel: bonding: bond0: backup interface eth1 is now up
May 7 04:02:47 critblade204 kernel: bonding: bond0: backup interface eth1 is now up
May 7 04:02:48 critblade204 kernel: bonding: bond0: backup interface eth1 is now down
May 7 04:02:48 critblade204 kernel: bonding: bond0: backup interface eth1 is now down
</snip>
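
To put a number on the flapping, here is a minimal Python sketch that counts up/down transitions per slave from lines in the format quoted above. The function name count_transitions is hypothetical, the pattern matches only the "backup interface ... is now up/down" wording seen in this log, and treating consecutive identical lines as one event is an assumption, since each message appears twice here.

```python
import re
from collections import Counter

# Matches only the message wording quoted in this thread.
LINE_RE = re.compile(
    r"bonding: (\S+): backup interface (\S+) is now (up|down)"
)

def count_transitions(lines):
    """Count state-change messages per (bond, slave, state).

    Assumption: a line identical to the one just before it is a
    duplicate log entry, not a second event, so it is skipped.
    """
    counts = Counter()
    previous = None
    for line in lines:
        line = line.strip()
        if line == previous:
            continue
        previous = line
        match = LINE_RE.search(line)
        if match:
            bond, slave, state = match.groups()
            counts[(bond, slave, state)] += 1
    return counts

# Possible usage against the system log:
#   with open("/var/log/messages") as log:
#       for key, n in sorted(count_transitions(log).items()):
#           print(key, n)
```

Run over a few days of logs, the per-slave counts should show whether the flapping is limited to eth1 and roughly how often it happens.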

It seems that this was happening for several days while the machine was inactive (it is a development box, not in production yet). I ssh'd into the box today and it seems to have stopped flapping for the time being.

Any help would be appreciated, and if anyone needs more info or troubleshooting data I'll be more than willing to provide it.


As per a suggestion by jarmo.jarvenpaa@xxxxxxxxxxx, I forced the link speed via ethtool:
ETHTOOL_OPTS="speed 1000 duplex full autoneg off"

Unfortunately this did not work. I am going to start looking at the code for the bonding driver now. Is it possible this could be a network driver related issue (all NICs are using the tg driver)? Up to this point I've been assuming that this is due to a bug in the bonding.ko code.
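
One way to compare the two slaves while looking into this is the per-slave status the bonding driver exposes under /proc/net/bonding/. Below is a rough Python sketch that parses that text into a dictionary. The function name parse_bonding_status is hypothetical, and the field names it expects ("Slave Interface", "MII Status", "Link Failure Count") are my assumption about the driver's output and may differ between versions.

```python
def parse_bonding_status(text):
    """Return {slave_name: {field: value}} from bonding status text.

    Assumption: the text is "Key: value" lines, and each slave's
    section starts with a "Slave Interface: <name>" line.
    """
    slaves = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or a line without a "Key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None:
            current[key] = value  # field belongs to the current slave
    return slaves

# Possible usage on the affected box:
#   with open("/proc/net/bonding/bond0") as status:
#       for name, fields in parse_bonding_status(status.read()).items():
#           print(name, fields.get("MII Status"), fields.get("Link Failure Count"))
```

If the failure count climbs on eth1 only while the switch ports report no link loss, that may help narrow down whether the NIC driver or the bonding code is at fault.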

Thanks,
	Pete Wright

--
Peter Wright
Systems Administrator
Sony Pictures Imageworks
wright@xxxxxxxxxxxxxx
www.imageworks.com
