No worries. Here's the definition from the kernel docs[1]:
============
mode
Specifies one of the bonding policies. The default is
balance-rr (round robin). Possible values are:
balance-rr or 0
Round-robin policy: Transmit packets in sequential
order from the first available slave through the
last. This mode provides load balancing and fault
tolerance.
active-backup or 1
Active-backup policy: Only one slave in the bond is
active. A different slave becomes active if, and only
if, the active slave fails. The bond's MAC address is
externally visible on only one port (network adapter)
to avoid confusing the switch.
In bonding version 2.6.2 or later, when a failover
occurs in active-backup mode, bonding will issue one
or more gratuitous ARPs on the newly active slave.
One gratuitous ARP is issued for the bonding master
interface and each VLAN interfaces configured above
it, provided that the interface has at least one IP
address configured. Gratuitous ARPs issued for VLAN
interfaces are tagged with the appropriate VLAN id.
This mode provides fault tolerance. The primary
option, documented below, affects the behavior of this
mode.
balance-xor or 2
XOR policy: Transmit based on the selected transmit
hash policy. The default policy is a simple [(source
MAC address XOR'd with destination MAC address) modulo
slave count]. Alternate transmit policies may be
selected via the xmit_hash_policy option, described
below.
This mode provides load balancing and fault tolerance.
============
1. https://www.kernel.org/doc/Documentation/networking/bonding.txt
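To make the default balance-xor formula quoted above concrete, here is a minimal Python sketch. It is a literal reading of the documented formula, not the kernel's actual code, and the helper names (mac_to_int, xor_slave_index) are made up for illustration:

```python
def mac_to_int(mac: str) -> int:
    """Convert a MAC address like '00:11:22:33:44:55' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def xor_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave index as (source MAC XOR destination MAC) modulo slave count.

    This follows the wording of the quoted documentation; the real bonding
    driver may compute the hash differently (assumption, not verified here).
    """
    if slave_count < 1:
        raise ValueError("slave_count must be at least 1")
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


if __name__ == "__main__":
    # Hypothetical addresses, two slaves in the bond.
    src = "00:00:00:00:00:01"
    for dst in ("00:00:00:00:00:02", "00:00:00:00:00:03"):
        print(dst, "-> slave", xor_slave_index(src, dst, 2))
```

Note that a given source/destination MAC pair always maps to the same slave, so with this policy traffic between two specific hosts stays on one link.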
On 22/01/14 03:20 PM, Francois Gaudreault wrote:
Sorry for my ignorance, but what do you mean by mode 0 or 2?
FG
On 1/22/2014, 3:05 PM, Digimer wrote:
I know that support for mode=0 and mode=2 was added recently; maybe
they would work better?
On 22/01/14 03:02 PM, Francois Gaudreault wrote:
Well LACP is at the hypervisor level, so for Corosync, it's a standard
interface.
Active/passive is not really an option for us; we need the 2 Gb/s of
bandwidth. Are there any timeouts you think we can tweak?
FG
On 1/22/2014, 2:33 PM, Digimer wrote:
On 22/01/14 12:50 PM, Francois Gaudreault wrote:
Hi all,
I don't know if this has been addressed before, but I couldn't find
anything quickly.
We have a corosync cluster to manage an active/passive MySQL service
with DRBD underneath. Those two servers are in fact VMs running on top
of two different XenServer hypervisors. The hypervisors are connected
with an LACP active-active link to a stacked switch.
What's happening is that if we reboot a stack unit, LACP takes some
time to move the established sessions to the other link. This short
glitch is long enough to trigger a member loss in Corosync. You can
guess the rest: both nodes become master, and when the network is
back, DRBD split-brains.
Is there anything we can do to tolerate such failures, which last
around 20 to 30 seconds?
Last I checked, corosync didn't support LACP. In my network/switch
failure tests, (with both corosync and drbd running), I only found
mode=1 (active/passive) to reliably survive all failure and recovery
scenarios (inc. power-cycling switches, etc).
It could be that your switch is temporarily blocking all traffic to
check STP. You might want to try disabling STP and re-running your
tests.
Also, if you have fencing set up properly, you won't get a split-brain
regardless.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?