Re: Corosync + DRBD and network glitch

Sorry for my ignorance, but what do you mean by mode 0 or 2?

FG

On 1/22/2014, 3:05 PM, Digimer wrote:
I know that support for mode=0 and mode=2 was added recently; maybe they would work better?

On 22/01/14 03:02 PM, Francois Gaudreault wrote:
Well, LACP is handled at the hypervisor level, so to Corosync it's just
a standard interface.

Active/Passive is not really an option for us; we need the 2GB
bandwidth. Are there any timeouts you think we can tweak?

FG
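
On the timeout question, a minimal sketch, assuming Corosync 1.x/2.x, of raising the totem token timeout in /etc/corosync/corosync.conf so that a 20 to 30 second outage is not immediately treated as a lost member. The 35000 value is an illustrative assumption, not a tested recommendation, and the rest of the totem section is omitted.

```
totem {
    version: 2
    # Time in milliseconds to wait for the token before declaring it lost.
    # Hypothetical value, chosen only to outlast a ~30 s LACP failover.
    token: 35000
}
```

The trade-off is that a genuine node failure would take at least this long to detect, so failover of the MySQL service would be delayed by the same amount.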

On 1/22/2014, 2:33 PM, Digimer wrote:
On 22/01/14 12:50 PM, Francois Gaudreault wrote:
Hi all,

I don't know if this has been addressed before, but I couldn't find
anything in a quick search.

We have a corosync cluster to manage an active/passive MySQL service
with DRBD underneath. Those two servers are in fact VMs running on top
of two different XenServer hypervisors. The hypervisors are connected
with an LACP active-active link to a stacked switch.

What's happening is that if we reboot a stack unit, LACP takes some
time to flip the established sessions to the other link. This little
glitch is long enough to trigger a member loss in Corosync. You can
guess the rest: both nodes become master, and when the network comes
back, DRBD split-brains.

Is there anything we can do to tolerate such failures, which last around
20 to 30 seconds?

Last I checked, corosync didn't support LACP. In my network/switch
failure tests, (with both corosync and drbd running), I only found
mode=1 (active/passive) to reliably survive all failure and recovery
scenarios (inc. power-cycling switches, etc).

It could be that your switch is temporarily blocking all traffic to
check STP. You might want to try disabling STP and re-running your tests.

Also, if you have fencing set up properly, you won't get a split-brain
regardless.
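
For the fencing point, a sketch of what DRBD-level fencing could look like, assuming DRBD 8.4 with Pacemaker running on top of Corosync (the thread confirms neither). The resource name r0 is hypothetical; the handler scripts are the ones shipped with DRBD, and their path may differ by distribution.

```
resource r0 {
    disk {
        # Freeze I/O and call the fence-peer handler when the peer is lost.
        fencing resource-and-stonith;
    }
    handlers {
        # Adds a Pacemaker constraint that keeps the outdated peer from being promoted.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # Removes that constraint once the peer has resynced.
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}
```

This only helps if STONITH is also configured and working in the cluster itself; the DRBD settings alone are not a substitute for node-level fencing.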

--
Francois Gaudreault
Architecte de Solution Cloud | Cloud Solutions Architect
fgaudreault@xxxxxxxxxxxx
514-629-6775
- - -
CloudOps
420 rue Guy
Montréal QC  H3J 1S6
www.cloudops.com
@CloudOps_
