Corosync + DRBD and network glitch

Francois Gaudreault <fgaudreault@xxxxxxxxxxxx> · Wed, 22 Jan 2014 12:50:43 -0500

Hi all,

I don't know if this has been addressed before, but I couldn't find 
anything in a quick search.

We have a corosync cluster to manage an active/passive MySQL service 
with DRBD underneath. Those two servers are in fact VMs running on top 
of two different XenServer hypervisors. The hypervisors are connected 
with an LACP active-active link to a stacked switch.

The problem: when we reboot a stack unit, LACP takes some time to move 
the established sessions to the other link. This glitch is long enough 
for Corosync to declare a member lost. You can guess the rest: both 
nodes become master, and when the network comes back, DRBD split-brains.

Is there anything we can do to tolerate such failures, which last around 
20 to 30 seconds?
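
To make the question concrete: the setting that appears most relevant is 
the totem token timeout in corosync.conf, which controls how long 
Corosync waits for the token before declaring a member lost. The 
fragment below is only a sketch. The value 35000 is an assumption picked 
to exceed the 20 to 30 second glitch, it has not been tested on this 
cluster, and it would be merged into the existing totem section rather 
than used on its own.

```
totem {
    # Token timeout in milliseconds (the default is 1000).
    # 35000 is an illustrative value chosen to outlast a
    # 20 to 30 second LACP failover; it is an assumption.
    token: 35000
}
```

The trade-off is that a genuine node failure would also take that long 
to be detected, so failover of the MySQL service would be delayed by 
the same amount.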

--
Francois Gaudreault
Architecte de Solution Cloud | Cloud Solutions Architect
fgaudreault@xxxxxxxxxxxx
514-629-6775
- - -
CloudOps
420 rue Guy
Montréal QC  H3J 1S6
www.cloudops.com
@CloudOps_
