[Problem] Corosync cannot reconstitute a cluster.

renayama19661014@xxxxxxxxx · Fri, 31 May 2013 14:12:11 +0900 (JST)

Hi All,

We found a problem with corosync network communication.

We built a three-node corosync cluster on KVM.

Step 1) Start the corosync service on all nodes.

Step 2) Confirm that all nodes have joined the cluster and that it has reached the OPERATIONAL state.

Step 3) From the KVM host, cut off the network of node1 (rh64-coro1) and node2 (rh64-coro2).

       [root@kvm-host ~]# brctl delif virbr3 vnet5;brctl delif virbr2 vnet1
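
For anyone reproducing step 3, the following is a minimal sketch that wraps the command above so the links can be cut and later re-attached in a repeatable way. The bridge and port names are the ones from step 3. The function name bridge_links, the "restore" action, and the DRY_RUN switch are assumptions added for illustration and are not part of the original procedure.

```shell
#!/bin/sh
# Sketch: cut or restore the two bridge links from step 3 on the KVM host.
# Bridge:port pairs are taken from the brctl command in step 3.
# The "restore" action and the DRY_RUN switch are illustrative assumptions.
LINKS="virbr3:vnet5 virbr2:vnet1"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = run them

bridge_links() {
    action="$1"           # "cut" or "restore"
    case "$action" in
        cut)     op=delif ;;   # detach the port from its bridge
        restore) op=addif ;;   # re-attach the port to its bridge
        *) echo "usage: bridge_links cut|restore" >&2; return 1 ;;
    esac
    for link in $LINKS; do
        bridge="${link%%:*}"   # text before the colon
        port="${link##*:}"     # text after the colon
        if [ "$DRY_RUN" = "1" ]; then
            echo "brctl $op $bridge $port"
        else
            brctl "$op" "$bridge" "$port"
        fi
    done
}
```

With the default DRY_RUN=1, "bridge_links cut" only prints the two brctl commands. Setting DRY_RUN=0 before the call runs them, which requires root on the KVM host.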

Step 4) Because a problem occurred, we stopped all nodes.

The problem occurs at step 3.

One node (rh64-coro1) continues to run after entering the OPERATIONAL state.

The other two nodes (rh64-coro2 and rh64-coro3) keep changing state.
They never seem to reach the OPERATIONAL state while the first node is running.

This means that the two nodes (rh64-coro2 and rh64-coro3) cannot complete cluster formation.
When this network failure happens in a configuration where corosync is combined with Pacemaker, corosync cannot notify Pacemaker of the change in cluster membership.

Question 1) Are there any parameters to solve this problem in corosync.conf?
 * We think the problem could be avoided by bonding the interfaces and setting "rrp_mode:none", but we do not want to set "rrp_mode:none".
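
Since the question turns on rrp_mode, here is a minimal sketch for confirming which value each node's corosync.conf actually sets. The function name rrp_mode_of is hypothetical, the default path assumes the usual /etc/corosync/corosync.conf location, and the parsing assumes one "key: value" directive per line.

```shell
#!/bin/sh
# Sketch: print the rrp_mode value set in a corosync.conf-style file.
# Prints "(not set)" when no rrp_mode line is found; in that case corosync
# chooses the mode itself, so this script does not guess a default.
rrp_mode_of() {
    conf="${1:-/etc/corosync/corosync.conf}"   # assumed default location
    awk -F: '
        /^[ \t]*#/ { next }                    # skip comment lines
        /^[ \t]*rrp_mode[ \t]*:/ {
            gsub(/[ \t]/, "", $2)              # strip blanks around the value
            print $2
            found = 1
            exit
        }
        END { if (!found) print "(not set)" }
    ' "$conf"
}
```

Running "rrp_mode_of" on each of the three nodes gives a quick check that they all use the same setting before repeating the test.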

Question 2) Is this a bug, or is it the specified behavior of corosync communication?

I have attached the logs from the three nodes.

Best Regards,
Hideo Yamauchi.
<<attachment: log_and_conf.zip>>