Re: [Problem] Corosync cannot reconstitute a cluster.

Jan Friesse <jfriesse@xxxxxxxxxx> · Thu, 13 Jun 2013 07:32:06 +0200

Hideo,
thanks for your testing. I've trimmed the email down to the parts that
contain the important information.

> (Result1 - OK : corosync creates the membership correctly)

Perfect

> (Result2 - OK : corosync creates the membership correctly)

Perfect

> Really? Is the problem in the switch?
> 
> I think that whether the phenomenon occurs depends on how the corosync network is cut.
> I do not think that it is a problem with the switch (SW).
> 
> The network cut that I reported is as follows.
>  * An X marks a cut.
> 
>   ----------------------------------------
>  |                  SW1                   |
>   ----------------------------------------
>        |              |              |
>        X              |              |
>        |              |              |
>   ----------     ----------     ----------
>  |  node1   |   |  node2   |   |  node3   |
>   ----------     ----------     ----------
>        |              |              |
>        |              X              |
>        |              |              |
>   ----------------------------------------
>  |                  SW2                   |
>   ----------------------------------------
> 
>  * In SW1, node3 can communicate with node2.
>  * In SW2, node3 can communicate with node1.
> 

Wait, wait, wait. This is a totally different situation. My expectation was
ONE cable per node to ONE switch. But you are using TWO switches and
TWO cables per node. Are you using RRP? Or bonding?

> With this kind of failure, corosync control messages pass each other on the two paths. Does that not cause a problem?
> Could that be the reason why a cluster cannot be constituted?

It really depends on the technology you are using. Bonding shouldn't have
a problem with such a situation, because the cables are equivalent and you
can lose any one of them. RRP is a totally different story and it will behave
incorrectly (in the way you've described), because corosync itself will:
- mark one of the rings as failed
- keep switching between the operational and gather states

In other words, there will be a membership, but a very unstable one.
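
To make the RRP case concrete, here is a minimal Python sketch of the cut
from the quoted diagram. It is only a toy model, not corosync code: the
names NODES, CUT and reachable_on are made up for illustration, and it
assumes each switch carries exactly one ring. It shows that neither ring
still reaches all three nodes.

```python
# Toy model of the cut in the quoted diagram (hypothetical names, not corosync code).
NODES = {"node1", "node2", "node3"}

# Cut cables as (node, switch) pairs: node1 lost SW1, node2 lost SW2.
CUT = {("node1", "SW1"), ("node2", "SW2")}


def reachable_on(switch):
    """Return the nodes that still have a working cable to the given switch."""
    return {node for node in NODES if (node, switch) not in CUT}


# Assumption: one ring per switch.
rings = {switch: reachable_on(switch) for switch in ("SW1", "SW2")}

# A ring is fully usable only if every node is still attached to it.
full_rings = [switch for switch, members in rings.items() if members == NODES]

for switch in sorted(rings):
    print(switch, sorted(rings[switch]))
print("rings with full membership:", full_rings)
```

SW1 is left with node2 and node3, SW2 with node1 and node3, and the list of
rings with full membership is empty. Only node3 is present on both rings.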

Honza
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss