Hi,

I have a two-node cluster (3.0.12) that works fine when both nodes start cleanly, but when I fence/reboot one of the nodes it can't connect to the existing cluster and instead forms its own cluster (as far as I understand). Here is the status of both nodes after the second node was rebooted:

[root@clit1-p ~]# clustat
Cluster Status for IT Cluster @ Tue Nov 29 05:19:48 2011
Member Status: Quorate

 Member Name       ID   Status
 ------ ----       ---- ------
 clit1-p              1 Online, Local
 clit2-p              2 Offline

[root@clit1-p ~]# corosync-quorumtool -s
Version: 1.2.3
Nodes: 1
Ring ID: 37680
Quorum type: quorum_cman
Quorate: Yes

[root@clit1-p ~]# cman_tool status
Version: 6.2.0
Config Version: 17
Cluster Name: IT Cluster
Cluster Id: 19690
Cluster Member: Yes
Cluster Generation: 37680
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: clit1-p
Node ID: 1
Multicast addresses: 11.0.0.1
Node addresses: 10.0.0.1

[root@clit2-p ~]# clustat
Cluster Status for IT Cluster @ Tue Nov 29 05:21:36 2011
Member Status: Quorate

 Member Name       ID   Status
 ------ ----       ---- ------
 clit1-p              1 Offline
 clit2-p              2 Online, Local

[root@clit2-p ~]# corosync-quorumtool -s
Version: 1.2.3
Nodes: 1
Ring ID: 37848
Quorum type: quorum_cman
Quorate: Yes

[root@clit2-p ~]# cman_tool status
Version: 6.2.0
Config Version: 17
Cluster Name: IT Cluster
Cluster Id: 19690
Cluster Member: Yes
Cluster Generation: 38504
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: clit2-p
Node ID: 2
Multicast addresses: 11.0.0.1
Node addresses: 10.0.0.2

Both nodes have the same config file (a skeleton of it is at the end of this mail). The only difference I can see is the Ring ID in corosync-quorumtool and the Cluster Generation in cman_tool, which ought to be the same numbers on both nodes, matching the first node.

In the log of the second node I don't see any errors, except these messages about every 30 seconds:

Nov 27 03:31:19 clit2-p corosync[2484]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 27 03:31:19 clit2-p corosync[2484]: [CPG ] downlist received left_list: 0
Nov 27 03:31:19 clit2-p corosync[2484]: [CPG ] chosen downlist from node r(0) ip(192.168.127.85)
Nov 27 03:31:19 clit2-p corosync[2484]: [MAIN ] Completed service synchronization, ready to provide service.

Each time this happens, the Cluster Generation changes to a new value. When I restart cman on both nodes, the cluster works fine again (both nodes in the same cluster).

I guess the problem is somewhere in corosync, but I have no idea what could be causing it, and setting its log to debug doesn't help. I would be glad to get some help, or any direction on how to track the problem down.
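For reference, I haven't pasted the real cluster.conf; the skeleton below is only a sketch of its shape (fence methods, fence devices and resources trimmed), with the cluster name, config version, node names and node IDs matching the cman_tool output above:

<?xml version="1.0"?>
<cluster name="IT Cluster" config_version="17">
  <!-- two-node mode: quorum is held with a single vote -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="clit1-p" nodeid="1" votes="1">
      <!-- fence section trimmed -->
    </clusternode>
    <clusternode name="clit2-p" nodeid="2" votes="1">
      <!-- fence section trimmed -->
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- fence devices trimmed -->
  </fencedevices>
</cluster>

With two_node="1" each node is quorate on its own, which is why both halves report "Quorate: Yes" above; the question is why they never merge back into one membership.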
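Since both nodes use the same multicast address (11.0.0.1) yet never see each other after the reboot, one check I'm considering is verifying multicast connectivity between them while the cluster is split. If omping is available on the distribution, I understand the same command is run on both nodes in parallel, something like:

[root@clit1-p ~]# omping clit1-p clit2-p
[root@clit2-p ~]# omping clit1-p clit2-p

or watching for the cluster's multicast traffic with tcpdump (eth0 here is just a placeholder for the actual cluster interface):

[root@clit2-p ~]# tcpdump -n -i eth0 host 11.0.0.1

Does that sound like a reasonable direction, or is there something in corosync itself I should look at first?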
Best regards!

--
David Golovan