Help with corosync and GFS2 on a multi-network setup

Hi everybody,

We have been using corosync directly to provide clustering for GFS2 on our CentOS 7.2 pools with only one network interface, and everything has worked great so far!

We now have a new set-up with two network interfaces for every host in the cluster:
A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X)
B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X)
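
(We can check which address corosync binds ring 0 to on each node with corosync-cfgtool; the ring 0 "id" it reports should be the node's 10.220.88.x address if corosync really is on network A.)

[root@cl15-02 ~]# corosync-cfgtool -s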

When we run corosync in this two-interface set-up, the logs are continuously spammed with messages like these:

[12880] cl15-02 corosyncdebug   [TOTEM ] entering GATHER state from 0(consensus timeout).
[12880] cl15-02 corosyncdebug   [TOTEM ] Creating commit token because I am the rep.
[12880] cl15-02 corosyncdebug   [TOTEM ] Saving state aru 10 high seq received 10
[12880] cl15-02 corosyncdebug   [MAIN  ] Storing new sequence id for ring 5750
[12880] cl15-02 corosyncdebug   [TOTEM ] entering COMMIT state.
[12880] cl15-02 corosyncdebug   [TOTEM ] got commit token
[12880] cl15-02 corosyncdebug   [TOTEM ] entering RECOVERY state.
[12880] cl15-02 corosyncdebug   [TOTEM ] TRANS [0] member 10.220.88.41:
[12880] cl15-02 corosyncdebug   [TOTEM ] TRANS [1] member 10.220.88.47:
[12880] cl15-02 corosyncdebug   [TOTEM ] position [0] member 10.220.88.41:
[12880] cl15-02 corosyncdebug   [TOTEM ] previous ring seq 574c rep 10.220.88.41
[12880] cl15-02 corosyncdebug   [TOTEM ] aru 10 high delivered 10 received flag 1
[12880] cl15-02 corosyncdebug   [TOTEM ] position [1] member 10.220.88.47:
[12880] cl15-02 corosyncdebug   [TOTEM ] previous ring seq 574c rep 10.220.88.41
[12880] cl15-02 corosyncdebug   [TOTEM ] aru 10 high delivered 10 received flag 1

[12880] cl15-02 corosyncdebug   [TOTEM ] Did not need to originate any messages in recovery.
[12880] cl15-02 corosyncdebug   [TOTEM ] got commit token
[12880] cl15-02 corosyncdebug   [TOTEM ] Sending initial ORF token
[12880] cl15-02 corosyncdebug   [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
[12880] cl15-02 corosyncdebug   [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug   [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
[12880] cl15-02 corosyncdebug   [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug   [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
[12880] cl15-02 corosyncdebug   [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug   [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
[12880] cl15-02 corosyncdebug   [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug   [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
[12880] cl15-02 corosyncdebug   [TOTEM ] Resetting old ring state
[12880] cl15-02 corosyncdebug   [TOTEM ] recovery to regular 1-0
[12880] cl15-02 corosyncdebug   [TOTEM ] waiting_trans_ack changed to 1
Apr 11 16:19:54 [13372] cl15-02 pacemakerd:     info: pcmk_quorum_notification: Membership 22352: quorum retained (2)
Apr 11 16:19:54 [13378] cl15-02       crmd:     info: pcmk_quorum_notification: Membership 22352: quorum retained (2)
[12880] cl15-02 corosyncdebug   [TOTEM ] entering OPERATIONAL state.
[12880] cl15-02 corosyncnotice  [TOTEM ] A new membership (10.220.88.41:22352) was formed. Members
[12880] cl15-02 corosyncdebug   [SYNC  ] Committing synchronization for corosync configuration map access
Apr 11 16:19:54 [13373] cl15-02        cib:     info: cib_process_request:      Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157)
[12880] cl15-02 corosyncdebug   [CMAP  ] Not first sync -> no action
Apr 11 16:19:54 [13373] cl15-02        cib:     info: cib_process_request:      Forwarding cib_modify operation for section status to master (origin=local/crmd/27158)
[12880] cl15-02 corosyncdebug   [CPG   ] got joinlist message from node 0x2
[12880] cl15-02 corosyncdebug   [CPG   ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug   [CPG   ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug   [CPG   ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug   [CPG   ] got joinlist message from node 0x1
[12880] cl15-02 corosyncdebug   [SYNC  ] Committing synchronization for corosync cluster closed process group service v1.01
Apr 11 16:19:54 [13373] cl15-02        cib:     info: cib_process_request:      Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22)
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677
Apr 11 16:19:54 [13373] cl15-02        cib:     info: cib_process_request:      Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22)
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874
[12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873
[12880] cl15-02 corosyncdebug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug   [VOTEQ ] got nodeinfo message from cluster node 1
[12880] cl15-02 corosyncdebug   [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1
[12880] cl15-02 corosyncdebug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug   [VOTEQ ] total_votes=2, expected_votes=3
[12880] cl15-02 corosyncdebug   [VOTEQ ] node 1 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug   [VOTEQ ] node 2 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug   [VOTEQ ] node 3 state=2, votes=1, expected=3
[12880] cl15-02 corosyncdebug   [VOTEQ ] lowest node id: 1 us: 1
[12880] cl15-02 corosyncdebug   [VOTEQ ] highest node id: 2 us: 1
[12880] cl15-02 corosyncdebug   [VOTEQ ] got nodeinfo message from cluster node 1
[12880] cl15-02 corosyncdebug   [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
[12880] cl15-02 corosyncdebug   [VOTEQ ] got nodeinfo message from cluster node 2
[12880] cl15-02 corosyncdebug   [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1
[12880] cl15-02 corosyncdebug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug   [VOTEQ ] got nodeinfo message from cluster node 2
[12880] cl15-02 corosyncdebug   [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
[12880] cl15-02 corosyncdebug   [SYNC  ] Committing synchronization for corosync vote quorum service v1.0
[12880] cl15-02 corosyncdebug   [VOTEQ ] total_votes=2, expected_votes=3
[12880] cl15-02 corosyncdebug   [VOTEQ ] node 1 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug   [VOTEQ ] node 2 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug   [VOTEQ ] node 3 state=2, votes=1, expected=3
[12880] cl15-02 corosyncdebug   [VOTEQ ] lowest node id: 1 us: 1
[12880] cl15-02 corosyncdebug   [VOTEQ ] highest node id: 2 us: 1
[12880] cl15-02 corosyncnotice  [QUORUM] Members[2]: 1 2
[12880] cl15-02 corosyncdebug   [QUORUM] sending quorum notification to (nil), length = 56
[12880] cl15-02 corosyncnotice  [MAIN  ] Completed service synchronization, ready to provide service.
[12880] cl15-02 corosyncdebug   [TOTEM ] waiting_trans_ack changed to 0
[12880] cl15-02 corosyncdebug   [QUORUM] got quorate request on 0x7f5a907749a0
[12880] cl15-02 corosyncdebug   [TOTEM ] entering GATHER state from 11(merge during join).


We do not see these messages when the systems have only a single network interface.

--------------------------------------------------------------------------------------
These are the network configurations on the three hosts:

[root@cl15-02 ~]# ifconfig | grep inet
        inet 10.220.88.41  netmask 255.255.248.0  broadcast 10.220.95.255
        inet 10.220.246.50  netmask 255.255.255.0  broadcast 10.220.246.255
        inet 127.0.0.1  netmask 255.0.0.0

[root@cl15-08 ~]# ifconfig | grep inet
        inet 10.220.88.47  netmask 255.255.248.0  broadcast 10.220.95.255
        inet 10.220.246.51  netmask 255.255.255.0  broadcast 10.220.246.255
        inet 127.0.0.1  netmask 255.0.0.0

[root@cl15-09 ~]# ifconfig | grep inet
        inet 10.220.88.48  netmask 255.255.248.0  broadcast 10.220.95.255
        inet 10.220.246.59  netmask 255.255.255.0  broadcast 10.220.246.255
        inet 127.0.0.1  netmask 255.0.0.0

-----------------------------------------------------------------------------------
corosync-quorumtool output:

[root@cl15-02 ~]# corosync-quorumtool
Quorum information
------------------
Date:             Mon Apr 11 15:46:26 2016
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          18952
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 cl15-02 (local)
         2          1 cl15-08
         3          1 cl15-09

---------------------------------------------------------------------------
/etc/corosync/corosync.conf:

[root@cl15-02 ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: gfs_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: cl15-02
        nodeid: 1
    }

    node {
        ring0_addr: cl15-08
        nodeid: 2
    }

    node {
        ring0_addr: cl15-09
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    debug: on
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
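
For completeness, this is the kind of change we are considering in order to pin corosync to the 1 Gbit network. The numeric ring0_addr values are just the 10.220.88.x addresses from the interfaces above, and the interface/bindnetaddr block is only our guess at the right way to do this with udpu; we have not verified either:

totem {
    version: 2
    secauth: off
    cluster_name: gfs_cluster
    transport: udpu

    interface {
        ringnumber: 0
        # possibly redundant with udpu + nodelist, but our guess at
        # keeping ring 0 on the 10.220.88.0/21 network (network A)
        bindnetaddr: 10.220.88.0
    }
}

nodelist {
    node {
        # numeric addresses instead of hostnames, so that name
        # resolution cannot steer corosync onto the 10 Gbit iSCSI network
        ring0_addr: 10.220.88.41
        nodeid: 1
    }

    node {
        ring0_addr: 10.220.88.47
        nodeid: 2
    }

    node {
        ring0_addr: 10.220.88.48
        nodeid: 3
    }
}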

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster


