Patrick,
I'm trying to get corosync running inside 2 docker containers. One of them is spewing out lots of "The token was lost in the COMMIT state." messages. The other is simply logging "The consensus timeout expired." (which given the state of the other node, is expected). Googling the commit state message turns up almost nothing, so I have no clue what it means. Both nodes are inside docker containers which each get NATed before leaving the server (using UDPU). I've taken this into consideration and have manually set the nodeid for each so that it's not based off the IP address.
NAT is the problem. Basically, the config file has to be in sync across the nodes, which is not the case here.
But you can use iptables DNAT magic to make it work. Please take the time to read this thread:
http://lists.corosync.org/pipermail/discuss/2012-August/001865.html
It has a more in-depth explanation plus a solution.

Regards,
Honza
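To illustrate what "not in sync" appears to mean here, the following is a minimal Python sketch, not anything corosync itself runs. It takes the nodelist names and the two /etc/hosts mappings quoted in this thread and reports which names resolve to different addresses depending on the node doing the lookup. The names `NODELIST`, `HOSTS`, `resolved_view` and `mismatches` are made up for this illustration.

```python
# Illustrative only: compare how each node resolves the nodelist names.
# The data is copied from the corosync.conf and /etc/hosts files quoted
# in this thread; the helper names are hypothetical.

# ring0_addr entries from the shared nodelist
NODELIST = ["i-a2542ffc", "i-cd3b0393"]

# /etc/hosts as seen on each node (outer key = node doing the lookup)
HOSTS = {
    "i-cd3b0393": {"i-cd3b0393": "172.17.0.21", "i-a2542ffc": "10.20.50.204"},
    "i-a2542ffc": {"i-a2542ffc": "172.17.0.7", "i-cd3b0393": "10.20.27.52"},
}


def resolved_view(local_node):
    """Return the member addresses as resolved on local_node."""
    return {name: HOSTS[local_node][name] for name in NODELIST}


def mismatches():
    """Return names that resolve to different addresses on different nodes."""
    views = {node: resolved_view(node) for node in HOSTS}
    result = {}
    for name in NODELIST:
        addrs = {node: view[name] for node, view in views.items()}
        if len(set(addrs.values())) > 1:
            result[name] = addrs
    return result


if __name__ == "__main__":
    for name, addrs in mismatches().items():
        print(name, addrs)
```

With the data from this thread, both names show up as mismatched: each node sees itself at its container address (172.17.0.x) while its peer sees it at the server address (10.20.x.x), so although the nodelist text is identical, the two nodes do not agree on the member addresses.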
tcpdump shows me that both nodes are receiving traffic from the other node. However, the node which is throwing the 'lost in commit state' message is only sending a packet every few seconds, whereas the 'consensus timeout' node is sending a ton of packets.

Node 1:
------------
Name: i-cd3b0393
Container IP: 172.17.0.21 (the IP corosync binds to)
Server IP: 10.20.27.52
Version: 2.3.3 (Fedora 20)

corosync.conf:
totem {
    version: 2
    token: 2000
    token_retransmits_before_loss_const: 10
    vsftype: none
    secauth: off
    transport: udpu
}
logging {
    fileline: off
    syslog_facility: local2
    syslog_priority: debug
}
quorum {
    provider: corosync_votequorum
}
nodelist {
    node {
        nodeid: 1862911301
        ring0_addr: i-a2542ffc
    }
    node {
        nodeid: 2585129852
        ring0_addr: i-cd3b0393
    }
}

/etc/hosts:
172.17.0.21 i-cd3b0393
10.20.50.204 i-a2542ffc

logs:
Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]: [TOTEM ] The consensus timeout expired.
Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]: [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Aug 29 02:53:18 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:19 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:21 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]: [TOTEM ] The consensus timeout expired.
Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]: [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Aug 29 02:53:22 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:24 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:25 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]: [TOTEM ] The consensus timeout expired.
Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]: [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).

tcpdump:
03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.896435 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163
03:03:58.946487 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.996544 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163

corosync-quorumtool:
Quorum information
------------------
Date: Fri Aug 29 02:57:45 2014
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 2585129852
Ring ID: 2904
Quorate: No

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
2585129852 1 i-cd3b0393 (local)

========================================

Node 2:
------------
Name: i-a2542ffc
Container IP: 172.17.0.7 (the IP corosync binds to)
Server IP: 10.20.50.204
Version: 2.3.3 (Fedora 20)

corosync.conf:
totem {
    version: 2
    token: 2000
    token_retransmits_before_loss_const: 10
    vsftype: none
    secauth: off
    transport: udpu
}
logging {
    fileline: off
    syslog_facility: local2
    syslog_priority: debug
}
quorum {
    provider: corosync_votequorum
}
nodelist {
    node {
        nodeid: 1862911301
        ring0_addr: i-a2542ffc
    }
    node {
        nodeid: 2585129852
        ring0_addr: i-cd3b0393
    }
}

/etc/hosts:
172.17.0.7 i-a2542ffc
10.20.27.52 i-cd3b0393

logs:
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b88
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering COMMIT state.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b8c
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering COMMIT state.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b90
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering COMMIT state.

tcpdump:
03:04:25.137038 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.187086 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.235829 IP 172.17.0.7.37971 > 10.20.27.52.5405: UDP, length 163
03:04:25.237123 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.287847 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163

corosync-quorumtool:
Quorum information
------------------
Date: Fri Aug 29 02:57:19 2014
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 1862911301
Ring ID: 4488
Quorate: No

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
1862911301 1 i-a2542ffc (local)

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss