cluster won't form - token lost in commit state

I'm trying to get corosync running inside two Docker containers. One of them is spewing out lots of "The token was lost in the COMMIT state." messages. The other is simply logging "The consensus timeout expired." (which, given the state of the other node, is expected).

Googling the commit state message turns up almost nothing, so I have no clue what it means.

Both nodes are inside Docker containers, and each container's traffic is NATed before leaving its server. The transport is UDPU. I've taken the NAT into consideration and have manually set the nodeid for each node so that it's not derived from the IP address.
tcpdump shows me that both nodes are receiving traffic from the other node. However, the node which is logging 'lost in commit state' is only sending a packet every few seconds, whereas the 'consensus timeout' node is sending a ton of packets.
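
To put a number on that imbalance, the tcpdump output can be tallied per source address. This is a minimal Python sketch, assuming the default one-line tcpdump format seen in the captures further down; the function name count_by_source is made up for illustration, and the sample lines are copied from the Node 1 capture.

```python
import re
from collections import Counter

# Matches lines such as:
#   03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
LINE_RE = re.compile(
    r"^\s*\S+ IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): UDP"
)

def count_by_source(lines):
    """Return a Counter mapping source IP -> number of UDP packets seen."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Sample taken from the Node 1 tcpdump output in this post.
SAMPLE = """\
03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.896435 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163
03:03:58.946487 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.996544 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
""".splitlines()

if __name__ == "__main__":
    for src, n in count_by_source(SAMPLE).most_common():
        print(f"{src}: {n}")
```

A five-line excerpt is far too short to measure a rate, so a longer capture would be needed before reading anything into the ratio.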


Node 1:
------------
Name: i-cd3b0393
Container IP: 172.17.0.21 (the IP corosync binds to)
Server IP: 10.20.27.52
Version: 2.3.3 (Fedora 20)


corosync.conf:
    totem {
      version: 2
      token: 2000
      token_retransmits_before_loss_const: 10
      vsftype: none
      secauth: off
      transport: udpu
    }

    logging {
      fileline: off
      syslog_facility: local2
      syslog_priority: debug
    }

    quorum {
      provider: corosync_votequorum
    }

    nodelist {
      node {
        nodeid: 1862911301
        ring0_addr: i-a2542ffc
      }
      node {
        nodeid: 2585129852
        ring0_addr: i-cd3b0393
      }
    }


/etc/hosts:
    172.17.0.21    i-cd3b0393
    10.20.50.204 i-a2542ffc


logs:
    Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
    Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
    Aug 29 02:53:18 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:19 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:21 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
    Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
    Aug 29 02:53:22 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:24 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:25 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
    Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).


tcpdump:
    03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
    03:03:58.896435 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
    03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163
    03:03:58.946487 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
    03:03:58.996544 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163


corosync-quorumtool:
    Quorum information
    ------------------
    Date:             Fri Aug 29 02:57:45 2014
    Quorum provider:  corosync_votequorum
    Nodes:            1
    Node ID:          2585129852
    Ring ID:          2904
    Quorate:          No

    Votequorum information
    ----------------------
    Expected votes:   2
    Highest expected: 2
    Total votes:      1
    Quorum:           2 Activity blocked
    Flags:            

    Membership information
    ----------------------
        Nodeid      Votes Name
    2585129852          1 i-cd3b0393 (local)


========================================

Node 2:
------------
Name: i-a2542ffc
Container IP: 172.17.0.7 (the IP corosync binds to)
Server IP: 10.20.50.204
Version: 2.3.3 (Fedora 20)


corosync.conf:
    totem {
      version: 2
      token: 2000
      token_retransmits_before_loss_const: 10
      vsftype: none
      secauth: off
      transport: udpu
    }

    logging {
      fileline: off
      syslog_facility: local2
      syslog_priority: debug
    }

    quorum {
      provider: corosync_votequorum
    }

    nodelist {
      node {
        nodeid: 1862911301
        ring0_addr: i-a2542ffc
      }
      node {
        nodeid: 2585129852
        ring0_addr: i-cd3b0393
      }
    }


/etc/hosts:
    172.17.0.7    i-a2542ffc
    10.20.27.52 i-cd3b0393


logs:
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b88
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b8c
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b90
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.


tcpdump:
    03:04:25.137038 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
    03:04:25.187086 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
    03:04:25.235829 IP 172.17.0.7.37971 > 10.20.27.52.5405: UDP, length 163
    03:04:25.237123 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
    03:04:25.287847 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163


corosync-quorumtool:
    Quorum information
    ------------------
    Date:             Fri Aug 29 02:57:19 2014
    Quorum provider:  corosync_votequorum
    Nodes:            1
    Node ID:          1862911301
    Ring ID:          4488
    Quorate:          No

    Votequorum information
    ----------------------
    Expected votes:   2
    Highest expected: 2
    Total votes:      1
    Quorum:           2 Activity blocked
    Flags:            

    Membership information
    ----------------------
        Nodeid      Votes Name
    1862911301          1 i-a2542ffc (local)
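
Because of the NAT, each ring0_addr name resolves to a different address depending on which node does the lookup. Whether that matters to totem is not established in this post. The Python sketch below only makes the difference explicit, using the two /etc/hosts files quoted above; parse_hosts and differing_names are hypothetical helper names, not part of corosync.

```python
def parse_hosts(text):
    """Parse /etc/hosts-style text into {hostname: ip}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ip
    return table

def differing_names(hosts_a, hosts_b):
    """Return {name: (ip_a, ip_b)} for names that resolve differently."""
    a, b = parse_hosts(hosts_a), parse_hosts(hosts_b)
    return {n: (a[n], b[n]) for n in sorted(a.keys() & b.keys()) if a[n] != b[n]}

# /etc/hosts contents as quoted for each node in this post.
NODE1_HOSTS = """
172.17.0.21    i-cd3b0393
10.20.50.204 i-a2542ffc
"""
NODE2_HOSTS = """
172.17.0.7    i-a2542ffc
10.20.27.52 i-cd3b0393
"""

if __name__ == "__main__":
    for name, (ip1, ip2) in differing_names(NODE1_HOSTS, NODE2_HOSTS).items():
        print(f"{name}: node 1 sees {ip1}, node 2 sees {ip2}")
```

With the files above, both names are reported: each node maps its own name to its container IP, while the peer maps that same name to the server IP.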
