On 09/30/2013 02:43 PM, Patrick Hemmer wrote:
On 09/30/2013 01:45 PM, Patrick Hemmer wrote:
I'm running corosync 2.3.2 on Ubuntu Precise. I'm playing with a 3-node cluster, and whenever I try to start corosync on one of the nodes, it fails to start properly.
I just do a simple start with `corosync -f`, and whenever I try to use any of the tools, they error:
# corosync-cmapctl
Failed to initialize the cmap API. Error CS_ERR_TRY_AGAIN
# corosync-quorumtool
Cannot initialize CMAP service
If I wait long enough (about 9 minutes, or 530 seconds), it does end up starting and the tools work, but corosync-quorumtool shows that the only member is itself.
However, if I start corosync with `strace -f corosync -f`, the tools work fine immediately upon start (though it still doesn't show the other nodes). Smells like a race condition, but I don't know where to begin.
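To put a number on the startup delay instead of eyeballing it, one option is to poll one of the tools until it succeeds. This is a minimal sketch, assuming `corosync-cmapctl` exits with a non-zero status while it is printing `CS_ERR_TRY_AGAIN`; the helper name `seconds_until_success` and its `interval`/`timeout` parameters are hypothetical.

```python
import subprocess
import time


def seconds_until_success(cmd, interval=1.0, timeout=900.0):
    """Run cmd repeatedly until it exits 0 and return the elapsed seconds.

    Assumption: the command exits non-zero while the service is not ready
    (e.g. corosync-cmapctl failing with CS_ERR_TRY_AGAIN).
    """
    start = time.monotonic()
    while True:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        elapsed = time.monotonic() - start
        if result.returncode == 0:
            return elapsed
        if elapsed >= timeout:
            raise TimeoutError(f"{cmd[0]} still failing after {elapsed:.0f}s")
        time.sleep(interval)  # back off before the next attempt
```

Started right after `corosync -f`, a call such as `seconds_until_success(["corosync-cmapctl"])` would report roughly how long the cmap API stays unavailable, which makes it easier to compare runs with and without `strace`.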
My guess is something is wrong with your network relating to multicast. Try using udpu mode; it is very stable now and removes multicast from the list of things that can go wrong.
I am using udpu, see the config :-)
I assume you have the same config on all nodes? If so, try using IP addresses for the ring addresses (ring0_addr). Possibly a DNS resolution problem?
Other than that, I'm stumped.
Yes, exact same config on all nodes. All hosts are present in /etc/hosts. Also, when I do a tcpdump on the other nodes, I see traffic on port 5405 coming from the node in question.
Regards
-steve
This is the output from `corosync -f` (this node is 10.20.0.212):
notice [TOTEM ] Initializing transport (UDP/IP Unicast).
notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
notice [TOTEM ] The network interface [10.20.0.212] is now up.
notice [TOTEM ] adding new UDPU member {10.20.0.127}
notice [TOTEM ] adding new UDPU member {10.20.0.212}
notice [TOTEM ] adding new UDPU member {10.20.2.124}
notice [TOTEM ] A new membership (10.20.0.212:1122820) was formed. Members joined: 2
notice [TOTEM ] A new membership (10.20.0.127:1122824) was formed. Members joined: 1 3
### here is where it pauses for almost 9 minutes ###
error [TOTEM ] FAILED TO RECEIVE
notice [TOTEM ] A new membership (10.20.0.212:1122876) was formed. Members left: 1 3
notice [TOTEM ] A new membership (10.20.0.212:1122936) was formed. Members
notice [TOTEM ] A new membership (10.20.0.212:1123008) was formed. Members
notice [TOTEM ] A new membership (10.20.0.212:1123064) was formed. Members
notice [TOTEM ] A new membership (10.20.0.212:1123124) was formed. Members
notice [TOTEM ] A new membership (10.20.0.212:1123180) was formed. Members
notice [TOTEM ] A new membership (10.20.0.212:1123248) was formed. Members
notice [TOTEM ] A new membership (10.20.0.127:1123256) was formed. Members joined: 1 3
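The membership lines in that log are easier to compare across runs if they are pulled out mechanically. Below is a sketch that parses the `A new membership (ip:seq) was formed. Members ...` messages exactly as they appear above, assuming the optional ` joined: ...` and ` left: ...` suffixes are the only variants; `parse_memberships`, `unchanged_local_rings` and the `Membership` fields are hypothetical names, not part of corosync.

```python
import re
from typing import List, NamedTuple, Optional


class Membership(NamedTuple):
    representative: str  # address before the colon, e.g. "10.20.0.212"
    ring_seq: int        # number after the colon, e.g. 1122820
    joined: List[int]    # node IDs after "joined:"
    left: List[int]      # node IDs after "left:"


MEMBERSHIP_RE = re.compile(
    r"A new membership \((?P<rep>[0-9.]+):(?P<seq>\d+)\) was formed\. Members"
    r"(?: joined: (?P<joined>[\d ]+))?"
    r"(?: left: (?P<left>[\d ]+))?\s*$"
)


def _ids(group: Optional[str]) -> List[int]:
    # An absent group (None) means no nodes were listed.
    return [int(tok) for tok in (group or "").split()]


def parse_memberships(lines) -> List[Membership]:
    """Extract every membership-change message from corosync log lines."""
    found = []
    for line in lines:
        m = MEMBERSHIP_RE.search(line)
        if m:
            found.append(Membership(m["rep"], int(m["seq"]),
                                    _ids(m["joined"]), _ids(m["left"])))
    return found


def unchanged_local_rings(memberships, local_ip) -> List[Membership]:
    """Rings led by local_ip in which no node joined or left."""
    return [ms for ms in memberships
            if ms.representative == local_ip and not ms.joined and not ms.left]
```

Run over the log above with `local_ip="10.20.0.212"`, this should pick out the six ring reformations between `Members left: 1 3` and the final rejoin, in which the node apparently keeps re-forming a ring on its own.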
This is the config (created by the `pcs` utility); it's exactly the same on all 3 nodes, and the other 2 nodes work fine:
----
totem {
    version: 2
    secauth: off
    cluster_name: hapi-server
    transport: udpu
}

nodelist {
    node {
        ring0_addr: i-74eb9c2f
        nodeid: 1
    }
    node {
        ring0_addr: i-a3bf0df9
        nodeid: 2
    }
    node {
        ring0_addr: i-ebcfcbb0
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_syslog: yes
}
----
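One way to test the DNS theory against this config is to extract each `ring0_addr` from the nodelist and resolve it on every node, then compare the answers with the addresses corosync reports (10.20.0.127, 10.20.0.212 and 10.20.2.124 in the log and cmap output). This is a rough sketch, assuming the nodelist uses simple `key: value` lines with no nested braces inside each `node { }` block; `ring0_addrs` and `check_resolution` are hypothetical helpers. `socket.gethostbyname` goes through the system resolver, which on a typical Linux setup also consults /etc/hosts.

```python
import re
import socket


def ring0_addrs(conf_text):
    """Return {nodeid: ring0_addr} parsed from corosync.conf text."""
    nodes = {}
    # Each "node { ... }" block is assumed to contain no nested braces.
    for block in re.findall(r"node\s*\{([^}]*)\}", conf_text):
        fields = dict(re.findall(r"(\w+)\s*:\s*(\S+)", block))
        nodes[int(fields["nodeid"])] = fields["ring0_addr"]
    return nodes


def check_resolution(nodes, resolve=socket.gethostbyname):
    """Map nodeid -> (ring0_addr, resolved IPv4 address or error text)."""
    results = {}
    for nodeid, addr in sorted(nodes.items()):
        try:
            results[nodeid] = (addr, resolve(addr))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[nodeid] = (addr, f"unresolved: {exc}")
    return results
```

Reading /etc/corosync/corosync.conf and passing the result through both functions on each of the three nodes would show whether any name resolves differently on the failing node; the `resolve` parameter exists only so the lookup can be swapped out for testing.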
-Patrick
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
Here's some additional info from the command line utils after waiting 9 minutes for it to come up:
# corosync-quorumtool
Quorum information
------------------
Date: Mon Sep 30 22:16:24 2013
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 2
Ring ID: 1124320
Quorate: No
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
2 1 i-a3bf0df9 (local)
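The "Quorum: 2 Activity blocked" line follows from simple majority arithmetic: with 3 expected votes, a partition needs more than half of them, i.e. 3 // 2 + 1 = 2, and this node only has its own vote. The sketch below shows that default calculation only; it deliberately ignores votequorum options such as `two_node`, `wait_for_all` or `last_man_standing`, and the function names are hypothetical.

```python
def votequorum_quorum(expected_votes: int) -> int:
    """Simple-majority quorum: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes present reach the majority threshold."""
    return total_votes >= votequorum_quorum(expected_votes)
```

Plugging in the numbers above (`Total votes: 1`, `Expected votes: 3`) gives a threshold of 2, so the node is not quorate until at least one peer is counted in its membership.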
# corosync-cmapctl |grep member
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.20.0.127)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 15
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.20.0.212)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(10.20.2.124)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 15
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
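Note the mismatch: cmap lists all three totem members as `joined`, while corosync-quorumtool reports `Nodes: 1`. To track that across restarts, the member keys can be turned into a per-node table. This sketch assumes the `key (type) = value` line format shown above and that `ip` values contain one `ip(...)` entry per ring; `parse_members` is a hypothetical helper.

```python
import re
from collections import defaultdict

MEMBER_RE = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(?P<id>\d+)\.(?P<key>\w+)"
    r" \((?P<type>\w+)\) = (?P<value>.*)$"
)
IP_RE = re.compile(r"ip\(([^)]+)\)")


def parse_members(text):
    """Return {nodeid: {key: value}} from corosync-cmapctl output."""
    members = defaultdict(dict)
    for line in text.splitlines():
        m = MEMBER_RE.match(line.strip())
        if m is None:
            continue  # not a totem member key
        key, value = m["key"], m["value"].strip()
        if m["type"] in ("u32", "u64"):
            value = int(value)  # numeric counters such as join_count
        elif key == "ip":
            value = IP_RE.findall(value)  # one address per ring
        members[int(m["id"])][key] = value
    return dict(members)
```

The input can come straight from `subprocess.run(["corosync-cmapctl"], capture_output=True, text=True).stdout`; for the output above it yields a `join_count` of 15 for nodes 1 and 3 against 1 for the local node 2.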
-Patrick