Re: Issue starting the CMAP service


 



Patrick,
I'm sure this is really a firewall/switch problem. Please make sure that
the corosync port and port - 1 (5405 and 5404 by default) are not blocked.
For testing purposes, you can just disable the firewall completely and see
whether corosync works or not.
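
If disabling the firewall is not an option, a small UDP probe between two
nodes can show whether datagrams on those ports get through at all. The
following is only a rough sketch, not part of corosync: the function names
and the probe payload are made up for illustration, and it assumes the
default ports 5405 and 5404 and that corosync is stopped on the listening
node so the port is free to bind.

```python
import socket
import sys

# Arbitrary payload chosen for this sketch; it is not a corosync packet.
PROBE = b"corosync-port-probe"


def open_listener(port, host="0.0.0.0"):
    """Bind a UDP socket on host:port and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def wait_for_probe(sock, timeout=10.0):
    """Return (payload, sender) for the first datagram, or None on timeout."""
    sock.settimeout(timeout)
    try:
        payload, sender = sock.recvfrom(1024)
    except socket.timeout:
        return None
    return payload, sender


def send_probe(host, port, payload=PROBE):
    """Send a single UDP datagram to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Usage: probe.py listen PORT
    #        probe.py send HOST PORT
    if len(sys.argv) == 3 and sys.argv[1] == "listen":
        with open_listener(int(sys.argv[2])) as listener:
            result = wait_for_probe(listener)
        if result is None:
            print("nothing received (blocked, or no probe was sent)")
        else:
            print("received probe from %s:%d" % result[1])
    elif len(sys.argv) == 4 and sys.argv[1] == "send":
        send_probe(sys.argv[2], int(sys.argv[3]))
    else:
        sys.exit("usage: probe.py listen PORT | probe.py send HOST PORT")
```

Run "listen 5405" on the node that fails to join and "send 10.20.0.212 5405"
from each of the other nodes, then repeat with 5404 and in the opposite
direction. A probe that never arrives points at filtering somewhere between
the nodes; a probe that does arrive only shows that this one path is open.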

Regards,
  Honza

Patrick Hemmer wrote:
> *From: *Steven Dake <sdake@xxxxxxxxxx>
> *Sent: * 2013-09-30 18:12:25 E
> *To: *Patrick Hemmer <corosync@xxxxxxxxxxxxxxx>
> *CC: *discuss@xxxxxxxxxxxx
> *Subject: *Re:  Issue starting the CMAP service
> 
>> On 09/30/2013 02:43 PM, Patrick Hemmer wrote:
>>> *From: *Steven Dake <sdake@xxxxxxxxxx>
>>> *Sent: * 2013-09-30 16:50:26 E
>>> *To: *Patrick Hemmer <corosync@xxxxxxxxxxxxxxx>
>>> *CC: *discuss@xxxxxxxxxxxx
>>> *Subject: *Re:  Issue starting the CMAP service
>>>
>>>> On 09/30/2013 01:45 PM, Patrick Hemmer wrote:
>>>>> I'm running corosync 2.3.2 on ubuntu precise. I'm playing with a 3
>>>>> node cluster, and whenever I try to start corosync on one of the
>>>>> nodes, it fails to start properly.
>>>>> I just do a simple start with `corosync -f`, and whenever I try to 
>>>>> use any of the tools, they error:
>>>>>
>>>>> # corosync-cmapctl
>>>>> Failed to initialize the cmap API. Error CS_ERR_TRY_AGAIN
>>>>> # corosync-quorumtool
>>>>> Cannot initialize CMAP service
>>>>>
>>>>> If I wait long enough (about 9 minutes or 530 seconds), it does end
>>>>> up starting, and the tools work, but corosync-quorumtool shows the
>>>>> only member is itself.
>>>>>
>>>>> However if I start corosync with `strace -f corosync -f` the tools
>>>>> work fine immediately upon start (though it still doesn't show the
>>>>> other nodes). Smells like a race condition, but dunno where to begin.
>>>>>
>>>>>
>>>>
>>>> My guess is something is wrong with your network relating to
>>>> multicast.  Try using udpu mode - it is very stable now and removes
>>>> multicast from the list of things that can go wrong.
>>>>
>>>
>>> I am using udpu, see the config :-)
>>>
>>>
>> I assume you have the same config on all nodes?  If so, try using IP
>> addresses for ring0_addr.  Possibly a DNS resolution problem?
>>
>> Other than that, I'm stumped.
> 
> Yes, exact same config on all nodes. All hosts are present in
> /etc/hosts. Also when I do a tcpdump on the other nodes, I see traffic
> on port 5405 coming from the node in question.
> 
>>
>> Regards
>> -steve
>>
>>>> Regards
>>>> -steve
>>>>
>>>>>
>>>>> This is the output from `corosync -f` (this node is 10.20.0.212):
>>>>> notice  [TOTEM ] Initializing transport (UDP/IP Unicast).
>>>>> notice  [TOTEM ] Initializing transmit/receive security (NSS)
>>>>> crypto: none hash: none
>>>>> notice  [TOTEM ] The network interface [10.20.0.212] is now up.
>>>>> notice  [TOTEM ] adding new UDPU member {10.20.0.127}
>>>>> notice  [TOTEM ] adding new UDPU member {10.20.0.212}
>>>>> notice  [TOTEM ] adding new UDPU member {10.20.2.124}
>>>>> notice  [TOTEM ] A new membership (10.20.0.212:1122820) was formed.
>>>>> Members joined: 2
>>>>> notice  [TOTEM ] A new membership (10.20.0.127:1122824) was formed.
>>>>> Members joined: 1 3
>>>>> ### here is where it pauses for almost 9 minutes ###
>>>>> error   [TOTEM ] FAILED TO RECEIVE
>>>>> notice  [TOTEM ] A new membership (10.20.0.212:1122876) was formed.
>>>>> Members left: 1 3
>>>>> notice  [TOTEM ] A new membership (10.20.0.212:1122936) was formed.
>>>>> Members
>>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123008) was formed.
>>>>> Members
>>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123064) was formed.
>>>>> Members
>>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123124) was formed.
>>>>> Members
>>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123180) was formed.
>>>>> Members
>>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123248) was formed.
>>>>> Members
>>>>> notice  [TOTEM ] A new membership (10.20.0.127:1123256) was formed.
>>>>> Members joined: 1 3
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> This is the config (created by `pcs` utility), it's exactly the
>>>>> same on all 3 nodes, and the other 2 nodes work fine:
>>>>> ----
>>>>> totem {
>>>>> version: 2
>>>>> secauth: off
>>>>> cluster_name: hapi-server
>>>>> transport: udpu
>>>>> }
>>>>>
>>>>> nodelist {
>>>>>   node {
>>>>>         ring0_addr: i-74eb9c2f
>>>>>         nodeid: 1
>>>>>        }
>>>>>   node {
>>>>>         ring0_addr: i-a3bf0df9
>>>>>         nodeid: 2
>>>>>        }
>>>>>   node {
>>>>>         ring0_addr: i-ebcfcbb0
>>>>>         nodeid: 3
>>>>>        }
>>>>> }
>>>>>
>>>>> quorum {
>>>>> provider: corosync_votequorum
>>>>> }
>>>>>
>>>>> logging {
>>>>> to_syslog: yes
>>>>> }
>>>>> ----
>>>>>
>>>>>
>>>>>
>>>>> -Patrick
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> discuss mailing list
>>>>> discuss@xxxxxxxxxxxx
>>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>
>>>
>>
> 
> 
> 
> Here's some additional info from the command line utils after waiting 9
> minutes for it to come up:
> 
> # corosync-quorumtool
> Quorum information
> ------------------
> Date:             Mon Sep 30 22:16:24 2013
> Quorum provider:  corosync_votequorum
> Nodes:            1
> Node ID:          2
> Ring ID:          1124320
> Quorate:          No
> 
> Votequorum information
> ----------------------
> Expected votes:   3
> Highest expected: 3
> Total votes:      1
> Quorum:           2 Activity blocked
> Flags:           
> 
> Membership information
> ----------------------
>     Nodeid      Votes Name
>          2          1 i-a3bf0df9 (local)
> 
> 
> # corosync-cmapctl |grep member
> runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.20.0.127)
> runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 15
> runtime.totem.pg.mrp.srp.members.1.status (str) = joined
> runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.20.0.212)
> runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.2.status (str) = joined
> runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(10.20.2.124)
> runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 15
> runtime.totem.pg.mrp.srp.members.3.status (str) = joined
> 
> 
> 
> -Patrick
> 
> 
> 
> 




