Hi List!
I am trying to use corosync+pacemaker under Debian Squeeze, and have
run into a problem I can't solve.
I need to use unicast (because I want to add a standby quorum member in
another subnet to get an odd number of nodes), and have therefore
installed the package versions from squeeze-backports (apt-get install
-t squeeze-backports corosync pacemaker), since the regular squeeze
version of corosync (1.2.1) is too old to support unicast. The version
from squeeze-backports I am using is 1.4.2-1~bpo60+1.
I can get the cluster up and running using multicast. However, when I
modify /etc/corosync/corosync.conf of the nodes for unicast, like so:
totem {
        ...
        interface {
                member {
                        <IP>
                }
                <more members>
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastport: 4960
        }
        transport: udpu
}
(I added the member stanzas and "transport: udpu" for unicast, and
removed mcastaddr; otherwise the configuration is unchanged.)
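For reference, the udpu syntax documented for corosync 1.4.x puts a
memberaddr: key inside each member stanza; a sketch with made-up
addresses (the real file of course lists our actual node IPs):

```
totem {
        version: 2
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastport: 4960
                member {
                        memberaddr: 192.168.0.172
                }
                member {
                        memberaddr: 192.168.0.173
                }
        }
}
```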
When I then start the node, corosync starts up, and listens on the
external interface on port 4960 (as specified under mcastport).
However, in the log I get the following message every few seconds:
Mar 06 16:21:05 corosync [TOTEM ] Totem is unable to form a cluster
because of an operating system or network fault. The most common cause
of this message is that the local firewall is configured improperly.
tcpdump shows that corosync is not sending out any packets at all, of
any kind.
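Since corosync has nowhere to send unicast packets if the member
entries are not picked up, I also ran a quick syntax check over a
scratch copy of the totem section (the temp path and IPs below are
made up for illustration):

```shell
# Write a minimal udpu totem section to a scratch file (addresses are illustrative)
cat > /tmp/totem-check.conf <<'EOF'
totem {
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastport: 4960
                member {
                        memberaddr: 192.168.0.172
                }
                member {
                        memberaddr: 192.168.0.173
                }
        }
}
EOF

# One memberaddr: line should show up per node...
grep -c 'memberaddr:' /tmp/totem-check.conf
# ...and the transport line must read exactly "transport: udpu"
grep -q 'transport: udpu' /tmp/totem-check.conf && echo "transport ok"
```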
iptables has no rules to drop anything:
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
No other firewall or other capability restriction is installed, and
corosync runs as root.
Googling for the error from the log suggests that I am apparently the
only one with this problem :-(
So basically I am out of ideas, and thus asking for help here.
I am attaching the startup log from corosync.
Thanks in advance for any helpful hints or information!
Output from the log:
Mar 06 16:19:45 corosync [MAIN ] Corosync Cluster Engine ('1.4.2'):
started and ready to provide service.
Mar 06 16:19:45 corosync [MAIN ] Corosync built-in features: nss
Mar 06 16:19:45 corosync [MAIN ] Successfully read main configuration
file '/etc/corosync/corosync.conf'.
Mar 06 16:19:45 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
Mar 06 16:19:45 corosync [TOTEM ] Initializing transmit/receive
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Set r/w permissions for uid=0, gid=0 on /var/log/corosync/corosync.log
Mar 06 16:19:45 corosync [TOTEM ] The network interface [192.168.0.172]
is now up.
Mar 06 16:19:45 corosync [pcmk ] info: process_ais_conf: Reading configure
Mar 06 16:19:45 corosync [pcmk ] info: config_find_init: Local handle:
5650605097994944515 for logging
Mar 06 16:19:45 corosync [pcmk ] info: config_find_next: Processing
additional logging options...
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found 'off' for
option: debug
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found 'yes' for
option: to_logfile
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found
'/var/log/corosync/corosync.log' for option: logfile
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found 'no' for
option: to_syslog
Mar 06 16:19:45 corosync [pcmk ] info: process_ais_conf: User
configured file based logging and explicitly disabled syslog.
Mar 06 16:19:45 corosync [pcmk ] info: config_find_init: Local handle:
2730409743423111172 for quorum
Mar 06 16:19:45 corosync [pcmk ] info: config_find_next: No additional
configuration supplied for: quorum
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: No default for
option: provider
Mar 06 16:19:45 corosync [pcmk ] info: config_find_init: Local handle:
5880381755227111429 for service
Mar 06 16:19:45 corosync [pcmk ] info: config_find_next: Processing
additional service options...
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found '0' for
option: ver
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Defaulting to
'pcmk' for option: clustername
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Defaulting to
'no' for option: use_logd
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Defaulting to
'no' for option: use_mgmtd
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 06 16:19:45 corosync [pcmk ] Logging: Initialized pcmk_startup
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: Maximum core file
size is: 18446744073709551615
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: Service: 9
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: Local hostname:
haTest-1
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_update_nodeid: Local node
id: 738240704
Mar 06 16:19:45 corosync [pcmk ] info: update_member: Creating entry
for node 738240704 born on 0
Mar 06 16:19:45 corosync [pcmk ] info: update_member: 0xbb0830 Node
738240704 now known as haTest-1 (was: (null))
Mar 06 16:19:45 corosync [pcmk ] info: update_member: Node haTest-1 now
has 1 quorum votes (was 0)
Mar 06 16:19:45 corosync [pcmk ] info: update_member: Node
738240704/haTest-1 is now: member
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15078
for process stonith-ng
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15079
for process cib
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15080
for process lrmd
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15081
for process attrd
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15082
for process pengine
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15083
for process crmd
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: Pacemaker
Cluster Manager 1.1.6
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
extended virtual synchrony service
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
configuration service
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
cluster closed process group service v1.01
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
cluster config database access v1.01
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
profile loading service
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
cluster quorum service v0.1
Mar 06 16:19:45 corosync [MAIN ] Compatibility mode set to whitetank.
Using V1 and V2 of the synchronization engine.
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: Invoked:
/usr/lib/heartbeat/stonithd
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_log_init_worker:
Changed active directory to /var/lib/heartbeat/cores/root
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: get_cluster_type:
Cluster type is: 'openais'
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: classic
openais (with plugin)
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info:
init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_log_init_worker:
Changed active directory to /var/lib/heartbeat/cores/hacluster
Mar 06 16:19:45 haTest-1 cib: [15079]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Mar 06 16:19:45 haTest-1 cib: [15079]: info: validate_with_relaxng:
Creating RNG parser context
Mar 06 16:19:45 haTest-1 cib: [15079]: info: startCib: CIB
Initialization completed successfully
Mar 06 16:19:45 haTest-1 cib: [15079]: info: get_cluster_type: Cluster
type is: 'openais'
Mar 06 16:19:45 haTest-1 cib: [15079]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:45 haTest-1 cib: [15079]: info:
init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:45 haTest-1 lrmd: [15080]: info: enabling coredumps
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Core dumps could be lost
if multiple dumps occur.
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Mar 06 16:19:45 haTest-1 lrmd: [15080]: info: Started.
Mar 06 16:19:45 haTest-1 attrd: [15081]: info: Invoked:
/usr/lib/heartbeat/attrd
Mar 06 16:19:45 haTest-1 attrd: [15081]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:45 haTest-1 pengine: [15082]: info: Invoked:
/usr/lib/heartbeat/pengine
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: Invoked:
/usr/lib/heartbeat/crmd
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: crm_log_init_worker:
Changed active directory to /var/lib/heartbeat/cores/hacluster
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: main: CRM Hg Version:
9971ebba4494012a93c03b40a2c58ec0eb60f50c
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: crmd_init: Starting crmd
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info:
init_ais_connection_classic: AIS connection established
Mar 06 16:19:45 haTest-1 cib: [15079]: info:
init_ais_connection_classic: AIS connection established
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_ipc: Recorded connection
0xbb9b30 for stonith-ng/15078
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_ipc: Recorded connection
0xbbde90 for cib/15079
Mar 06 16:19:45 corosync [pcmk ] info: update_member: Node haTest-1 now
has process list: 00000000000000000000000000111312 (1118994)
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_ipc: Sending membership
update 0 to cib
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_ipc: Recorded connection
0xbc21f0 for attrd/15081
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: get_ais_nodeid:
Server details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info:
init_ais_connection_once: Connection to 'classic openais (with plugin)':
established
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_new_peer: Node
haTest-1 now has id: 738240704
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_new_peer: Node
738240704 is now known as haTest-1
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: main: Starting
stonith-ng mainloop
Mar 06 16:19:45 haTest-1 cib: [15079]: info: get_ais_nodeid: Server
details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:45 haTest-1 cib: [15079]: info: init_ais_connection_once:
Connection to 'classic openais (with plugin)': established
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_new_peer: Node haTest-1
now has id: 738240704
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_new_peer: Node
738240704 is now known as haTest-1
Mar 06 16:19:45 haTest-1 cib: [15079]: info: cib_init: Starting cib mainloop
Mar 06 16:19:45 haTest-1 cib: [15079]: info: ais_dispatch_message:
Membership 0: quorum still lost
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_update_peer: Node
haTest-1: id=738240704 state=member (new) addr=(null) votes=1 (new)
born=0 seen=0 proc=00000000000000000000000000111312 (new)
Mar 06 16:19:45 haTest-1 attrd: [15081]: notice: main: Starting mainloop...
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_cib_control: CIB
connection established
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: get_cluster_type: Cluster
type is: 'openais'
Mar 06 16:19:46 haTest-1 crmd: [15083]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info:
init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:46 haTest-1 crmd: [15083]: info:
init_ais_connection_classic: AIS connection established
Mar 06 16:19:46 corosync [pcmk ] info: pcmk_ipc: Recorded connection
0xbc7c70 for crmd/15083
Mar 06 16:19:46 corosync [pcmk ] info: pcmk_ipc: Sending membership
update 0 to crmd
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: get_ais_nodeid: Server
details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: init_ais_connection_once:
Connection to 'classic openais (with plugin)': established
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_new_peer: Node
haTest-1 now has id: 738240704
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_new_peer: Node
738240704 is now known as haTest-1
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_status_callback:
status: haTest-1 is now unknown
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_ha_control: Connected
to the cluster
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: Delaying
start, no membership data (0000000000100000)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crmd_init: Starting crmd's
mainloop
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_dispatch_message:
Membership 0: quorum still lost
Mar 06 16:19:46 haTest-1 crmd: [15083]: notice: crmd_peer_update: Status
update: Client haTest-1/crmd now has status [online] (DC=<null>)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_status_callback:
status: haTest-1 is now member (was unknown)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_update_peer: Node
haTest-1: id=738240704 state=member (new) addr=(null) votes=1 (new)
born=0 seen=0 proc=00000000000000000000000000111312 (new)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: Delaying
start, Config not read (0000000000000040)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback:
Shutdown escalation occurs after: 1200000ms
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback:
Checking for expired actions every 900000ms
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback:
Sending expected-votes=2 to corosync
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: The local CRM
is operational
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Mar 06 16:19:47 haTest-1 crmd: [15083]: info: ais_dispatch_message:
Membership 0: quorum still lost
Mar 06 16:19:47 haTest-1 crmd: [15083]: info: te_connect_stonith:
Attempting connection to fencing daemon...
Mar 06 16:19:48 haTest-1 crmd: [15083]: info: te_connect_stonith: Connected
Mar 06 16:20:07 haTest-1 crmd: [15083]: info: crm_timer_popped: Election
Trigger (I_DC_TIMEOUT) just popped (20000ms)
Mar 06 16:20:07 haTest-1 crmd: [15083]: WARN: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Mar 06 16:20:07 haTest-1 crmd: [15083]: info: do_state_transition: State
transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Mar 06 16:20:12 corosync [TOTEM ] Totem is unable to form a cluster
because of an operating system or network fault. The most common cause
of this message is that the local firewall is configured improperly.
Mar 06 16:20:18 corosync [TOTEM ] Totem is unable to form a cluster
because of an operating system or network fault. The most common cause
of this message is that the local firewall is configured improperly.
Mar 06 16:20:25 corosync [TOTEM ] Totem is unable to form a cluster
because of an operating system or network fault. The most common cause
of this message is that the local firewall is configured improperly.
...
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss