Hi List!
I am trying to use corosync+pacemaker under Debian Squeeze, and have
run into a problem I can't solve.
I need to use unicast (because I want to add a standby quorum member in
another subnet to get an odd number of nodes), and have therefore
installed the package versions from squeeze-backports (apt-get install
-t squeeze-backports corosync pacemaker), since the regular squeeze
version of corosync (1.2.1) is too old to support unicast. The version
from squeeze-backports I am using is 1.4.2-1~bpo60+1.
I can get the cluster up and running using multicast. However, when I
modify /etc/corosync/corosync.conf of the nodes for unicast, like so:
totem {
        ...
        interface {
                member {
                        <IP>
                }
                <more members>
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastport: 4960
        }
        transport: udpu
}
(I added the member stanzas and "transport: udpu" for unicast, and
removed mcastaddr; otherwise the configuration is unchanged.)
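For reference, the udpu syntax documented for corosync 1.4.x puts a
memberaddr: key inside each member stanza; a sketch with made-up
addresses (the real file of course lists our actual node IPs):

```
totem {
        version: 2
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastport: 4960
                member {
                        memberaddr: 192.168.0.172
                }
                member {
                        memberaddr: 192.168.0.173
                }
        }
}
```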
When I then start the node, corosync starts up, and listens on the
external interface on port 4960 (as specified under mcastport).
However, in the log I get the following message every few seconds:
Mar 06 16:21:05 corosync [TOTEM ] Totem is unable to form a cluster
because of an operating system or network fault. The most common cause
of this message is that the local firewall is configured improperly.
tcpdump shows that corosync is not sending out any packets at all, of
any kind.
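Since corosync has nowhere to send unicast packets if the member
entries are not picked up, I also ran a quick syntax check over a
scratch copy of the totem section (the temp path and IPs below are
made up for illustration):

```shell
# Write a minimal udpu totem section to a scratch file (addresses are illustrative)
cat > /tmp/totem-check.conf <<'EOF'
totem {
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastport: 4960
                member {
                        memberaddr: 192.168.0.172
                }
                member {
                        memberaddr: 192.168.0.173
                }
        }
}
EOF

# One memberaddr: line should show up per node...
grep -c 'memberaddr:' /tmp/totem-check.conf
# ...and the transport line must read exactly "transport: udpu"
grep -q 'transport: udpu' /tmp/totem-check.conf && echo "transport ok"
```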
iptables has no rules to drop anything:
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
No other firewall or other capability restriction is installed, and
corosync runs as root.
Googling for the error from the log suggests that I am apparently the
only one with this problem :-(
So basically I am out of ideas, and thus asking for help here.
I am attaching the startup log from corosync.
Thanks in advance for any helpful hints or information!
Output from the log:
Mar 06 16:19:45 corosync [MAIN ] Corosync Cluster Engine ('1.4.2'):
started and ready to provide service.
Mar 06 16:19:45 corosync [MAIN ] Corosync built-in features: nss
Mar 06 16:19:45 corosync [MAIN ] Successfully read main configuration
file '/etc/corosync/corosync.conf'.
Mar 06 16:19:45 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
Mar 06 16:19:45 corosync [TOTEM ] Initializing transmit/receive
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Set r/w permissions for uid=0, gid=0 on /var/log/corosync/corosync.log
Mar 06 16:19:45 corosync [TOTEM ] The network interface [192.168.0.172]
is now up.
Mar 06 16:19:45 corosync [pcmk ] info: process_ais_conf: Reading configure
Mar 06 16:19:45 corosync [pcmk ] info: config_find_init: Local handle:
5650605097994944515 for logging
Mar 06 16:19:45 corosync [pcmk ] info: config_find_next: Processing
additional logging options...
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found 'off' for
option: debug
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found 'yes' for
option: to_logfile
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found
'/var/log/corosync/corosync.log' for option: logfile
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found 'no' for
option: to_syslog
Mar 06 16:19:45 corosync [pcmk ] info: process_ais_conf: User
configured file based logging and explicitly disabled syslog.
Mar 06 16:19:45 corosync [pcmk ] info: config_find_init: Local handle:
2730409743423111172 for quorum
Mar 06 16:19:45 corosync [pcmk ] info: config_find_next: No additional
configuration supplied for: quorum
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: No default for
option: provider
Mar 06 16:19:45 corosync [pcmk ] info: config_find_init: Local handle:
5880381755227111429 for service
Mar 06 16:19:45 corosync [pcmk ] info: config_find_next: Processing
additional service options...
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Found '0' for
option: ver
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Defaulting to
'pcmk' for option: clustername
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Defaulting to
'no' for option: use_logd
Mar 06 16:19:45 corosync [pcmk ] info: get_config_opt: Defaulting to
'no' for option: use_mgmtd
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 06 16:19:45 corosync [pcmk ] Logging: Initialized pcmk_startup
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: Maximum core file
size is: 18446744073709551615
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: Service: 9
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: Local hostname:
haTest-1
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_update_nodeid: Local node
id: 738240704
Mar 06 16:19:45 corosync [pcmk ] info: update_member: Creating entry
for node 738240704 born on 0
Mar 06 16:19:45 corosync [pcmk ] info: update_member: 0xbb0830 Node
738240704 now known as haTest-1 (was: (null))
Mar 06 16:19:45 corosync [pcmk ] info: update_member: Node haTest-1 now
has 1 quorum votes (was 0)
Mar 06 16:19:45 corosync [pcmk ] info: update_member: Node
738240704/haTest-1 is now: member
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15078
for process stonith-ng
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15079
for process cib
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15080
for process lrmd
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15081
for process attrd
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15082
for process pengine
Mar 06 16:19:45 corosync [pcmk ] info: spawn_child: Forked child 15083
for process crmd
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: Pacemaker
Cluster Manager 1.1.6
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
extended virtual synchrony service
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
configuration service
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
cluster closed process group service v1.01
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
cluster config database access v1.01
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
profile loading service
Mar 06 16:19:45 corosync [SERV ] Service engine loaded: corosync
cluster quorum service v0.1
Mar 06 16:19:45 corosync [MAIN ] Compatibility mode set to whitetank.
Using V1 and V2 of the synchronization engine.
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: Invoked:
/usr/lib/heartbeat/stonithd
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_log_init_worker:
Changed active directory to /var/lib/heartbeat/cores/root
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: get_cluster_type:
Cluster type is: 'openais'
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: classic
openais (with plugin)
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info:
init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_log_init_worker:
Changed active directory to /var/lib/heartbeat/cores/hacluster
Mar 06 16:19:45 haTest-1 cib: [15079]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Mar 06 16:19:45 haTest-1 cib: [15079]: info: validate_with_relaxng:
Creating RNG parser context
Mar 06 16:19:45 haTest-1 cib: [15079]: info: startCib: CIB
Initialization completed successfully
Mar 06 16:19:45 haTest-1 cib: [15079]: info: get_cluster_type: Cluster
type is: 'openais'
Mar 06 16:19:45 haTest-1 cib: [15079]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:45 haTest-1 cib: [15079]: info:
init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:45 haTest-1 lrmd: [15080]: info: enabling coredumps
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Core dumps could be lost
if multiple dumps occur.
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Mar 06 16:19:45 haTest-1 lrmd: [15080]: info: Started.
Mar 06 16:19:45 haTest-1 attrd: [15081]: info: Invoked:
/usr/lib/heartbeat/attrd
Mar 06 16:19:45 haTest-1 attrd: [15081]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:45 haTest-1 pengine: [15082]: info: Invoked:
/usr/lib/heartbeat/pengine
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: Invoked:
/usr/lib/heartbeat/crmd
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: crm_log_init_worker:
Changed active directory to /var/lib/heartbeat/cores/hacluster
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: main: CRM Hg Version:
9971ebba4494012a93c03b40a2c58ec0eb60f50c
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: crmd_init: Starting crmd
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info:
init_ais_connection_classic: AIS connection established
Mar 06 16:19:45 haTest-1 cib: [15079]: info:
init_ais_connection_classic: AIS connection established
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_ipc: Recorded connection
0xbb9b30 for stonith-ng/15078
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_ipc: Recorded connection
0xbbde90 for cib/15079
Mar 06 16:19:45 corosync [pcmk ] info: update_member: Node haTest-1 now
has process list: 00000000000000000000000000111312 (1118994)
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_ipc: Sending membership
update 0 to cib
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_ipc: Recorded connection
0xbc21f0 for attrd/15081
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: get_ais_nodeid:
Server details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info:
init_ais_connection_once: Connection to 'classic openais (with plugin)':
established
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_new_peer: Node
haTest-1 now has id: 738240704
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_new_peer: Node
738240704 is now known as haTest-1
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: main: Starting
stonith-ng mainloop
Mar 06 16:19:45 haTest-1 cib: [15079]: info: get_ais_nodeid: Server
details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:45 haTest-1 cib: [15079]: info: init_ais_connection_once:
Connection to 'classic openais (with plugin)': established
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_new_peer: Node haTest-1
now has id: 738240704
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_new_peer: Node
738240704 is now known as haTest-1
Mar 06 16:19:45 haTest-1 cib: [15079]: info: cib_init: Starting cib mainloop
Mar 06 16:19:45 haTest-1 cib: [15079]: info: ais_dispatch_message:
Membership 0: quorum still lost
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_update_peer: Node
haTest-1: id=738240704 state=member (new) addr=(null) votes=1 (new)
born=0 seen=0 proc=00000000000000000000000000111312 (new)
Mar 06 16:19:45 haTest-1 attrd: [15081]: notice: main: Starting mainloop...
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_cib_control: CIB
connection established
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: get_cluster_type: Cluster
type is: 'openais'
Mar 06 16:19:46 haTest-1 crmd: [15083]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info:
init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:46 haTest-1 crmd: [15083]: info:
init_ais_connection_classic: AIS connection established
Mar 06 16:19:46 corosync [pcmk ] info: pcmk_ipc: Recorded connection
0xbc7c70 for crmd/15083
Mar 06 16:19:46 corosync [pcmk ] info: pcmk_ipc: Sending membership
update 0 to crmd
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: get_ais_nodeid: Server
details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: init_ais_connection_once:
Connection to 'classic openais (with plugin)': established
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_new_peer: Node
haTest-1 now has id: 738240704
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_new_peer: Node
738240704 is now known as haTest-1
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_status_callback:
status: haTest-1 is now unknown
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_ha_control: Connected
to the cluster
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: Delaying
start, no membership data (0000000000100000)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crmd_init: Starting crmd's
mainloop
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_dispatch_message:
Membership 0: quorum still lost
Mar 06 16:19:46 haTest-1 crmd: [15083]: notice: crmd_peer_update: Status
update: Client haTest-1/crmd now has status [online] (DC=<null>)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_status_callback:
status: haTest-1 is now member (was unknown)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_update_peer: Node
haTest-1: id=738240704 state=member (new) addr=(null) votes=1 (new)
born=0 seen=0 proc=00000000000000000000000000111312 (new)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: Delaying
start, Config not read (0000000000000040)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback:
Shutdown escalation occurs after: 1200000ms
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback:
Checking for expired actions every 900000ms
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback:
Sending expected-votes=2 to corosync
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: The local CRM
is operational
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Mar 06 16:19:47 haTest-1 crmd: [15083]: info: ais_dispatch_message:
Membership 0: quorum still lost
Mar 06 16:19:47 haTest-1 crmd: [15083]: info: te_connect_stonith:
Attempting connection to fencing daemon...
Mar 06 16:19:48 haTest-1 crmd: [15083]: info: te_connect_stonith: Connected
Mar 06 16:20:07 haTest-1 crmd: [15083]: info: crm_timer_popped: Election
Trigger (I_DC_TIMEOUT) just popped (20000ms)
Mar 06 16:20:07 haTest-1 crmd: [15083]: WARN: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Mar 06 16:20:07 haTest-1 crmd: [15083]: info: do_state_transition: State
transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Mar 06 16:20:12 corosync [TOTEM ] Totem is unable to form a cluster
because of an operating system or network fault. The most common cause
of this message is that the local firewall is configured improperly.
Mar 06 16:20:18 corosync [TOTEM ] Totem is unable to form a cluster
because of an operating system or network fault. The most common cause
of this message is that the local firewall is configured improperly.
Mar 06 16:20:25 corosync [TOTEM ] Totem is unable to form a cluster
because of an operating system or network fault. The most common cause
of this message is that the local firewall is configured improperly.
...
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss