Unicast not working (config problem?)

Hi List!
I am trying to use corosync+pacemaker on Debian Squeeze and have run into a problem I can't solve.

I need to use unicast (I want to add a standby quorum member in another subnet, so that I have an uneven number of nodes), and have therefore installed the package versions from squeeze-backports (apt-get install -t squeeze-backports corosync pacemaker), since the regular Squeeze version of corosync (1.2.1) is too old to support unicast. The version from squeeze-backports I am using is 1.4.2-1~bpo60+1. I can get the cluster up and running using multicast. However, when I modify /etc/corosync/corosync.conf on the nodes for unicast, like so:
totem {
    ...
    interface {
        member {
            <IP>
        }
        <more members>
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastport: 4960
    }
    transport: udpu
}

(I added the member stanzas and "transport: udpu" for unicast, and removed mcastaddr; otherwise the configuration stays the same.) When I then start the node, corosync starts up and listens on the external interface on port 4960 (as specified under mcastport).
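For comparison, the corosync.conf(5) man page for the 1.4 series documents the udpu member list with a memberaddr: key inside each member stanza and transport: udpu in the totem section. A sketch of that documented form (the addresses below are placeholders, the first one taken from my own node):

```
totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastport: 4960
        member {
            memberaddr: 192.168.0.172
        }
        member {
            memberaddr: <IP of second node>
        }
    }
}
```

Note that each member entry carries an explicit memberaddr: key rather than a bare IP address.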
However, in the log I get the following message every few seconds:
Mar 06 16:21:05 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.

tcpdump tells me that no packets of any kind are being sent out by corosync.
iptables has no rules to drop anything:
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

No other firewalling or capability restriction is in place.
corosync runs as root.

Googling for the error from the log suggests that I am apparently the only one with this problem :-(
So basically I am out of ideas, and am therefore asking for help here.
I am attaching the startup log from corosync.

Thanks in advance for any helpful hints or information!

Output from the log:
Mar 06 16:19:45 corosync [MAIN ] Corosync Cluster Engine ('1.4.2'): started and ready to provide service.
Mar 06 16:19:45 corosync [MAIN  ] Corosync built-in features: nss
Mar 06 16:19:45 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 06 16:19:45 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
Mar 06 16:19:45 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Set r/w permissions for uid=0, gid=0 on /var/log/corosync/corosync.log
Mar 06 16:19:45 corosync [TOTEM ] The network interface [192.168.0.172] is now up.
Mar 06 16:19:45 corosync [pcmk  ] info: process_ais_conf: Reading configure
Mar 06 16:19:45 corosync [pcmk  ] info: config_find_init: Local handle: 5650605097994944515 for logging
Mar 06 16:19:45 corosync [pcmk  ] info: config_find_next: Processing additional logging options...
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: Found 'off' for option: debug
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: Found 'yes' for option: to_logfile
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: Found '/var/log/corosync/corosync.log' for option: logfile
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: Found 'no' for option: to_syslog
Mar 06 16:19:45 corosync [pcmk  ] info: process_ais_conf: User configured file based logging and explicitly disabled syslog.
Mar 06 16:19:45 corosync [pcmk  ] info: config_find_init: Local handle: 2730409743423111172 for quorum
Mar 06 16:19:45 corosync [pcmk  ] info: config_find_next: No additional configuration supplied for: quorum
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: No default for option: provider
Mar 06 16:19:45 corosync [pcmk  ] info: config_find_init: Local handle: 5880381755227111429 for service
Mar 06 16:19:45 corosync [pcmk  ] info: config_find_next: Processing additional service options...
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: Found '0' for option: ver
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: Defaulting to 'no' for option: use_logd
Mar 06 16:19:45 corosync [pcmk  ] info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
Mar 06 16:19:45 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Mar 06 16:19:45 corosync [pcmk  ] Logging: Initialized pcmk_startup
Mar 06 16:19:45 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Mar 06 16:19:45 corosync [pcmk  ] info: pcmk_startup: Service: 9
Mar 06 16:19:45 corosync [pcmk  ] info: pcmk_startup: Local hostname: haTest-1
Mar 06 16:19:45 corosync [pcmk  ] info: pcmk_update_nodeid: Local node id: 738240704
Mar 06 16:19:45 corosync [pcmk  ] info: update_member: Creating entry for node 738240704 born on 0
Mar 06 16:19:45 corosync [pcmk  ] info: update_member: 0xbb0830 Node 738240704 now known as haTest-1 (was: (null))
Mar 06 16:19:45 corosync [pcmk  ] info: update_member: Node haTest-1 now has 1 quorum votes (was 0)
Mar 06 16:19:45 corosync [pcmk  ] info: update_member: Node 738240704/haTest-1 is now: member
Mar 06 16:19:45 corosync [pcmk  ] info: spawn_child: Forked child 15078 for process stonith-ng
Mar 06 16:19:45 corosync [pcmk  ] info: spawn_child: Forked child 15079 for process cib
Mar 06 16:19:45 corosync [pcmk  ] info: spawn_child: Forked child 15080 for process lrmd
Mar 06 16:19:45 corosync [pcmk  ] info: spawn_child: Forked child 15081 for process attrd
Mar 06 16:19:45 corosync [pcmk  ] info: spawn_child: Forked child 15082 for process pengine
Mar 06 16:19:45 corosync [pcmk  ] info: spawn_child: Forked child 15083 for process crmd
Mar 06 16:19:45 corosync [SERV  ] Service engine loaded: Pacemaker Cluster Manager 1.1.6
Mar 06 16:19:45 corosync [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Mar 06 16:19:45 corosync [SERV  ] Service engine loaded: corosync configuration service
Mar 06 16:19:45 corosync [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Mar 06 16:19:45 corosync [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Mar 06 16:19:45 corosync [SERV  ] Service engine loaded: corosync profile loading service
Mar 06 16:19:45 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Mar 06 16:19:45 corosync [MAIN  ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: Invoked: /usr/lib/heartbeat/stonithd
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: get_cluster_type: Cluster type is: 'openais'
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/hacluster
Mar 06 16:19:45 haTest-1 cib: [15079]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Mar 06 16:19:45 haTest-1 cib: [15079]: info: validate_with_relaxng: Creating RNG parser context
Mar 06 16:19:45 haTest-1 cib: [15079]: info: startCib: CIB Initialization completed successfully
Mar 06 16:19:45 haTest-1 cib: [15079]: info: get_cluster_type: Cluster type is: 'openais'
Mar 06 16:19:45 haTest-1 cib: [15079]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:45 haTest-1 cib: [15079]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:45 haTest-1 lrmd: [15080]: info: enabling coredumps
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Core dumps could be lost if multiple dumps occur.
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Mar 06 16:19:45 haTest-1 lrmd: [15080]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Mar 06 16:19:45 haTest-1 lrmd: [15080]: info: Started.
Mar 06 16:19:45 haTest-1 attrd: [15081]: info: Invoked: /usr/lib/heartbeat/attrd
Mar 06 16:19:45 haTest-1 attrd: [15081]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:45 haTest-1 pengine: [15082]: info: Invoked: /usr/lib/heartbeat/pengine
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: Invoked: /usr/lib/heartbeat/crmd
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/hacluster
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: main: CRM Hg Version: 9971ebba4494012a93c03b40a2c58ec0eb60f50c
Mar 06 16:19:45 haTest-1 crmd: [15083]: info: crmd_init: Starting crmd
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: init_ais_connection_classic: AIS connection established
Mar 06 16:19:45 haTest-1 cib: [15079]: info: init_ais_connection_classic: AIS connection established
Mar 06 16:19:45 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0xbb9b30 for stonith-ng/15078
Mar 06 16:19:45 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0xbbde90 for cib/15079
Mar 06 16:19:45 corosync [pcmk  ] info: update_member: Node haTest-1 now has process list: 00000000000000000000000000111312 (1118994)
Mar 06 16:19:45 corosync [pcmk  ] info: pcmk_ipc: Sending membership update 0 to cib
Mar 06 16:19:45 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0xbc21f0 for attrd/15081
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: get_ais_nodeid: Server details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_new_peer: Node haTest-1 now has id: 738240704
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: crm_new_peer: Node 738240704 is now known as haTest-1
Mar 06 16:19:45 haTest-1 stonith-ng: [15078]: info: main: Starting stonith-ng mainloop
Mar 06 16:19:45 haTest-1 cib: [15079]: info: get_ais_nodeid: Server details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:45 haTest-1 cib: [15079]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_new_peer: Node haTest-1 now has id: 738240704
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_new_peer: Node 738240704 is now known as haTest-1
Mar 06 16:19:45 haTest-1 cib: [15079]: info: cib_init: Starting cib mainloop
Mar 06 16:19:45 haTest-1 cib: [15079]: info: ais_dispatch_message: Membership 0: quorum still lost
Mar 06 16:19:45 haTest-1 cib: [15079]: info: crm_update_peer: Node haTest-1: id=738240704 state=member (new) addr=(null) votes=1 (new) born=0 seen=0 proc=00000000000000000000000000111312 (new)
Mar 06 16:19:45 haTest-1 attrd: [15081]: notice: main: Starting mainloop...
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_cib_control: CIB connection established
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: get_cluster_type: Cluster type is: 'openais'
Mar 06 16:19:46 haTest-1 crmd: [15083]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: init_ais_connection_classic: AIS connection established
Mar 06 16:19:46 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0xbc7c70 for crmd/15083
Mar 06 16:19:46 corosync [pcmk  ] info: pcmk_ipc: Sending membership update 0 to crmd
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: get_ais_nodeid: Server details: id=738240704 uname=haTest-1 cname=pcmk
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_new_peer: Node haTest-1 now has id: 738240704
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_new_peer: Node 738240704 is now known as haTest-1
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_status_callback: status: haTest-1 is now unknown
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_ha_control: Connected to the cluster
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: Delaying start, no membership data (0000000000100000)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crmd_init: Starting crmd's mainloop
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_dispatch_message: Membership 0: quorum still lost
Mar 06 16:19:46 haTest-1 crmd: [15083]: notice: crmd_peer_update: Status update: Client haTest-1/crmd now has status [online] (DC=<null>)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: ais_status_callback: status: haTest-1 is now member (was unknown)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: crm_update_peer: Node haTest-1: id=738240704 state=member (new) addr=(null) votes=1 (new) born=0 seen=0 proc=00000000000000000000000000111312 (new)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: Delaying start, Config not read (0000000000000040)
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback: Checking for expired actions every 900000ms
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: config_query_callback: Sending expected-votes=2 to corosync
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_started: The local CRM is operational
Mar 06 16:19:46 haTest-1 crmd: [15083]: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Mar 06 16:19:47 haTest-1 crmd: [15083]: info: ais_dispatch_message: Membership 0: quorum still lost
Mar 06 16:19:47 haTest-1 crmd: [15083]: info: te_connect_stonith: Attempting connection to fencing daemon...
Mar 06 16:19:48 haTest-1 crmd: [15083]: info: te_connect_stonith: Connected
Mar 06 16:20:07 haTest-1 crmd: [15083]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
Mar 06 16:20:07 haTest-1 crmd: [15083]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Mar 06 16:20:07 haTest-1 crmd: [15083]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
Mar 06 16:20:12 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 06 16:20:18 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 06 16:20:25 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
...

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

