Short description
-----------------
The HA nodes don't seem to communicate with each other via corosync.

Final goal
----------
To run Zimbra as a 2-node active/passive HA cluster.

Description of the system
-------------------------
This is an Ubuntu 10.04 LTS system, because the current stable Zimbra release works on Ubuntu 10.04 and not yet on 12.04. I have dist-upgraded the cluster packages from https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa , as advised on several sites (the steps I followed are sketched below).
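These are roughly the steps I followed to pull in the PPA packages (from memory, so the exact invocation may have differed slightly):

  # Add the ubuntu-ha-maintainers PPA and upgrade the stock HA packages (on Lucid).
  sudo apt-get install python-software-properties   # provides add-apt-repository on 10.04
  sudo add-apt-repository ppa:ubuntu-ha-maintainers/ppa
  sudo apt-get update
  sudo apt-get dist-upgrade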
My main configuration is based on this document: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

I have written some OCF resource agents of my own (for Zimbra and some network tasks) and I have already tested them with ocf-tester and with ocf-tester-py (a hack of mine on top of ocf-tester that lets you test Python-based OCF scripts); an example invocation is sketched below.
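For reference, this is roughly how I exercise one of the custom agents with ocf-tester before handing it to Pacemaker (the parameter shown is only a placeholder for illustration; the path simply follows from the ocf::btactic:zimbra resource type used further down):

  # Run the OCF compliance tests against the custom Zimbra agent.
  # "zimbra_user=zimbra" is a placeholder parameter, not my exact configuration.
  ocf-tester -n ZimbraServer \
             -o zimbra_user=zimbra \
             /usr/lib/ocf/resource.d/btactic/zimbra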
Finally, some package versions:

libcrmcluster1 1.1.6-2ubuntu0~ppa2
libcrmcommon2 1.1.6-2ubuntu0~ppa2
corosync 1.4.2-1ubuntu0~ppa1
libcorosync4 1.4.2-1ubuntu0~ppa1
lvm2 2.02.54-1ubuntu4.1ppa5
pacemaker 1.1.6-2ubuntu0~ppa2
libglib2.0-0 2.24.1-0ubuntu1.1~ppa1
cluster-glue 1.0.8-2ubuntu0~ppa4
libcluster-glue 1.0.8-2ubuntu0~ppa4
resource-agents 1:3.9.2-4ubuntu0~ppa2

Node 1 - corosync-objctl runtime.totem.pg.mrp.srp.members
---------------------------------------------------------
runtime.totem.pg.mrp.srp.171616448.ip=r(0) ip(192.168.58.10)
runtime.totem.pg.mrp.srp.171616448.join_count=1
runtime.totem.pg.mrp.srp.171616448.status=joined

Node 2 - corosync-objctl runtime.totem.pg.mrp.srp.members
---------------------------------------------------------
runtime.totem.pg.mrp.srp.171616448.ip=r(0) ip(192.168.58.10)
runtime.totem.pg.mrp.srp.171616448.join_count=1
runtime.totem.pg.mrp.srp.171616448.status=joined

Node 1 - tcpdump -envv "port 5405" -i eth1
------------------------------------------
10:51:56.748054 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:56.914846 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:57.087184 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:57.137976 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:57.339116 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:57.505602 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:57.709958 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:57.728345 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:57.962354 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:58.094887 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:58.301512 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:58.301657 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:58.576392 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:58.684891 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:58.894880 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:58.914985 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:59.168176 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:59.275154 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:59.485189 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:51:59.514365 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:59.775556 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 2.4.6.8.37357 > 192.168.58.10.5405: UDP, length 82
10:51:59.864912 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:52:00.074440 08:00:27:26:45:5b > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.34890 > 2.4.6.8.5405: UDP, length 82
10:52:00.105667 0a:00:27:00:00:02 > 08:00:27:26:45:5b, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110)
Node 2 - tcpdump -envv "port 5405" -i eth1
------------------------------------------
10:55:12.229883 0a:00:27:00:00:02 > 08:00:27:05:af:40, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 1.2.3.4.34890 > 192.168.58.10.5405: UDP, length 82
10:55:12.247341 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.37357 > 1.2.3.4.5405: UDP, length 82
10:55:12.457267 0a:00:27:00:00:02 > 08:00:27:05:af:40, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 1.2.3.4.34890 > 192.168.58.10.5405: UDP, length 82
10:55:12.578838 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.37357 > 1.2.3.4.5405: UDP, length 82
10:55:12.819008 0a:00:27:00:00:02 > 08:00:27:05:af:40, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 1.2.3.4.34890 > 192.168.58.10.5405: UDP, length 82
10:55:12.834251 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.37357 > 1.2.3.4.5405: UDP, length 82
10:55:13.043014 0a:00:27:00:00:02 > 08:00:27:05:af:40, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 1.2.3.4.34890 > 192.168.58.10.5405: UDP, length 82
10:55:13.168621 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.37357 > 1.2.3.4.5405: UDP, length 82
10:55:13.410025 0a:00:27:00:00:02 > 08:00:27:05:af:40, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 1.2.3.4.34890 > 192.168.58.10.5405: UDP, length 82
10:55:13.423383 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.37357 > 1.2.3.4.5405: UDP, length 82
10:55:13.633936 0a:00:27:00:00:02 > 08:00:27:05:af:40, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 1.2.3.4.34890 > 192.168.58.10.5405: UDP, length 82
10:55:13.758722 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.37357 > 1.2.3.4.5405: UDP, length 82
10:55:14.000246 0a:00:27:00:00:02 > 08:00:27:05:af:40, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 1.2.3.4.34890 > 192.168.58.10.5405: UDP, length 82
10:55:14.013566 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.37357 > 1.2.3.4.5405: UDP, length 82
10:55:14.223766 0a:00:27:00:00:02 > 08:00:27:05:af:40, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto UDP (17), length 110) 1.2.3.4.34890 > 192.168.58.10.5405: UDP, length 82
10:55:14.350019 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110) 192.168.58.10.37357 > 1.2.3.4.5405: UDP, length 82
10:55:14.603364 08:00:27:05:af:40 > 0a:00:27:00:00:02, ethertype IPv4 (0x0800), length 124: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 110)

Node 1 - corosync.conf
----------------------
totem {
        version: 2
        token: 5000
        token_retransmits_before_loss_const: 20
        join: 1000
        consensus: 7500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: off
        threads: 0
        rrp_mode: passive
        interface {
                member {
                        memberaddr: 192.168.58.10
                }
                member {
                        memberaddr: 2.4.6.8
                }
                ringnumber: 0
                bindnetaddr: 192.168.58.10
                mcastport: 5405
        }
        transport: udpu
}

amf {
        mode: disabled
}

service {
        ver: 0
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: yes
        debug: on
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

Node 2 - corosync.conf
----------------------
totem {
        version: 2
        token: 5000
        token_retransmits_before_loss_const: 20
        join: 1000
        consensus: 7500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: off
        threads: 0
        rrp_mode: passive
        interface {
                member {
                        memberaddr: 192.168.58.10
                }
                member {
                        memberaddr: 1.2.3.4
                }
                ringnumber: 0
                bindnetaddr: 192.168.58.10
                mcastport: 5405
        }
        transport: udpu
}

amf {
        mode: disabled
}

service {
        ver: 0
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: yes
        debug: on
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}
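With these files in place, one basic check I can run on each node (and paste the output of, if that helps) is simply whether corosync has its sockets bound on the configured address and port; roughly:

  # Confirm corosync has udp/5405 sockets bound on 192.168.58.10 (run on each node).
  sudo netstat -anup | grep 5405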
Node 1 - crm_mon -orVVVV1
-------------------------
crm_mon[17706]: 2012/08/20_11:07:35 info: main: Starting crm_mon
crm_mon[17706]: 2012/08/20_11:07:35 info: unpack_config: Startup probes: enabled
crm_mon[17706]: 2012/08/20_11:07:35 notice: unpack_config: On loss of CCM Quorum: Ignore
crm_mon[17706]: 2012/08/20_11:07:35 info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
crm_mon[17706]: 2012/08/20_11:07:35 info: unpack_domains: Unpacking domains
crm_mon[17706]: 2012/08/20_11:07:35 info: determine_online_status: Node zhatest-01.domain.com is online
crm_mon[17706]: 2012/08/20_11:07:35 notice: unpack_rsc_op: Hard error - ZimbraServer_last_failure_0 failed with rc=5: Preventing ZimbraServer from re-starting on zhatest-01.domain.com
crm_mon[17706]: 2012/08/20_11:07:35 notice: unpack_rsc_op: Hard error - ZimbraFS_last_failure_0 failed with rc=5: Preventing ZimbraFS from re-starting on zhatest-01.domain.com
crm_mon[17706]: 2012/08/20_11:07:35 WARN: unpack_rsc_op: Processing failed op ZimbraFS_last_failure_0 on zhatest-01.domain.com: not installed (5)
crm_mon[17706]: 2012/08/20_11:07:35 notice: unpack_rsc_op: Operation ClusterOVHFailover_last_failure_0 found resource ClusterOVHFailover active on zhatest-01.domain.com
crm_mon[17706]: 2012/08/20_11:07:35 WARN: unpack_rsc_op: Processing failed op ClusterOVHFailover_monitor_120000 on zhatest-01.domain.com: unknown exec error (-2)
============
Last updated: Mon Aug 20 11:07:35 2012
Last change: Sun Aug 19 23:06:39 2012 via crmd on zhatest-01.domain.com
Stack: openais
Current DC: zhatest-01.domain.com - partition WITHOUT quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
9 Resources configured.
============

Online: [ zhatest-01.domain.com ]
OFFLINE: [ zhatest-02.domain.com ]

Full list of resources:

 Resource Group: MySystem
     ClusterOVHFailover (ocf::btactic:OVHfailover): Started zhatest-01.domain.com FAILED
     ClusterIP (ocf::heartbeat:IPaddr2): Stopped
     ClusterHostRoute (ocf::btactic:OVHhostroute): Stopped
     DisableAlternativeRoute (ocf::btactic:OppositeRoute): Stopped
     ClusterDefaultRoute (ocf::btactic:OVHdefaultroute): Stopped
 Resource Group: MyZimbra
     ZimbraFS (ocf::heartbeat:Filesystem): Stopped
     ZimbraServer (ocf::btactic:zimbra): Stopped
 Master/Slave Set: ZimbraDataClone [ZimbraData]
     Masters: [ zhatest-01.domain.com ]
     Stopped: [ ZimbraData:1 ]

Operations:
* Node zhatest-01.domain.com:
   DisableAlternativeRoute: migration-threshold=1000000
    + (58) monitor: interval=60000ms rc=0 (ok)
    + (65) stop: rc=0 (ok)
   ClusterHostRoute: migration-threshold=1000000
    + (56) monitor: interval=30000ms rc=0 (ok)
    + (66) stop: rc=0 (ok)
   ClusterIP: migration-threshold=1000000
    + (54) monitor: interval=30000ms rc=0 (ok)
    + (67) stop: rc=0 (ok)
   ZimbraServer: migration-threshold=1000000
    + (8) probe: rc=5 (not installed)
   ClusterDefaultRoute: migration-threshold=1000000
    + (60) monitor: interval=30000ms rc=0 (ok)
    + (63) stop: rc=0 (ok)
crm_mon[17706]: 2012/08/20_11:07:35 info: get_failcount: ZimbraFS has failed INFINITY times on zhatest-01.domain.com
   ZimbraFS: migration-threshold=1000000 fail-count=1000000
    + (24) start: rc=5 (not installed)
    + (26) stop: rc=0 (ok)
   ZimbraData:0: migration-threshold=1000000
    + (25) monitor: interval=60000ms rc=8 (master)
    + (61) promote: rc=0 (ok)
crm_mon[17706]: 2012/08/20_11:07:35 info: get_failcount: ClusterOVHFailover has failed 3 times on zhatest-01.domain.com
   ClusterOVHFailover: migration-threshold=1000000 fail-count=3
    + (2) probe: rc=0 (ok)
    + (51) start: rc=0 (ok)
    + (52) monitor: interval=120000ms rc=-2 (unknown exec error)

Failed actions:
    ZimbraServer_monitor_0 (node=zhatest-01.domain.com, call=8, rc=5, status=complete): not installed
    ZimbraFS_start_0 (node=zhatest-01.domain.com, call=24, rc=5, status=complete): not installed
    ClusterOVHFailover_monitor_120000 (node=zhatest-01.domain.com, call=52, rc=-2, status=Timed Out): unknown exec error
Node 2 - crm_mon -orVVVV1
-------------------------
crm_mon[14699]: 2012/08/20_11:13:14 info: main: Starting crm_mon
crm_mon[14699]: 2012/08/20_11:13:14 info: unpack_config: Startup probes: enabled
crm_mon[14699]: 2012/08/20_11:13:14 notice: unpack_config: On loss of CCM Quorum: Ignore
crm_mon[14699]: 2012/08/20_11:13:14 info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
crm_mon[14699]: 2012/08/20_11:13:14 info: unpack_domains: Unpacking domains
crm_mon[14699]: 2012/08/20_11:13:14 info: determine_online_status: Node zhatest-02.domain.com is online
crm_mon[14699]: 2012/08/20_11:13:14 notice: unpack_rsc_op: Hard error - ZimbraServer_last_failure_0 failed with rc=5: Preventing ZimbraServer from re-starting on zhatest-02.domain.com
crm_mon[14699]: 2012/08/20_11:13:14 notice: unpack_rsc_op: Operation ZimbraData:0_last_failure_0 found resource ZimbraData:0 active on zhatest-02.domain.com
crm_mon[14699]: 2012/08/20_11:13:14 WARN: unpack_rsc_op: Processing failed op ClusterOVHFailover_monitor_120000 on zhatest-02.domain.com: unknown exec error (-2)
crm_mon[14699]: 2012/08/20_11:13:14 notice: unpack_rsc_op: Hard error - ClusterOVHFailover_last_failure_0 failed with rc=5: Preventing ClusterOVHFailover from re-starting on zhatest-02.domain.com
crm_mon[14699]: 2012/08/20_11:13:14 WARN: unpack_rsc_op: Processing failed op ClusterOVHFailover_last_failure_0 on zhatest-02.domain.com: not installed (5)
crm_mon[14699]: 2012/08/20_11:13:14 info: native_add_running: resource ClusterOVHFailover isnt managed
============
Last updated: Mon Aug 20 11:13:14 2012
Last change: Sun Aug 19 23:07:10 2012 via crmd on zhatest-02.domain.com
Stack: openais
Current DC: zhatest-02.domain.com - partition WITHOUT quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
9 Resources configured.
============

Online: [ zhatest-02.domain.com ]
OFFLINE: [ zhatest-01.domain.com ]

Full list of resources:

 Resource Group: MySystem
     ClusterOVHFailover (ocf::btactic:OVHfailover): Started zhatest-02.domain.com (unmanaged) FAILED
     ClusterIP (ocf::heartbeat:IPaddr2): Stopped
     ClusterHostRoute (ocf::btactic:OVHhostroute): Stopped
     DisableAlternativeRoute (ocf::btactic:OppositeRoute): Stopped
     ClusterDefaultRoute (ocf::btactic:OVHdefaultroute): Stopped
 Resource Group: MyZimbra
     ZimbraFS (ocf::heartbeat:Filesystem): Stopped
     ZimbraServer (ocf::btactic:zimbra): Stopped
 Master/Slave Set: ZimbraDataClone [ZimbraData]
     Slaves: [ zhatest-02.domain.com ]
     Stopped: [ ZimbraData:1 ]

Operations:
* Node zhatest-02.domain.com:
   DisableAlternativeRoute: migration-threshold=1000000
    + (18) monitor: interval=60000ms rc=0 (ok)
    + (25) stop: rc=0 (ok)
   ClusterHostRoute: migration-threshold=1000000
    + (16) monitor: interval=30000ms rc=0 (ok)
    + (26) stop: rc=0 (ok)
   ClusterIP: migration-threshold=1000000
    + (14) monitor: interval=30000ms rc=0 (ok)
    + (27) stop: rc=0 (ok)
   ZimbraServer: migration-threshold=1000000
    + (8) probe: rc=5 (not installed)
   ClusterDefaultRoute: migration-threshold=1000000
    + (20) monitor: interval=30000ms rc=0 (ok)
    + (23) stop: rc=0 (ok)
   ZimbraData:0: migration-threshold=1000000
    + (9) probe: rc=0 (ok)
    + (30) demote: rc=0 (ok)
    + (32) monitor: interval=50000ms rc=0 (ok)
crm_mon[14699]: 2012/08/20_11:13:14 info: get_failcount: ClusterOVHFailover has failed INFINITY times on zhatest-02.domain.com
   ClusterOVHFailover: migration-threshold=1000000 fail-count=1000000
    + (10) start: rc=0 (ok)
    + (12) monitor: interval=120000ms rc=-2 (unknown exec error)
    + (28) stop: rc=5 (not installed)

Failed actions:
    ZimbraServer_monitor_0 (node=zhatest-02.domain.com, call=8, rc=5, status=complete): not installed
    ClusterOVHFailover_monitor_120000 (node=zhatest-02.domain.com, call=12, rc=-2, status=Timed Out): unknown exec error
    ClusterOVHFailover_stop_0 (node=zhatest-02.domain.com, call=28, rc=5, status=complete): not installed

Specific details for this setup
-------------------------------
* Although both nodes have the same internal IP (192.168.58.10), as you can see in the logs, they live in two different networks; as far as I know there is no problem with 192.168.58.10 being repeated.
* 1.2.3.4 is the public IP of node 1.
* 2.4.6.8 is the public IP of node 2.
* UDP port 5405 is redirected from the public IP 1.2.3.4 to node 1's internal IP 192.168.58.10 (the kind of redirect is sketched after this list).
* UDP port 5405 is redirected from the public IP 2.4.6.8 to node 2's internal IP 192.168.58.10.
* Node 1's name is: zhatest-01.domain.com
* Node 2's name is: zhatest-02.domain.com
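I have not detailed above how the 5405/udp redirects are implemented; they are ordinary port forwards from each public IP to the node's internal address. Purely as an illustration (this is not my actual firewall configuration), an equivalent iptables DNAT rule for node 1 would look roughly like this:

  # Hypothetical DNAT rule: forward corosync traffic arriving on the public IP
  # 1.2.3.4 (udp/5405) to node 1's internal address 192.168.58.10.
  iptables -t nat -A PREROUTING -d 1.2.3.4 -p udp --dport 5405 \
           -j DNAT --to-destination 192.168.58.10:5405

The equivalent rule for node 2 would use 2.4.6.8 instead.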
Long description
----------------
The HA nodes don't seem to communicate with each other via corosync. That is what I infer from the crm_mon output, although I'm not an expert (each node sees itself as online and the other one as offline, and on the other host it is the other way around).

I also infer that there is some kind of communication, because tcpdump sees packets flowing in both directions (although this is the first time I look at tcpdump output, so I might be wrong about that too).

Is there any tool or command that actually checks whether corosync is communicating with the other node? And are there other commands I can use to debug this, so that I can find where the problem lies? Or am I wrong, and corosync is communicating fine and the problem is elsewhere?

If you need more logs or details about the setup, do not hesitate to ask for them.

Thank you.

--
Adrián Gibanel
I.T. Manager
+34 675 683 301
www.btactic.com