Hello.
Thank you for your help; unfortunately, even with your clues, it still doesn't work. Here comes my config with your suggestions applied:
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:25:90:77:84:de
          inet addr:37.59.18.208  Bcast:37.59.18.255  Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fe77:84de/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:562019 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1325787 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:87811137 (83.7 MiB)  TX bytes:221866389 (211.5 MiB)
          Interrupt:16 Memory:fbce0000-fbd00000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8139 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8139 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:828638 (809.2 KiB)  TX bytes:828638 (809.2 KiB)

tap0      Link encap:Ethernet  HWaddr 7a:5c:2a:32:ee:30
          inet addr:10.88.0.2  Bcast:10.88.0.255  Mask:255.255.255.0
          inet6 addr: fe80::785c:2aff:fe32:ee30/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:51971 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42362 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:9387727 (8.9 MiB)  TX bytes:8169107 (7.7 MiB)
# cat corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
        version: 2
        secauth: off
        interface {
                member {
                        memberaddr: 10.88.0.1
                }
                member {
                        memberaddr: 10.88.0.2
                }
                ringnumber: 0
                bindnetaddr: 10.88.0.0
                mcastport: 5405
                ttl: 1
        }
        transport: udpu
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: yes
        debug: on
        logfile: /var/log/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
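For completeness, here is how I plan to confirm that both members appear once the ring forms. These are the corosync 1.4 command-line tools as I understand them from the man pages, so please correct me if I am misusing them:

# corosync-cfgtool -s
# corosync-objctl runtime.totem.pg.mrp.srp.members

The first should print the status of ring 0; the second should dump the runtime member list from the object database.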
As you can see, I established an OpenVPN TAP connection between these machines so that they share a subnetwork despite being geographically distant (about 400 km, or 250 mi, which is meant to improve cluster reliability). I checked connectivity, and the nodes are able to reach each other over this VPN (at least via SSH).
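For reference, the tunnel is roughly the classic static-key point-to-point TAP setup; the file paths, key name, and exact transport options below are illustrative, not my exact files:

# Node 1, /etc/openvpn/cluster.conf
dev tap0
ifconfig 10.88.0.1 255.255.255.0
secret /etc/openvpn/static.key

# Node 2, /etc/openvpn/cluster.conf
remote 176.31.238.131        # the other node's public address
dev tap0
ifconfig 10.88.0.2 255.255.255.0
secret /etc/openvpn/static.key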
I also disabled the firewall until I solve this problem; nothing is being served yet and there are no established connections with our facility, and I wanted to make sure the firewall is not interfering.
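Once this works, I intend to re-enable it with explicit rules for the Totem traffic, along these lines (a sketch using the tap0 interface and mcastport 5405 from my config above):

# on both nodes
iptables -A INPUT  -i tap0 -p udp --dport 5405 -j ACCEPT
iptables -A OUTPUT -o tap0 -p udp --dport 5405 -j ACCEPT

In the meantime, everything is ACCEPT: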
# Node 1
# iptables -nvL
Chain INPUT (policy ACCEPT 143K packets, 24M bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 159K packets, 28M bytes)
 pkts bytes target     prot opt in     out     source               destination

# Node 2
# iptables -nvL
Chain INPUT (policy ACCEPT 144K packets, 26M bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 125K packets, 23M bytes)
 pkts bytes target     prot opt in     out     source               destination
Unfortunately, it still doesn't work. Nevertheless, the instances have stopped filling their logs with connectivity warnings; instead, I found these messages after making the changes you suggested:
# Node 1
Jun 07 10:24:07 corosync [MAIN ] Corosync Cluster Engine ('1.4.2'): started and ready to provide service.
Jun 07 10:24:07 corosync [MAIN ] Corosync built-in features: nss
Jun 07 10:24:07 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 07 10:24:07 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
Jun 07 10:24:07 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 07 10:24:07 corosync [TOTEM ] The network interface [10.88.0.1] is now up.
Jun 07 10:24:07 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
Jun 07 10:24:07 corosync [SERV ] Service engine loaded: corosync configuration service
Jun 07 10:24:07 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Jun 07 10:24:07 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
Jun 07 10:24:07 corosync [SERV ] Service engine loaded: corosync profile loading service
Jun 07 10:24:07 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jun 07 10:24:07 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jun 07 10:24:07 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 07 10:24:07 corosync [CPG ] chosen downlist: sender r(0) ip(10.88.0.1) ; members(old:0 left:0)
Jun 07 10:24:07 corosync [MAIN ] Completed service synchronization, ready to provide service.
Jun 07 10:24:07 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 07 10:24:08 corosync [CPG ] chosen downlist: sender r(0) ip(10.88.0.1) ; members(old:1 left:0)
Jun 07 10:24:08 corosync [MAIN ] Completed service synchronization, ready to provide service.

# Node 2
Jun 07 10:23:51 corosync [MAIN ] Corosync Cluster Engine ('1.4.2'): started and ready to provide service.
Jun 07 10:23:51 corosync [MAIN ] Corosync built-in features: nss
Jun 07 10:23:51 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 07 10:23:51 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
Jun 07 10:23:51 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 07 10:23:51 corosync [TOTEM ] The network interface [10.88.0.2] is now up.
Jun 07 10:23:51 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
Jun 07 10:23:51 corosync [SERV ] Service engine loaded: corosync configuration service
Jun 07 10:23:51 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Jun 07 10:23:51 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
Jun 07 10:23:51 corosync [SERV ] Service engine loaded: corosync profile loading service
Jun 07 10:23:51 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jun 07 10:23:51 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jun 07 10:23:51 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 07 10:23:51 corosync [CPG ] chosen downlist: sender r(0) ip(10.88.0.2) ; members(old:0 left:0)
Jun 07 10:23:51 corosync [MAIN ] Completed service synchronization, ready to provide service.
Jun 07 10:23:59 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 07 10:24:00 corosync [CPG ] chosen downlist: sender r(0) ip(10.88.0.1) ; members(old:1 left:0)
Jun 07 10:24:00 corosync [MAIN ] Completed service synchronization, ready to provide service.
In addition, when I check the listening services, I get this:

# netstat -lnptu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      3012/nginx
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      17881/named
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      16480/sshd
tcp        0      0 127.0.0.1:953           0.0.0.0:*               LISTEN      17881/named
tcp6       0      0 ::1:53                  :::*                    LISTEN      17881/named
tcp6       0      0 :::22                   :::*                    LISTEN      16480/sshd
tcp6       0      0 ::1:953                 :::*                    LISTEN      17881/named
udp        0      0 0.0.0.0:58265           0.0.0.0:*                           3630/corosync
udp        0      0 127.0.0.1:53            0.0.0.0:*                           17881/named
udp        0      0 127.0.0.1:921           0.0.0.0:*                           17788/lwresd
udp        0      0 0.0.0.0:35009           0.0.0.0:*                           3630/corosync
udp        0      0 10.88.0.2:5405          0.0.0.0:*                           3630/corosync
udp6       0      0 ::1:53                  :::*                                17881/named
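To double-check that the unicast packets really cross the tunnel, I also watched the Totem port on each node with tcpdump (standard syntax, nothing exotic):

# tcpdump -ni tap0 udp port 5405

and I do see two-way traffic, as in my first message.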
Is it just me, or were they able to open listening sockets, detect each other, and are now waiting to provide services? In that case, why does crm_mon --one-shot -V still respond "Connection to cluster failed: connection failed"?
Still a little config issue, I assume, but where?
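One guess: while searching, I read that with the Corosync 1.x plugin architecture, Pacemaker only starts if corosync.conf (or a file under /etc/corosync/service.d/) declares it as a service, something like this:

service {
        name: pacemaker
        ver: 0
}

My config above has no such block, so perhaps crm_mon simply has nothing to connect to? I may have misread the documentation, so please tell me if this is a red herring.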
Thank you in advance.
Regards.
On 07/06/2012 10:17, Jan Friesse wrote:
This is expected behavior, and even more makes me sure that whole problem is really hidden in nonexisting local member addr in your config.
Honza
David Guyot wrote:
Hello again, everybody.
I just noticed that when I tried to set secauth to off, during the period when one node accepted secured connections and the other unsecured ones, the network fault messages were replaced by these:
Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest... ignoring.
Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest... ignoring.
Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest... ignoring.
Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
[the same two messages repeat many times per second; I trimmed the rest]
If this is relevant...
Thank you in advance.
Regards.
On 06/06/2012 17:05, David Guyot wrote:
Hello, everybody.
I'm trying to establish a 2-node Debian Squeeze x64 cluster with Corosync and Pacemaker, but I'm stuck on a strange issue: despite a lot of UDP chatter between the nodes (so the network is OK), the Corosync instances seem to ignore each other: the other node is never detected, and crm_mon --one-shot -V only says "Connection to cluster failed: connection failed". But the strangest thing is that both Corosync nodes are filling their logs with error messages saying "Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.". I tcpdumped all traffic between the hosts, and I see two-way traffic between them. I tried the backports versions of all Corosync- and Pacemaker-related packages, without improvement.
I must add that, due to my hosting company's network policy, I was forced to use UDP unicast instead of multicast, because multicast is blocked.
Here comes my config:
corosync.conf:
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
        version: 2
        secauth: on
        interface {
                member {
                        memberaddr: 176.31.238.131
                }
                ringnumber: 0
                bindnetaddr: 37.59.18.208
                mcastport: 5405
                ttl: 1
        }
        transport: udpu
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: yes
        debug: on
        logfile: /var/log/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
Log messages:
Jun 06 16:35:14 corosync [MAIN ] Corosync Cluster Engine ('1.4.2'): started and ready to provide service.
Jun 06 16:35:14 corosync [MAIN ] Corosync built-in features: nss
Jun 06 16:35:14 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 06 16:35:14 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
Jun 06 16:35:14 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 06 16:35:14 corosync [TOTEM ] The network interface [37.59.18.208] is now up.
Jun 06 16:35:14 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
Jun 06 16:35:14 corosync [SERV ] Service engine loaded: corosync configuration service
Jun 06 16:35:14 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Jun 06 16:35:14 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
Jun 06 16:35:14 corosync [SERV ] Service engine loaded: corosync profile loading service
Jun 06 16:35:14 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jun 06 16:35:14 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jun 06 16:35:23 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jun 06 16:35:25 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jun 06 16:35:27 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jun 06 16:35:30 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
# uname -a
Linux Vindemiatrix 3.2.13-grsec-xxxx-grs-ipv6-64 #1 SMP Thu Mar 29 09:48:59 UTC 2012 x86_64 GNU/Linux
# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  tun0   *       0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
    0     0            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22 state NEW recent: SET name: SSH side: source
    0     0 LOGDROP    tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22 state NEW recent: UPDATE seconds: 60 hit_count: 6 TTL-Match name: SSH side: source
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22 state NEW
    0     0 LOGDROP    tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp flags:0x17/0x02 multiport dports 80,443 #conn/32 > 100
    1    48 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp flags:0x17/0x02 multiport dports 80,443
    0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            tcp dpt:21 flags:0x17/0x02 limit: avg 5/min burst 50 recent: SET name: FTP side: source
    0     0 LOGDROP    tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            tcp dpt:21 flags:0x17/0x02 recent: UPDATE seconds: 60 hit_count: 6 TTL-Match name: FTP side: source
    0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            tcp dpt:21 flags:0x17/0x02
    0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            tcp dpts:50000:50500 state RELATED,ESTABLISHED
    0     0 ACCEPT     tcp  --  eth0   *       176.31.238.131       0.0.0.0/0            tcp dpt:1194
11867 3145K ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:5405 /* Corosync */
   35  9516 ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            state NEW limit: avg 30/sec burst 200
    0     0 LOGDROP    tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            tcp dpt:80 STRING match "w00tw00t.at.ISC.SANS." ALGO name bm TO 65535
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            limit: avg 10/sec burst 5
    0     0 LOGDROP    icmp --  *      *       0.0.0.0/0            0.0.0.0/0
 1031 70356 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
    3   132 LOGDROP    all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 LOGDROP    all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      tun0    0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      lo      0.0.0.0/0            0.0.0.0/0
    0     0 LOGDROP    tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            tcp dpt:80 owner UID match 33
    0     0 LOGDROP    udp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            udp dpt:80 owner UID match 33
    0     0 LOGDROP    tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            tcp dpt:443 owner UID match 33
    0     0 LOGDROP    udp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            udp dpt:443 owner UID match 33
    0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            176.31.238.131       tcp dpt:1194
11871 3146K ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:5405 /* Corosync */
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:25
    0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            tcp dpt:43
    0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    0     0 ACCEPT     udp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            tcp dpt:80
    0     0 ACCEPT     udp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            udp dpt:123
    0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            tcp dpt:443
    0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0            tcp dpt:873
   11   924 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0
 1071  712K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
   67 14013 LOGDROP    all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain LOGDROP (12 references)
 pkts bytes target     prot opt in     out     source               destination
   57 11655 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            limit: avg 1/sec burst 5 LOG flags 0 level 5 prefix `iptables rejected: '
   70 14145 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0
# corosync -v
Corosync Cluster Engine, version '1.4.2'
Copyright (c) 2006-2009 Red Hat, Inc.
I've been trying to solve this problem for the last two days, without any result. Any help is welcome.
Thank you in advance!
Regards.
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss