Hi! I am very sorry that I did not mention that whenever I test the different suggestions for solving this, I temporarily disable the firewall and then turn it back on after testing. Thank you very much for the tip about tshark! I will post the output as soon as I get a maintenance window to restart the cman service.

With regards to openais, it is still off on both servers. Should it be turned on at boot? I am sorry, but I have not read anywhere in the manuals that it should be "on".

--- On Sat, 7/18/09, Marc - A. Dahlhaus <mad@xxxxxx> wrote:

> From: Marc - A. Dahlhaus <mad@xxxxxx>
> Subject: Re: Starting two-node cluster with only one node
> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> Date: Saturday, 18 July, 2009, 10:02 PM
> Hello,
>
> as your cluster worked well on CentOS 5.2, the networking hardware
> components can't be the culprit in this case, but I still think it
> is a cluster-communication-related problem.
>
> It could be your iptables ruleset... Try to disable the firewall
> and check again...
>
> You can use tshark to check this as well, with something like:
>
>   tshark -i <interface cluster is using> -f 'host <multicast-ip cluster is using>' -V | less
>
> Have you checked that openais is still chkconfig off after your
> upgrade?
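> A quick way to verify that on both nodes:
>
>   chkconfig --list cman
>   chkconfig --list openais
>
> cman should be on and openais off for all runlevels -- as far as I
> know, the cman init script starts aisexec itself on RHEL 5, so the
> standalone openais service has to stay disabled.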
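> If you want to test with the firewall up instead of disabled, the
> openais totem traffic on RHEL 5 should be UDP ports 5404 and 5405
> (I am quoting those from memory, so double-check them against the
> cluster administration guide):
>
>   iptables -I INPUT -p udp -m udp --dport 5404:5405 -j ACCEPT
>
> ricci, dlm and the other daemons use additional TCP ports, so
> disabling the firewall completely is still the cleaner test.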
>
> Abed-nego G. Escobal, Jr. schrieb:
> > Thanks for giving the pointers!
> >
> > uname -r on both nodes:
> >
> > 2.6.18-128.1.16.el5
> >
> > on node01:
> >
> > rpm -q cman gfs-utils kmod-gfs modcluster ricci luci cluster-snmp iscsi-initiator-utils lvm2-cluster openais oddjob rgmanager
> > cman-2.0.98-2chrissie
> > gfs-utils-0.1.18-1.el5
> > kmod-gfs-0.1.23-5.el5_2.4
> > kmod-gfs-0.1.31-3.el5
> > modcluster-0.12.1-2.el5.centos
> > ricci-0.12.1-7.3.el5.centos.1
> > luci-0.12.1-7.3.el5.centos.1
> > cluster-snmp-0.12.1-2.el5.centos
> > iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
> > lvm2-cluster-2.02.40-7.el5
> > openais-0.80.3-22.el5_3.8
> > oddjob-0.27-9.el5
> > rgmanager-2.0.46-1.el5.centos.3
> >
> > on node02:
> >
> > rpm -q cman gfs-utils kmod-gfs modcluster ricci luci cluster-snmp iscsi-initiator-utils lvm2-cluster openais oddjob rgmanager
> > cman-2.0.98-2chrissie
> > gfs-utils-0.1.18-1.el5
> > kmod-gfs-0.1.31-3.el5
> > modcluster-0.12.1-2.el5.centos
> > ricci-0.12.1-7.3.el5.centos.1
> > luci-0.12.1-7.3.el5.centos.1
> > cluster-snmp-0.12.1-2.el5.centos
> > iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
> > lvm2-cluster-2.02.40-7.el5
> > openais-0.80.3-22.el5_3.8
> > oddjob-0.27-9.el5
> > rgmanager-2.0.46-1.el5.centos.3
> >
> > I used http://knowledgelayer.softlayer.com/questions/443/GFS+howto
> > to configure my cluster. When it was still on 5.2 the cluster
> > worked, but after the recent update to 5.3 it broke.
> >
> > One of the threads that I found in the archive states that there
> > is a problem with the most current official version of cman (bug
> > id 485026). I replaced the most current cman package with
> > cman-2.0.98-2chrissie to test whether this was my problem; it
> > seems it was not, so I will be moving back to the official
> > package.
> >
> > Another thread pointed to openais as the culprit, so I changed it
> > back to openais-0.80.3-15.el5, even though the changelog indicates
> > a lot of bug fixes in the most current official package. After
> > doing that, it still did not work. I also tried clean_start="1"
> > with caution: I unmounted the iscsi and then started cman, but it
> > still did not work.
> >
> > The most recent thing I tried is post_join_delay="-1". I had not
> > noticed before that there is a man page for fenced;
> > post_join_delay is much safer than clean_start="1", but it still
> > did not fix the problem. The man pages that I have read over and
> > over again are cman and cluster.conf. Some pages of the online
> > manual are not really suitable for my situation, because I do not
> > have X installed on the machines and those pages use
> > system-config-cluster.
> >
> > As I understand the online manual and the FAQ, qdisk is not
> > required if I have two_node="1", so I did not create one. I have
> > removed the fence_daemon tag, since I only used it for trying the
> > suggested solutions. The hosts are present in each other's hosts
> > files with the correct IPs.
> >
> > The ping results:
> >
> > ping node02.company.com
> >
> > --- node01.company.com ping statistics ---
> > 10 packets transmitted, 10 received, 0% packet loss, time 8999ms
> > rtt min/avg/max/mdev = 0.010/0.016/0.034/0.007 ms
> >
> > ping node01.company.com
> >
> > --- node01.company.com ping statistics ---
> > 10 packets transmitted, 10 received, 0% packet loss, time 9003ms
> > rtt min/avg/max/mdev = 0.341/0.668/1.084/0.273 ms
> >
> > According to the people in the data center, the switch supports
> > multicast communication on all ports that are used for cluster
> > communication, because they are in the same VLAN.
> >
> > For the logs, I will send fresh ones as soon as possible; at the
> > moment I do not have a big enough time window to bring the
> > machines down.
> >
> > For wireshark, I will be reading the man pages on how to use it.
> >
> > Please advise if any other information is needed to solve this. I
> > am very grateful for the very detailed pointers. Thank you very
> > much!
> >
> > --- On Fri, 7/17/09, Marc - A. Dahlhaus [ Administration | Westermann GmbH ] <mad@xxxxxx> wrote:
> >
> >> From: Marc - A. Dahlhaus [ Administration | Westermann GmbH ] <mad@xxxxxx>
> >> Subject: Re: Starting two-node cluster with only one node
> >> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> >> Date: Friday, 17 July, 2009, 5:56 PM
> >> Hello,
> >>
> >> can you give us some hard facts on which versions of the
> >> cluster-suite packages you are using in your environment, and
> >> also the related logs?
> >>
> >> Have you read the corresponding parts of the cluster suite's
> >> manual, man pages and FAQ, and searched the list archives for
> >> similar problems? If not -> do it, there are many good hints to
> >> be found there.
> >>
> >> The nodes find each other and form a cluster very fast IF they
> >> can talk to each other. As no cluster networking is involved in
> >> fencing a remote node when the fencing node by itself is quorate,
> >> this could be your problem.
> >>
> >> You should change to fence_manual and switch back to your real
> >> fencing devices after you have debugged your problem. Also get
> >> rid of the <fence_daemon ... /> tag in your cluster.conf, as
> >> fenced does the right thing by default if the remaining
> >> configuration is right; at the moment it is just hiding a part of
> >> the problem.
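> >> A minimal fence_manual setup looks something like this (the node
> >> name below is just a placeholder -- use the names from your
> >> cluster.conf, and repeat the fence block for the second node):
> >>
> >>   <clusternode name="node01.company.com" nodeid="1" votes="1">
> >>     <fence>
> >>       <method name="1">
> >>         <device name="human" nodename="node01.company.com"/>
> >>       </method>
> >>     </fence>
> >>   </clusternode>
> >>   ...
> >>   <fencedevices>
> >>     <fencedevice agent="fence_manual" name="human"/>
> >>   </fencedevices>
> >>
> >> With that in place nothing gets power-cycled while you debug; you
> >> acknowledge a pending fence by hand with fence_ack_manual
> >> instead.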
> >> Also, the 5-minute delay on cman start smells like a DNS-lookup
> >> problem or some other network-related problem to me.
> >>
> >> Here is a short check-list to be sure the nodes can talk to each
> >> other:
> >>
> >> Can the individual nodes ping each other?
> >>
> >> Can the individual nodes dns-lookup the other node names (the
> >> ones you used in your cluster.conf)? (Try to add them to your
> >> /etc/hosts file; that way you have a working cluster even if your
> >> dns-system goes on vacation.)
> >>
> >> Is your switch allowing multicast communication on all ports that
> >> are used for cluster communication? (This is a prerequisite for
> >> the openais / corosync based cman, which means anything >= RHEL
> >> 5. Search the archives on this if you need more info...)
> >>
> >> Can you trace (eg. with wireshark's tshark) incoming cluster
> >> communication from the remote node? (If you haven't changed your
> >> fencing to fence_manual, your listening system will get fenced
> >> before you can get any useful information out of it. Try with and
> >> without an active firewall.)
> >>
> >> If all of the above can be answered with "yes", your cluster
> >> should form just fine. You could try to add a qdisk device as a
> >> tiebreaker after that, and test it just to be sure you have a
> >> working last-man-standing setup...
> >>
> >> Hope that helps,
> >>
> >> Marc

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster