> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of AJ Lewis
> Sent: Wednesday, 4 August 2004 15:54
> To: Discussion of clustering software components including GFS
> Subject: Re: [Linux-cluster] GFS 6.0 node without quorum tries to fence
>
> On Wed, Aug 04, 2004 at 08:12:51AM +0200, Schumacher, Bernd wrote:
> > So, what I have learned from all the answers is very bad news for me.
> > It seems that what happened is what most of you expected. But this
> > means:
> >
> > ------------------------------------------------------------------------
> > --- One single point of failure in one node can stop the whole GFS. ---
> > ------------------------------------------------------------------------
> >
> > The single point of failure is:
> > The LAN card specified in "nodes.ccs:ip_interfaces" stops working on
> > one node, no matter whether this node was master or slave.
> >
> > The whole GFS is stopped:
> > The rest of the cluster seems to need time to form a new cluster. The
> > bad node does not need as much time to switch to Arbitrating mode, so
> > the bad node has enough time to fence all other nodes before it would
> > be fenced by the new master.
> >
> > The bad node lives, but it cannot form a cluster. GFS is not working.
> >
> > Now all other nodes will reboot. But after the reboot they cannot join
> > the cluster, because they cannot contact the bad node. The LAN card is
> > still broken. GFS is not working.
> >
> > Did I miss something?
> > Please tell me that I am wrong!
>
> Well, I guess I'm confused how the node with the bad LAN card can
> contact the fencing device to fence the other nodes. If it can't
> communicate with the other nodes because its NIC is down, it can't
> contact the fencing device over that NIC either, right? Or are you
> using some alternate transport to contact the fencing device?

There is a second admin LAN which is used for fencing. Could I perhaps use
this second admin LAN for GFS heartbeats too, i.e. can I define two LAN
cards in "nodes.ccs:ip_interfaces"? If that works, I would no longer have a
single point of failure. The documentation does not seem to allow it,
though. I will test this tomorrow. (A sketch of what I have in mind is at
the end of this message.)

> > > -----Original Message-----
> > > From: linux-cluster-bounces@xxxxxxxxxx
> > > [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of
> > > Schumacher, Bernd
> > > Sent: Tuesday, 3 August 2004 13:56
> > > To: linux-cluster@xxxxxxxxxx
> > > Subject: [Linux-cluster] GFS 6.0 node without quorum tries to fence
> > >
> > > Hi,
> > > I have three nodes: oben, mitte and unten.
> > >
> > > Test:
> > > I have disabled eth0 on mitte, so that mitte will be excluded.
> > >
> > > Result:
> > > Oben and unten are trying to fence mitte and build a new cluster.
> > > OK! But mitte tries to fence oben and unten. PROBLEM!
> > >
> > > Why can this happen? Mitte knows that it cannot build a cluster.
> > > See the logfile from mitte: "Have 1, need 2"
> > >
> > > Logfile from mitte:
> > > Aug  3 12:53:17 mitte lock_gulmd_core[1845]: Client (oben) expired
> > > Aug  3 12:53:17 mitte lock_gulmd_core[1845]: Core lost slave quorum.
> > >   Have 1, need 2. Switching to Arbitrating.
> > > Aug  3 12:53:17 mitte lock_gulmd_core[2120]: Gonna exec fence_node oben
> > > Aug  3 12:53:17 mitte lock_gulmd_core[1845]: Forked [2120] fence_node
> > >   oben with a 0 pause.
> > > Aug  3 12:53:17 mitte fence_node[2120]: Performing fence method,
> > >   manual, on oben.
> > >
> > > cluster.ccs:
> > > cluster {
> > >   name = "tom"
> > >   lock_gulm {
> > >     servers = ["oben", "mitte", "unten"]
> > >   }
> > > }
> > >
> > > fence.ccs:
> > > fence_devices {
> > >   manual_oben {
> > >     agent = "fence_manual"
> > >   }
> > >   manual_mitte ...
> > >
> > > nodes.ccs:
> > > nodes {
> > >   oben {
> > >     ip_interfaces {
> > >       eth0 = "192.168.100.241"
> > >     }
> > >     fence {
> > >       manual {
> > >         manual_oben {
> > >           ipaddr = "192.168.100.241"
> > >         }
> > >       }
> > >     }
> > >   }
> > >   mitte ...
> > >
> > > regards
> > > Bernd Schumacher
> > >
> > > --
> > > Linux-cluster@xxxxxxxxxx
> > > http://www.redhat.com/mailman/listinfo/linux-cluster
> >
> > --
> > Linux-cluster@xxxxxxxxxx
> > http://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> AJ Lewis                                    Voice:  612-638-0500
> Red Hat Inc.                                E-Mail: alewis@xxxxxxxxxx
> 720 Washington Ave. SE, Suite 200
> Minneapolis, MN 55414
>
> Current GPG fingerprint = D9F8 EDCE 4242 855F A03D 9B63 F50C 54A8 578C 8715
> Grab the key at: http://people.redhat.com/alewis/gpg.html or one of the
> many keyservers out there...
>
> -----Begin Obligatory Humorous Quote----------------------------------------
> "In this time of war against Osama bin Laden and the oppressive Taliban
> regime, we are thankful that OUR leader isn't the spoiled son of a
> powerful politician from a wealthy oil family who is supported by
> religious fundamentalists, operates through clandestine organizations,
> has no respect for the democratic electoral process, bombs innocents, and
> uses war to deny people their civil liberties." --The Boondocks
> -----End Obligatory Humorous Quote------------------------------------------
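
Here is the kind of nodes.ccs I have in mind for the test tomorrow. This is
only a sketch and completely untested: the GFS 6.0 documentation only shows
a single entry under "ip_interfaces", so ccs may simply reject a second
one, and the eth1 device name and the 10.0.0.x admin-LAN addresses are made
up for illustration. The idea is to keep eth0 as the GFS heartbeat LAN, add
the admin LAN as a second interface, and point the fencing entry at the
admin-LAN address, so that a broken eth0 alone can no longer take the whole
cluster down.

nodes.ccs (hypothetical):
nodes {
  oben {
    ip_interfaces {
      eth0 = "192.168.100.241"
      eth1 = "10.0.0.241"
    }
    fence {
      manual {
        manual_oben {
          ipaddr = "10.0.0.241"
        }
      }
    }
  }
  mitte ...
  unten ...
}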