I want to improve my basic Linux skills. In particular, I want to learn how
to configure OpenOffice and a media player on Red Hat 5. Please help with
these topics.

Harvinder Singh S/O Baldev Raj,
VPO Barwa Teh. Anandpur Sahib,
Dist. Ropar, Punjab
E-Mail ID:- jmd_singhsaini@xxxxxxxxx

--- On Fri, 17/2/12, Jan Huijsmans <Jan.Huijsmans@xxxxxxxxxxxxxx> wrote:

> From: Jan Huijsmans <Jan.Huijsmans@xxxxxxxxxxxxxx>
> Subject: Re: Cluster stability with missing qdisk
> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> Date: Friday, 17 February, 2012, 2:24 AM
>
> Hi,
>
> > Please stay on-list or call Red Hat Support.
>
> Whoops, my bad, it's back on-list again. (Replying without checking the
> "To" field didn't help.)
>
> > On 02/16/2012 04:50 AM, Jan Huijsmans wrote:
> >>>> In the clusters we have, we use a qdisk to determine which node has
> >>>> the quorum in case of a split-brain situation.
> >>
> >>>> This is working great... until the qdisk itself is hit due to
> >>>> problems with the SAN. Is there a way to have a stable cluster, with
> >>>> qdisks, where the absence of (1) qdisk won't kill the cluster
> >>>> altogether? At this moment, with the setup with 1 qdisk, the cluster
> >>>> is totally dependent on the availability of the qdisk, while, IMHO,
> >>>> it should be expendable.
> >>
> >>> What kind of problems are you trying to avoid?
> >>
> >>> 1) I/O errors -> disk died:
> >>
> >>> solution: set max_error_cycles to something nonzero (1? 2?), and
> >>> qdiskd will then exit on the host where the problems are occurring
> >>> when I/O errors are received
> >>
> >> We now have the interval for the qdisk set to 3 and tko to 50. So the
> >> status is updated every 3 seconds and it's allowed to fail 50 times.
> >>
> >> Will max_error_cycles cause the qdisk attempts to fail when it doesn't
> >> respond in time? If so, what is its relation to the interval and tko?
> >>
> >> Is this an option that can be used with the clustering suite in the
> >> RHEL 5.6 software stack?
> >>
> >>> 2) Long I/O hangs (e.g. path fail-over)
> >>
> >>> solution: current 3.1.x / 3.2.x differentiates between I/O hangs and
> >>> I/O errors, so hangs (e.g. due to path fail-over) no longer cause
> >>> reboots.
> >>
> >> We have seen I/O hangs of over 350 seconds at the worst times. (It's
> >> now < 10 seconds.) We see discarded frames on the SAN, so it's
> >> explainable. Only the system has 4 paths, 2 on 1 fabric and 2 on the
> >> other. The default failure detection time is 60 seconds in the Red Hat
> >> default set-up (which wasn't changed).
> >
> > You can hang forever with the new upstream feature as long as the nodes
> > can communicate.
>
> This is useful. Is this available in the RHN channels for the RHEL 5.6
> release, or is an upgrade needed?
>
> >> Our setup has 3 locations: datacenters A and B and quorum location C.
> >> The last location is used by the SAN (IBM SVC/V7000 units) to determine
> >> which datacenter (A or B) has access to C when there is no
> >> communication possible between both datacenters.
> >
> > For starters, set master_wins to '1' and don't use heuristics.
>
> I'll see when I can test this. There was 1 cluster where I had to add
> heuristics to ensure logging from the evicted node before it was reset.
> (It's very irritating when a node is evicted without a logged cause.)
>
> >> I would like to migrate the qdisk to this location, so we have the
> >> same setup as with the SAN. The main problem is the failure of the
> >> quorum location C.
> >
> > Sure.
> >
> >> When we move the qdisk there and it fails, the cluster will fail on
> >> the qdisk, when it should be able to function properly, as both nodes
> >> are up and are able to communicate with each other.
> >
> > Setting max_error_cycles to 1 will cause I/O errors to remove the quorum
> > disk on the host.
> >
> > The new upstream feature will prevent a hang from causing evictions.
> >
> > There is no method to 'ignore' eviction notices.
>
> I don't want to ignore them; I don't want to get them when the nodes can
> reach each other and both could do the job they need to.
>
> >> On the SAN setup this is solved with 3 'qdisks', with one at each
> >> location (A, B and C). When C fails, there are still 2 qdisks
> >> available, so the cluster keeps functioning.
> >
> > Qdiskd doesn't work deterministically in replicated environments.
>
> I was thinking, is it possible to use an MD device with 3 mirror copies as
> the qdisk device? This would give the same functionality with only 1 qdisk
> device.
>
> >> The problem that I'm trying to solve is the complete failure of the
> >> qdisk taking down a perfectly correctly operating cluster. We have to
> >> guard against a split-brain situation, but at the moment the costs of
> >> the qdisks are too high. (All clusters are now limited to 1 node to
> >> prevent failures due to the qdisk problems.)
> >
> > You might not need a quorum disk at all.
> >
> > A quorum disk doesn't obviate the need for fencing to complete in
> > environments where you have a stretched cluster. E.g. even when you
> > have sites A and C, when B dies, it will need to be fenced. This will
> > fail, because the site is not available.
>
> That was what's bothering me about the design of the current cluster
> set-up. However, when both nodes could reach the qdisk and not each other
> via LAN, they evicted each other (which was executed as soon as the LAN
> was back up...).
>
> > Why don't you take a look at these and file a ticket with Red Hat
> > Support:
> >
> > https://access.redhat.com/kb/docs/DOC-53348
> > https://access.redhat.com/kb/docs/DOC-58412
>
> I'll take a look at it.
>
> -- Jan
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
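
For reference, the timing figures quoted in the thread can be put side by side with a little arithmetic. This is only a rough sketch, assuming that a node is treated as failed once it has missed tko consecutive cycles of interval seconds each, which is how the thread describes it ("updated every 3 seconds and it's allowed to fail 50 times"). The function name is hypothetical and is not part of any cluster tool.

```python
# Rough sketch: compare the qdisk tolerance window with the I/O hang
# durations mentioned in the thread. Assumes window = interval * tko.

def qdisk_timeout_seconds(interval_s: int, tko: int) -> int:
    """Seconds a node may miss qdisk updates before it is considered failed
    (hypothetical helper, not a qdiskd API)."""
    return interval_s * tko

INTERVAL_S = 3        # qdisk interval from the thread
TKO = 50              # allowed missed cycles from the thread
WORST_HANG_S = 350    # worst I/O hang reported in the thread
PATH_DETECT_S = 60    # default path failure detection time from the thread

window = qdisk_timeout_seconds(INTERVAL_S, TKO)
print(f"qdisk tolerance window: {window} s")
print(f"covers default path failure detection ({PATH_DETECT_S} s): "
      f"{window > PATH_DETECT_S}")
print(f"covers worst observed hang ({WORST_HANG_S} s): "
      f"{window > WORST_HANG_S}")
```

Under that assumption the window works out to 150 seconds, which is longer than the 60-second path failure detection time but shorter than the 350-second hang reported above.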