Hi,

> Please stay on-list or call Red Hat Support.

Whoops, my bad, it's back on-list again. (Replying without checking the
To: line didn't help.)

> On 02/16/2012 04:50 AM, Jan Huijsmans wrote:
>>>> In the clusters we have, we use a qdisk to determine which node has
>>>> quorum in case of a split-brain situation.
>>
>>>> This is working great... until the qdisk itself is hit due to
>>>> problems with the SAN. Is there a way to have a stable cluster,
>>>> with qdisks, where the absence of (1) qdisk won't kill the cluster
>>>> altogether? At this moment, with the setup with 1 qdisk, the
>>>> cluster is totally dependent on the availability of the qdisk,
>>>> while, IMHO, it should be expendable.
>>
>>> What kind of problems are you trying to avoid?
>>
>>> 1) I/O errors -> disk died:
>>
>>> solution: set max_error_cycles to something nonzero (1? 2?), and qdiskd
>>> will then exit on the host where the problems are occurring when I/O
>>> errors are received
>>
>> We now have the interval for the qdisk set to 3 and tko to 50. So the
>> status is updated every 3 seconds and it's allowed to fail 50 times.
>>
>> Will max_error_cycles cause a qdisk attempt to count as failed when
>> the disk doesn't respond in time? If so, what is its relation to the
>> interval and tko?
>>
>> Is this an option that can be used with the clustering suite in the
>> RHEL 5.6 software stack?
>>
>>> 2) Long I/O hangs (e.g. path fail-over)
>>
>>> solution: current 3.1.x / 3.2.x differentiates between I/O hangs and I/O
>>> errors, so hangs (e.g. due to path fail-over) no longer cause reboots.
>>
>> We have seen I/O hangs of over 350 seconds at the worst times. (It's
>> now < 10 seconds.) We see discarded frames on the SAN, so it's
>> explainable. However, the system has 4 paths, 2 on 1 fabric and 2 on
>> the other. The default failure detection time is 60 seconds in the
>> Red Hat default set-up (which wasn't changed).
>
> You can hang forever with the new upstream feature as long as the nodes
> can communicate.

This is useful.
Is this available in the RHN channels for the RHEL 5.6 release, or is an
upgrade needed?

>> Our setup has 3 locations, datacenters A and B and quorum location C.
>> The last location is used by the SAN (IBM SVC/V7000 units) to determine
>> which datacenter (A or B) has access to C, when there is no communication
>> possible between both datacenters.
>
> For starters, set master_wins to '1' and don't use heuristics.

I'll see when I can test this. There was 1 cluster where I had to add
heuristics to ensure logging from the evicted node before it was reset.
(It's very irritating when a node is evicted without a logged cause.)

>> I would like to migrate the qdisk to this location, so we have the same setup
>> as with the SAN. The main problem is the failure of the quorum location C.

> Sure.

>> When we move the qdisk there and it fails, the cluster will fail on the qdisk,
>> when it should be able to function properly, as both nodes are up and are
>> able to communicate with each other.
>
> Setting max_error_cycles to 1 will cause I/O errors to remove the quorum
> disk on the host.
> The new upstream feature will prevent a hang from causing evictions.
> There is no method to 'ignore' eviction notices.

I don't want to ignore them; I don't want to get them at all when the
nodes can reach each other and both could do the job they need to.

>> On the SAN setup this is solved with 3 'qdisks', with one on each location. (A, B
>> and C) When C fails, there are still 2 qdisks available, so the cluster keeps
>> functioning.

> Qdiskd doesn't work deterministically in replicated environments.

I was thinking: is it possible to use an MD device with 3 mirror copies
as the qdisk device? This would give the same functionality with only 1
qdisk device.

>> The problem that I'm trying to solve is the complete failure of the qdisk taking
>> down a perfectly correctly operating cluster. We have to guard against a split-brain
>> situation, but at the moment the costs of the qdisks are too high.
>> (all clusters are now limited to 1 node to prevent failures due to the
>> qdisk problems)

> You might not need a quorum disk at all.

> A quorum disk doesn't obviate the need for fencing to complete in
> environments where you have a stretched cluster. E.g. even when you have
> sites A and C, when B dies, it will need to be fenced. This will fail,
> because the site is not available.

That is what was bothering me about the design of the current cluster
set-up. However, when both nodes could reach the qdisk but not each
other via LAN, they evicted each other. (The evictions were executed as
soon as the LAN was back up...)

> Why don't you take a look at these and file a ticket with Red Hat Support:
> https://access.redhat.com/kb/docs/DOC-53348
> https://access.redhat.com/kb/docs/DOC-58412

I'll take a look at them.

--
Jan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
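
The timing figures quoted in this thread can be put side by side with a
little arithmetic. The sketch below assumes that qdiskd declares a node
dead after roughly interval * tko seconds of missed updates, which is how
the settings are described in the thread (status updated every 3 seconds,
allowed to fail 50 times). The function names are made up for
illustration and are not part of any cluster tool.

```python
def qdisk_timeout_seconds(interval: int, tko: int) -> int:
    """Approximate window (seconds) of missed qdisk updates before eviction.

    Assumption: the window is simply interval * tko; real qdiskd behaviour
    may differ slightly by version.
    """
    if interval <= 0 or tko <= 0:
        raise ValueError("interval and tko must be positive")
    return interval * tko


def hang_exceeds_window(hang_seconds: float, interval: int, tko: int) -> bool:
    """True if an I/O hang of hang_seconds outlasts the eviction window."""
    return hang_seconds > qdisk_timeout_seconds(interval, tko)


if __name__ == "__main__":
    # Values taken from the thread: interval=3, tko=50.
    window = qdisk_timeout_seconds(interval=3, tko=50)
    print(f"eviction window: {window} s")
    # Worst hang reported in the thread (over 350 s) versus the current one (< 10 s).
    print("350 s hang exceeds window:", hang_exceeds_window(350, 3, 50))
    print("10 s hang exceeds window:", hang_exceeds_window(10, 3, 50))
```

Under that assumption, the worst hang mentioned in the thread is longer
than the window the current settings allow, while the hangs seen now fit
comfortably inside it.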