Hello,

I don't see how the multipath environment can help when the complete quorum location is lost (only the quorum disk, not the servers), but here it is:

defaults {
        polling_interval        5
        failback                immediate
        no_path_retry           3
        rr_min_io               100
        path_checker            tur
        user_friendly_names     yes
        path_grouping_policy    group_by_prio
        prio_callout            "/sbin/mpath_prio_alua /dev/%n"
}

The problem we have is with the availability of the quorum drive. It is needed in a split-brain situation, but with a dying qdisk the whole cluster dies, even when both nodes are functioning perfectly.

With regards,

Inter Access BV
ing. J.C.M. (Jan) Huijsmans
Designer / Technical Consultant UNIX & Linux
Infrastructure Professional Services UNIX
E-mail: jan.huijsmans@xxxxxxxxxxxxxx
Tel: 035 688 8266
Mob.: 06 4938 8145
Hoofdkantoor: Colosseum 9, 1213 NN Hilversum
Postbus 840, 1200 AV Hilversum
K.v.K. Hilversum 32032877

________________________________________
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of emmanuel segura [emi2fast@xxxxxxxxx]
Sent: 10 February 2012 22:07
To: linux clustering
Subject: Re: Cluster stability with missing qdisk

Can you show me your multipath configuration?

2012/2/10 Jan Huijsmans <Jan.Huijsmans@xxxxxxxxxxxxxx>

The timeout is now 150 sec. for qdiskd (so it can survive two path failures and still have 30 seconds left to test the third path in a dual-fabric, dual-path-per-fabric setup) and 300 sec. for cman. I would like to add two more, just to make sure a node isn't rebooted merely because one location isn't reachable. It's not the application that's having problems; it's the cluster software that's causing them.

With regards,

Jan Huijsmans

________________________________________
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of emmanuel segura [emi2fast@xxxxxxxxx]
Sent: 10 February 2012 18:00
To: linux clustering
Subject: Re: Cluster stability with missing qdisk

I understand what you say, and that's right. One solution could be to play with the qdisk cluster timeouts; see man qdisk for more info.

2012/2/10 Jan Huijsmans <Jan.Huijsmans@xxxxxxxxxxxxxx>

We're already using it on multipath; it's a failure to write to the device that's killing the cluster. All other devices work, so the cluster could function as it should, were it not for the reboot by the cluster software. I would like to prevent the cluster from rebooting nodes just because the qdisk isn't responding (due to slow storage, a failure of the quorum location such as power loss, or other non-application-related errors). When both nodes are up and the application is able to run, there should be no reboot, in my opinion.

With regards,

Jan Huijsmans

________________________________________
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of emmanuel segura [emi2fast@xxxxxxxxx]
Sent: 10 February 2012 15:16
To: linux clustering
Subject: Re: Cluster stability with missing qdisk

Why don't you put the qdisk on multipath? That could resolve your problem.

2012/2/10 Jan Huijsmans <Jan.Huijsmans@xxxxxxxxxxxxxx>

Hello,

In our clusters we use a qdisk to determine which node has the quorum in case of a split-brain situation. This works great... until the qdisk itself is hit by problems with the SAN. Is there a way to have a stable cluster, with qdisks, where the absence of one qdisk won't kill the cluster altogether? At the moment, with a single-qdisk setup, the cluster is totally dependent on the availability of that qdisk, while, IMHO, it should be expendable.
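[Editorial note: the "play with the qdisk timeouts" advice above maps onto the <quorumd> tag in cluster.conf. The fragment below is a minimal sketch, assuming the RHEL 5/6 cman stack; the label is hypothetical, interval * tko reproduces the 150-second window mentioned earlier in the thread, and allow_kill/reboot are the qdisk(5)-documented knobs that keep qdiskd itself from evicting or rebooting nodes when the quorum disk goes quiet.]

```xml
<!-- cluster.conf fragment (sketch, not a tested configuration).
     interval * tko = 5 s * 30 = 150 s before qdiskd declares the
     quorum disk gone, matching the window described in the thread. -->
<quorumd interval="5" tko="30" votes="1" label="qdisk1"
         allow_kill="0" reboot="0"/>
<!-- allow_kill="0": qdiskd does not ask cman to evict nodes it
     considers dead; reboot="0": qdiskd does not reboot a node whose
     heuristic score drops too low. -->

<!-- cman's totem token timeout is set in milliseconds and, as in the
     thread, is kept well above the qdisk window (here ~300 s). -->
<totem token="300000"/>
```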
We now have a triangle setup, with two data centers and one extra 'quorum' location for the IBM SAN. In the SAN setup there are three quorum devices, one in each data center and the third at the quorum location. When one location fails (one of the data centers or the quorum location) the SAN is still up and running. Is it possible to copy this setup and use three qdisks, so that when one qdisk fails the cluster stays alive? I would set the vote value of all components (systems and qdisks) to 1, so the cluster would keep running with two systems and one qdisk, or one system with two qdisks. (It would be dead with only the three qdisks left, as the cluster software dies with both systems. ;) )

I've heard of setups with three systems, where the third is there just for the quorum, so that one can die, but in this case it won't help us, as there are no systems at the third location. (And it's not supported by Red Hat, if I'm correctly informed.)

With regards,

Jan Huijsmans

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

--
this is my life, and I live it as long as God wills
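[Editorial note: the vote arithmetic proposed in the thread (every node and every qdisk worth one vote) can be written down as a cluster.conf sketch, but only as arithmetic: cman's qdiskd supports a single quorum disk per cluster, so the three-qdisk layout exists only in the SAN, not in cluster.conf. Node names below are hypothetical.]

```xml
<!-- Illustrative arithmetic only, NOT a configuration cman accepts.
     2 nodes + 3 one-vote quorum devices: expected_votes = 5, so
     quorum = 3.  Two nodes + any one qdisk, or one node + two qdisks,
     would stay quorate; cman, however, can declare only ONE <quorumd>. -->
<cman expected_votes="5"/>
<clusternodes>
  <clusternode name="node-dc1" nodeid="1" votes="1"/>
  <clusternode name="node-dc2" nodeid="2" votes="1"/>
</clusternodes>
<!-- The single quorum disk cman actually supports; the other two
     votes of the triangle cannot be expressed here. -->
<quorumd label="qdisk-dc1" votes="1"/>
```

One consequence of giving the single qdisk just one vote (instead of the common nodes-minus-one): with expected_votes="3" in a real two-node setup, both nodes together still hold 2 of 3 votes, so losing only the qdisk does not cost quorum.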