On Thu, 2006-09-07 at 16:07 -0700, Rick Rodgers wrote:
> I am using an older version of clumanager (about 2 yrs old) and I
> notice that when the active node goes down the backup will actually
> issue stonith commands twice.  They are about 60 seconds apart.  Does
> this happen to anyone else??

It's "normal" if you're using the disk tiebreaker.  That is, it's been
around for so long that people are used to it ;)

Basically, both membership transitions and quorum disk transitions are
causing full recovery (including STONITH).  However, only one should
cause a STONITH event -- the one that happens last.

There is a switch which should fix it in 1.2.34, but it has to be
enabled manually ('cludb -p cluquorumd%disk_quorum 1').

-- Lon
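The patch below adds the toggle.  A quick usage sketch, assuming the
quorum daemon picks the value up the next time it reads its
configuration (the reload step itself is not shown here); per the
patch, any value whose first character is 'y', 'Y', or '1' enables it,
and it only takes effect on 2-node clusters:

    cludb -p cluquorumd%disk_quorum 1

When it is honored, cluquorumd logs "Allowing disk quorum." at NOTICE
level.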
Index: ChangeLog
===================================================================
RCS file: /cvs/devel/clumanager/ChangeLog,v
retrieving revision 1.139
diff -u -r1.139 ChangeLog
--- ChangeLog	20 Jan 2006 16:53:51 -0000	1.139
+++ ChangeLog	30 Jan 2006 20:54:26 -0000
@@ -1,3 +1,10 @@
+2006-01-30 Lon Hohberger <lhh at redhat.com>
+	* include/quorumparams.h: Add parameter for cluquorumd to
+	make it not STONITH if disk is up when disk tiebreaker is
+	in use.
+	* src/daemons/cluquorumd.c: Allow use of cluquorumd%disk_quorum
+	* doc/man/cludb.8: Document toggle for disk quorum.
+
 2006-01-20 Lon Hohberger <lhh at redhat.com> 1.2.31
 	* src/daemons/clumembd.c: Send last breath during shutdown to
 	enable faster membership transitions when a node leaves cleanly
Index: clumanager.spec.in
===================================================================
RCS file: /cvs/devel/clumanager/clumanager.spec.in,v
retrieving revision 1.91
diff -u -r1.91 clumanager.spec.in
--- clumanager.spec.in	20 Jan 2006 16:53:51 -0000	1.91
+++ clumanager.spec.in	30 Jan 2006 20:54:26 -0000
@@ -98,6 +98,10 @@
 
 %changelog
+* Mon Jan 30 2006 Lon Hohberger <lhh at redhat.com>
+- Add parameter for toggling use of disk for quorum in disk
+tiebreaker cases.
+
 * Fri Jan 20 2006 Lon Hohberger <lhh at redhat.com> 1.2.31
 - Send last breath during clean shutdown to speed up membership
 transition
Index: doc/man/cludb.8
===================================================================
RCS file: /cvs/devel/clumanager/doc/man/cludb.8,v
retrieving revision 1.11
diff -u -r1.11 cludb.8
--- doc/man/cludb.8	21 Nov 2005 21:31:08 -0000	1.11
+++ doc/man/cludb.8	30 Jan 2006 20:54:26 -0000
@@ -145,6 +145,14 @@
 daemon.  By default, the quorum daemon does not run in real-time priority.
 You can enable this if you are having problems on heavily loaded systems.
 This value should be less than or equal to clumembd%rtp.
+.IP "cluquorumd%disk_quorum"
+Disk quorum preference (yes, no; default=no).  When using the disk
+tiebreaker in 2-node clusters, this parameter controls whether or not
+the disk-based heartbeating mechanism is considered an official backup
+for quorum.  If set to 'yes', Cluster Manager will not attempt to
+activate STONITH or fail over services if only the network connection is
+lost.  Note that services may not be started/stopped/etc. manually, as
+network communications are not available.
 .IP "cluquorumd%allow_soft"
 Soft quorum preference (yes, no; default=no).
 Allows formation of a new cluster quorum utilizing an IP tie-breaker when half of the cluster
Index: include/quorumparams.h
===================================================================
RCS file: /cvs/devel/clumanager/include/quorumparams.h,v
retrieving revision 1.6
diff -u -r1.6 quorumparams.h
--- include/quorumparams.h	2 Sep 2004 15:16:35 -0000	1.6
+++ include/quorumparams.h	30 Jan 2006 20:54:26 -0000
@@ -71,6 +71,7 @@
 #define CFG_QUORUM_POWER_CHECK_INTERVAL "cluquorumd%powercheckinterval"
 #define CFG_QUORUM_LOGLEVEL	"cluquorumd%loglevel"
 #define CFG_QUORUM_RTP		"cluquorumd%rtp"
+#define CFG_QUORUM_DISK_QUORUM	"cluquorumd%disk_quorum"
 #define CFG_QUORUM_ALLOW_SOFT	"cluquorumd%allow_soft"
 #define CFG_QUORUM_IGNORE_GULM	"cluquorumd%ignore_gulm"
Index: src/daemons/cluquorumd.c
===================================================================
RCS file: /cvs/devel/clumanager/src/daemons/cluquorumd.c,v
retrieving revision 1.56
diff -u -r1.56 cluquorumd.c
--- src/daemons/cluquorumd.c	13 Dec 2005 18:44:11 -0000	1.56
+++ src/daemons/cluquorumd.c	30 Jan 2006 20:54:26 -0000
@@ -76,6 +76,7 @@
 static char	*tb_ip = NULL;		/* Tie breaker IP-address */
 static time_t	pswitch_check = 0;
 static int	allow_soft_quorum = 0;
+static int	allow_disk_quorum = 0;
 static int	ignore_gulm_absence = 0;
 
 /*
@@ -170,6 +171,17 @@
 		}
 	}
 
+	if (CFG_Get((char *) CFG_QUORUM_DISK_QUORUM, NULL, &p) == CFG_OK) {
+		if (p && (p[0] == 'y' || p[0] == '1' || p[0] == 'Y') &&
+		    (nodes == 2)) {
+			clulog(LOG_NOTICE, "Allowing disk quorum.\n");
+			allow_disk_quorum = 1;
+		} else {
+			allow_disk_quorum = 0;
+		}
+	}
+
+
 	if (CFG_Get((char *) CFG_QUORUM_RTP, NULL, &p) == CFG_OK) {
 		if (p) {
 			memset(&param,0,sizeof(param));
@@ -235,11 +247,15 @@
 distribute_switch_state(int member_controlled, int msg)
 {
 	int x, fd;
+	struct timeval tv;
+
 	for (x=0; x<MAX_NODES; x++) {
 		if (!memb_online(quorum_status.qv_mask, x))
 			continue;
 
-		fd = msg_open(PROCID_CLUQUORUMD, x);
+		tv.tv_sec = 3;
+		tv.tv_usec = 0;
+		fd = msg_open_timeout(PROCID_CLUQUORUMD, x, &tv);
 		if (fd == -1)
 			continue;
 		msg_send_simple(fd, msg, my_node_id, member_controlled);
@@ -344,6 +360,7 @@
 notify_exiting(void)
 {
 	int x, fd, errors = 0;
+	struct timeval tv;
 
 	for (x=0; x < MAX_NODES; x++) {
 		if (!memb_online(quorum_status.qv_mask, x))
@@ -352,7 +369,9 @@
 		if (x == my_node_id)
 			continue;
 
-		fd = msg_open(PROCID_CLUQUORUMD, x);
+		tv.tv_sec = 3;
+		tv.tv_usec = 0;
+		fd = msg_open_timeout(PROCID_CLUQUORUMD, x, &tv);
 		if (fd == -1) {
 			errors++;
 			continue;
@@ -1137,10 +1156,16 @@
 	 */
 	if (disk_in_use() && disk_other_status() && qv.qv_status)
 		if (!memb_online(qv.qv_mask, disk_other_node())) {
+			if (allow_disk_quorum) {
+				memb_mark_down(qv.qv_stonith_mask,
+					       disk_other_node());
+			}
+
 			clulog(LOG_WARNING, "Membership reports #%d as"
 			       " down, but disk reports as up: "
 			       "State uncertain!\n",
 			       disk_other_node());
+			memb_mark_up(qv.qv_panic_mask, disk_other_node());
 		}
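For anyone skimming the cluquorumd.c hunks, here is a minimal,
self-contained sketch of the decision the last hunk changes.  The names
(peer_state, decide_recovery, the action enum) are hypothetical; the
real daemon works on membership, STONITH and panic bitmasks, where the
'hold' branch corresponds to clearing the peer from the STONITH mask and
flagging the state as uncertain instead of fencing and failing over:

#include <stdio.h>

/* Hypothetical stand-in for what cluquorumd derives from clumembd and
 * the disk tiebreaker heartbeat for the other node in a 2-node cluster. */
struct peer_state {
	int memb_online;	/* network membership still sees the peer */
	int disk_online;	/* disk tiebreaker heartbeat still sees it */
};

enum action {
	NO_ACTION,		/* peer is healthy */
	FENCE_AND_RECOVER,	/* STONITH the peer and fail services over */
	HOLD_STATE_UNCERTAIN	/* only the network is gone: don't fence,
				 * don't fail over, wait for the admin */
};

/* allow_disk_quorum mirrors cluquorumd%disk_quorum (2-node clusters only) */
static enum action
decide_recovery(const struct peer_state *peer, int allow_disk_quorum)
{
	if (peer->memb_online)
		return NO_ACTION;

	if (peer->disk_online && allow_disk_quorum)
		return HOLD_STATE_UNCERTAIN;

	return FENCE_AND_RECOVER;
}

int
main(void)
{
	/* Network partition: membership lost the peer, the disk did not. */
	struct peer_state peer = { 0, 1 };

	printf("disk_quorum off -> %d (fence and recover)\n",
	       decide_recovery(&peer, 0));
	printf("disk_quorum on  -> %d (hold, state uncertain)\n",
	       decide_recovery(&peer, 1));
	return 0;
}

The trade-off documented in the cludb.8 addition still applies: while
the node is holding in the uncertain state, services cannot be started
or stopped manually because network communication with the peer is down.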
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster