On Fri, 31 Aug 2007, Kadlecsik Jozsi wrote:
> In spite of having 'fence_tool leave' and 'cman_tool leave remove' in the
> 'cman' init script, when stopping the five-member cluster, it loses
> quorum when only two machines run the cluster components:
>
> root@web1:~# cman_tool status
> Version: 6.0.1
> Config Version: 6
> Cluster Name: kfki
> Cluster Id: 1583
> Cluster Member: Yes
> Cluster Generation: 748
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 5
> Total votes: 2
> Quorum: 3 Activity blocked
> Active subsystems: 7
> Flags:
> Ports Bound: 0 11
> Node name: web1-gfs
> Node ID: 4
> Multicast addresses: 224.0.0.3
> Node addresses: 192.168.192.6
>
> root@web1:~# cman_tool nodes
> Node Sts Inc Joined Name
> 1 X 728 lxserv0-gfs
> 2 M 728 2007-08-31 09:19:09 lxserv1-gfs
> 3 X 728 web0-gfs
> 4 M 724 2007-08-31 09:18:48 web1-gfs
> 5 X 728 saturn-gfs
>
> '/etc/init.d/cman stop' was issued and executed successfully on the three
> other nodes.
As far as I can see, this happens because the 'expected_votes' values of
the remaining nodes are not adjusted when nodes are removed. So even when
the quorum is allowed to decrease, the highest expected_votes value among
the members keeps the quorum from going down.
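To illustrate with the numbers from the status output, here is a minimal
sketch of the quorum calculation as I understand it. The helper names are
made up for this mail, and the formula (take the highest expected_votes and
the total votes, add two to each, halve, and use the larger result) is my
reading of the daemon, so please treat it as an assumption rather than the
actual code:

```c
#include <stdio.h>

/* Hypothetical helper, not the daemon's code: quorum derived from the
 * highest expected_votes among the members and the votes present. */
static int sketch_quorum(int highest_expected, int total_votes)
{
	int q1 = (highest_expected + 2) / 2;
	int q2 = (total_votes + 2) / 2;

	return q1 > q2 ? q1 : q2;
}

/* Print one line of the comparison and return the computed quorum. */
static int sketch_report(const char *label, int highest_expected,
			 int total_votes)
{
	int quorum = sketch_quorum(highest_expected, total_votes);

	printf("%s: expected=%d total=%d quorum=%d%s\n", label,
	       highest_expected, total_votes, quorum,
	       total_votes < quorum ? " (activity blocked)" : "");
	return quorum;
}
```

Calling sketch_report("unadjusted", 5, 2) gives a quorum of 3, which matches
the blocked state shown by cman_tool. With the expected votes lowered to 2,
sketch_report("adjusted", 2, 2) gives a quorum of 2 and the two remaining
nodes would stay quorate.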
I wrote the attached patch to adjust expected_votes when a node is removed
(and when it appears again). Please review it and apply if you agree with
it.
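The idea of the change can be shown on a simplified, standalone model. The
struct and the sketch_* function names below are hypothetical and exist only
for this illustration; the real change is in the diff that follows:

```c
#include <stddef.h>

/* Simplified stand-in for the daemon's per-node record; only the
 * fields the sketch needs, with names chosen for illustration. */
struct sketch_node {
	int is_member;
	int votes;
	int expected_votes;
};

/* Lower expected_votes on every member that is above newexp, and
 * also raise it to newexp when may_increase is set. */
static void sketch_reset_expected(struct sketch_node *nodes, size_t n,
				  int may_increase, int newexp)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (nodes[i].is_member &&
		    (nodes[i].expected_votes > newexp || may_increase))
			nodes[i].expected_votes = newexp;
	}
}

/* A node left with "remove": drop its votes from what we expect,
 * never going below 1. "us" is the local node's record. */
static void sketch_node_removed(struct sketch_node *nodes, size_t n,
				const struct sketch_node *us,
				struct sketch_node *gone)
{
	int newexp = us->expected_votes > gone->votes ?
		     us->expected_votes - gone->votes : 1;

	gone->is_member = 0;
	sketch_reset_expected(nodes, n, 0, newexp);
}

/* The removed node is back: add its votes to the expectation again. */
static void sketch_node_returned(struct sketch_node *nodes, size_t n,
				 const struct sketch_node *us,
				 struct sketch_node *back)
{
	int newexp = us->expected_votes + back->votes;

	back->is_member = 1;
	sketch_reset_expected(nodes, n, 1, newexp);
}
```

In this model, starting from five one-vote members that each expect 5 votes
and removing three of them leaves the two remaining members expecting 2
votes. When one removed node returns, every member expects 3 again.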
Best regards,
Jozsef
--
E-mail : kadlec@xxxxxxxxxxxxxxx, kadlec@xxxxxxxxxxxxxxxxx
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary
diff -urN --exclude=deb cluster-2.01.00.orig/cman/daemon/commands.c cluster-2.01.00/cman/daemon/commands.c
--- cluster-2.01.00.orig/cman/daemon/commands.c 2007-06-26 11:09:13.000000000 +0200
+++ cluster-2.01.00/cman/daemon/commands.c 2007-09-04 10:43:27.000000000 +0200
@@ -1867,7 +1867,7 @@
}
}
-void override_expected(int newexp)
+void reset_expected(int may_increase, int newexp)
{
struct list *nodelist;
struct cluster_node *node;
@@ -1875,13 +1875,12 @@
list_iterate(nodelist, &cluster_members_list) {
node = list_item(nodelist, struct cluster_node);
if (node->state == NODESTATE_MEMBER
- && node->expected_votes > newexp) {
+ && (node->expected_votes > newexp || may_increase)) {
node->expected_votes = newexp;
}
}
}
-
/* Add a node from CCS, note that it may already exist if user has simply updated the config file */
void add_ccs_node(char *nodename, int nodeid, int votes, int expected_votes)
{
@@ -1942,6 +1941,8 @@
node->incarnation = incarnation;
node->state = NODESTATE_MEMBER;
cluster_members++;
+ if ((node->leave_reason & 0xF) == CLUSTER_LEAVEFLAG_REMOVED)
+ reset_expected(1, us->expected_votes + node->votes);
recalculate_quorum(0);
}
}
@@ -1983,9 +1984,11 @@
node->state = NODESTATE_DEAD;
cluster_members--;
- if ((node->leave_reason & 0xF) == CLUSTER_LEAVEFLAG_REMOVED)
+ if ((node->leave_reason & 0xF) == CLUSTER_LEAVEFLAG_REMOVED) {
+ override_expected(us->expected_votes > node->votes ?
+ us->expected_votes - node->votes : 1);
recalculate_quorum(1);
- else
+ } else
recalculate_quorum(0);
break;
diff -urN --exclude=deb cluster-2.01.00.orig/cman/daemon/commands.h cluster-2.01.00/cman/daemon/commands.h
--- cluster-2.01.00.orig/cman/daemon/commands.h 2006-08-17 15:22:39.000000000 +0200
+++ cluster-2.01.00/cman/daemon/commands.h 2007-09-04 10:28:17.000000000 +0200
@@ -29,12 +29,12 @@
extern void add_ais_node(int nodeid, uint64_t incarnation, int total_members);
extern void del_ais_node(int nodeid);
extern void add_ccs_node(char *name, int nodeid, int votes, int expected_votes);
-extern void override_expected(int expected);
+extern void reset_expected(int may_increase, int expected);
extern void cman_send_confchg(unsigned int *member_list, int member_list_entries,
unsigned int *left_list, int left_list_entries,
unsigned int *joined_list, int joined_list_entries);
-
+#define override_expected(expected) reset_expected(0, expected)
/* Startup stuff called from cmanccs: */
extern int cman_set_nodename(char *name);