Re: [PATCH] votequorum: add API to clear the wait_for_all status

On 12/08/14 14:41, Jan Friesse wrote:
Christine Caulfield wrote:
On 11/08/14 09:36, Jan Friesse wrote:
Christine Caulfield wrote:
On 11/08/14 09:17, Jan Friesse wrote:
Christine Caulfield wrote:
On 11/08/14 08:59, Jan Friesse wrote:
Chrissie,


On 11/08/14 07:29, Jan Friesse wrote:
Chrissie,
the patch looks generally good, but is there a reason to add a new
library call instead of tracking "quorum.wait_for_all" and, if it is
set to 0, executing code very similar to
message_handler_req_lib_votequorum_cancel_wait_for_all?


Yes. The point is not to clear wait_for_all itself; that's a
configuration option and we are not changing it - just the runtime wait
state. The config option needs to remain enabled for the next time
nodes go down.

Yes. But the user can call corosync-cmapctl to change this variable. We
don't need (or want) to change it via a reload. A very similar thing
happens with expected votes. Take a look at
ed63c812afc15fc68ebd3363845a63f5c945623e (this was actually the
inspiration for what I'm suggesting). wait_for_all is exactly the
same: allow a dynamic change of the "config" (but nothing is stored
back to the config file).


No, it's not changing the config - not even dynamically. It's changing
state inside corosync, not even a dynamic configuration parameter.

Sure

wait_for_all_status is NOT the same thing as quorum.wait_for_all - not
even slightly. wait_for_all needs to remain set after this command
(whatever it turns out to be) for the next time a node goes down; we do
not want to have to wait for a reload for that to happen.

It will. cmap is NOT stored back into the config, so wait_for_all WILL
remain set after this command for the next time a node goes down (you
are talking about the local node, right?). No reload needs to happen.

Indeed, but it's still stored in cmap - and I don't want wait_for_all
to change in cmap. If that was what I wanted, that's what I would have
done.

wait_for_all is not what I'm changing here. It's the internal status
that says we are currently waiting - not that we should wait in the
future.


Oh. So you are talking about config integrity (the current cmap should
reflect what is in the config) and not about a "technical" problem. It
makes sense then.

But I think changing runtime.votequorum.wait_for_all_status gives a big
problem with recursion (maybe quite hard to solve, especially on a
node other than the one which initiated the change). If so, you could
try a method similar to the one used for triggering blackbox creation
(set something like runtime.votequorum.cancel_wait_for_all).

Honza


OK, try 3...

This one just uses a cmap key to trigger the cancel. Although it looks a
lot neater than the last one (mainly due to the lack of a new API
call), I'm not totally happy using a cmap variable as an edge trigger.


Nice small patch.

Still, it's a small patch and it keeps the thing undocumented - which is
probably a plus ;-)


Uh. Would it make you too angry if I asked you for (short)
documentation in the cmap_keys man page?


I've added a man page entry:


Chrissie

diff --git a/exec/votequorum.c b/exec/votequorum.c
index 78e6b7b..6caccaf 100644
--- a/exec/votequorum.c
+++ b/exec/votequorum.c
@@ -150,6 +150,7 @@ static int votequorum_exec_send_quorum_notification(void *conn, uint64_t context
 
 #define VOTEQUORUM_RECONFIG_PARAM_EXPECTED_VOTES 1
 #define VOTEQUORUM_RECONFIG_PARAM_NODE_VOTES     2
+#define VOTEQUORUM_RECONFIG_PARAM_CANCEL_WFA     3
 
 static int votequorum_exec_send_reconfigure(uint8_t param, unsigned int nodeid, uint32_t value);
 
@@ -1487,6 +1488,7 @@ static void votequorum_refresh_config(
 {
 	int old_votes, old_expected_votes;
 	uint8_t reloading;
+	uint8_t cancel_wfa;
 
 	ENTER();
 
@@ -1498,6 +1500,15 @@ static void votequorum_refresh_config(
 		return ;
 	}
 
+	icmap_get_uint8("quorum.cancel_wait_for_all", &cancel_wfa);
+	if (strcmp(key_name, "quorum.cancel_wait_for_all") == 0 &&
+	    cancel_wfa >= 1) {
+	        icmap_set_uint8("quorum.cancel_wait_for_all", 0);
+		votequorum_exec_send_reconfigure(VOTEQUORUM_RECONFIG_PARAM_CANCEL_WFA,
+						 us->node_id, 0);
+		return;
+	}
+
 	old_votes = us->votes;
 	old_expected_votes = us->expected_votes;
 
@@ -2070,6 +2081,14 @@ static void message_handler_req_exec_votequorum_reconfigure (
 		recalculate_quorum(1, 0);  /* Allow decrease */
 		break;
 
+	case VOTEQUORUM_RECONFIG_PARAM_CANCEL_WFA:
+	        update_wait_for_all_status(0);
+		log_printf(LOGSYS_LEVEL_INFO, "wait_for_all_status reset by user on node %d.",
+			   req_exec_quorum_reconfigure->nodeid);
+		recalculate_quorum(0, 0);
+
+	        break;
+
 	}
 
 	LEAVE();
diff --git a/man/cmap_keys.8 b/man/cmap_keys.8
index aa40787..16b7d46 100644
--- a/man/cmap_keys.8
+++ b/man/cmap_keys.8
@@ -270,6 +270,11 @@ Informations about users/groups which are allowed to do IPC connection to
 corosync.
 
 .TP
+quorum.cancel_wait_for_all
+Tells votequorum to cancel waiting for all nodes at cluster startup. Can be used
+to unblock quorum if nodes are known to be down. For pcs use only.
+
+.TP
 config.reload_in_progress
 This value will be set to 1 (or created) when corosync.conf reload is started,
 and set to 0 when the reload is completed. This allows interested subsystems
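For reference, once the patch is applied the trigger would be exercised from userspace roughly like this. This is a sketch: it assumes a running cluster and the stock corosync-cmapctl tool, and the exact readback output format may differ:

```
# Clear the runtime wait_for_all gate on a node that is blocked
# waiting for all cluster members to appear.  Writing a non-zero
# u8 fires the cancel; votequorum resets the key to 0 afterwards.
corosync-cmapctl -s quorum.cancel_wait_for_all u8 1

# The key should read back as 0, confirming it was consumed.
corosync-cmapctl -g quorum.cancel_wait_for_all
```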
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
