On 11/08/14 09:36, Jan Friesse wrote:
Christine Caulfield wrote:
On 11/08/14 09:17, Jan Friesse wrote:
Christine Caulfield wrote:
On 11/08/14 08:59, Jan Friesse wrote:
Chrissie,
On 11/08/14 07:29, Jan Friesse wrote:
Chrissie,
patch looks generally good, but is there a reason to add a new library
call instead of tracking "quorum.wait_for_all" and, if it is set to 0,
executing code very similar to
message_handler_req_lib_votequorum_cancel_wait_for_all?
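Roughly what I have in mind, as a sketch only (the extern declarations
stand in for the existing votequorum internals; the names, signatures and
include path are assumed, not copied from the real code):

    /* Sketch: track quorum.wait_for_all and, when an admin flips it to 0
     * at runtime, run the same code path as the proposed cancel call. */
    #include <stdint.h>
    #include <corosync/corotypes.h>
    #include <corosync/icmap.h>   /* adjust to wherever icmap.h lives */

    extern uint8_t wait_for_all_status;                  /* assumed */
    extern void update_wait_for_all_status(uint8_t wfa); /* assumed */

    static void quorum_wfa_key_changed(int32_t event, const char *key_name,
        struct icmap_notify_value new_val, struct icmap_notify_value old_val,
        void *user_data)
    {
        uint8_t wfa = 1;

        if (icmap_get_uint8("quorum.wait_for_all", &wfa) != CS_OK) {
            return;
        }
        if (wfa == 0 && wait_for_all_status) {
            /* same effect as
             * message_handler_req_lib_votequorum_cancel_wait_for_all */
            update_wait_for_all_status(0);
        }
    }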
Yes. The point is not to clear wait_for_all itself; that's a
configuration option and we are not changing it - just the runtime wait
state. The config option needs to remain enabled for the next time nodes
Yes. But the user can call corosync-cmapctl to change this variable. We
don't need to (or want to) change it via a reload. A very similar thing
happens with expected votes. Take a look at
ed63c812afc15fc68ebd3363845a63f5c945623e (this was actually the
inspiration for what I'm suggesting). wait_for_all is exactly the same
case. Allow natural selection. It's a dynamic change of the "config"
(but not stored back to the config file).
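I.e., just like expected_votes can be adjusted at runtime, the admin (or
pcs) could do something along these lines; the u8 value type is my guess
at how the key is stored, so check with corosync-cmapctl first:

    # runtime change only; nothing is written back to corosync.conf
    corosync-cmapctl -s quorum.wait_for_all u8 0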
No, it's not changing the config - even dynamically. It's changing a
state inside corosync, not even a dynamic configuration parameter.
Sure
wait_for_all_status is NOT the same thing as quorum.wait_for_all - not
even slightly. wait_for_all needs to remain set after this command
(whatever it turns out to be) for the next time a node goes down; we do
not want to have to wait for a reload for that to happen.
It will. cmap is NOT stored back into the config, so wait_for_all WILL
remain set after this command for the next time a node goes down (you
are talking about the local node, right?). No reload needs to happen.
Indeed, but it's still stored in cmap - and I don't want wait_for_all to
change in cmap. If that was what I wanted then that's what I would have
done.
wait_for_all is not what I'm changing here. It's the internal status
that says we are currently waiting - not that we should wait in the
future.
Oh. So you are talking about config integrity (the current cmap
reflecting what is in the config file) and not about a "technical"
problem. That makes sense then.
But I think changing runtime.votequorum.wait_for_all_status gives you a
big problem with recursion (maybe quite hard to solve, especially on a
node other than the one which initiated the change). If so, you could
try a method similar to how blackbox creation is triggered (set
something like runtime.votequorum.cancel_wait_for_all).
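So the tool would just do something like the line below (the key name is
only a suggestion and the u8 type is a guess), the same way
corosync-blackbox pokes a runtime key; votequorum would cancel the wait
from its track callback without ever writing the key it is tracking:

    corosync-cmapctl -s runtime.votequorum.cancel_wait_for_all u8 1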
Which is why I created an API call ;-)
Chrissie
Honza
Chrissie
If this is to be done using cmapctl (and I'm happy for that to be the
case) then altering runtime.votequorum.wait_for_all_status is the thing
to do.
You will then get some pretty ugly recursion (you end up calling
update_wait_for_all_status).
That's why I believe it's just much cleaner to track wait_for_all.
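To spell out the recursion (sketch only; the callback name and the
update_wait_for_all_status() signature are assumed, same
includes/externs as the earlier sketch): the notify handler for
runtime.votequorum.wait_for_all_status ends up rewriting the key it is
tracking:

    static void wfa_status_key_changed(int32_t event, const char *key_name,
        struct icmap_notify_value new_val, struct icmap_notify_value old_val,
        void *user_data)
    {
        uint8_t status = 0;

        if (icmap_get_uint8("runtime.votequorum.wait_for_all_status",
            &status) != CS_OK) {
            return;
        }
        if (status == 0) {
            /* Cancelling goes through update_wait_for_all_status(),
             * which writes this very key again, so the tracking
             * callback fires once more. */
            update_wait_for_all_status(0);
        }
    }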
Regards,
Honza
Chrissie
Regards,
Honza
are rebooted. This call is meant to be a temporary fix to a
particular
node-outage, not a reconfiguration of the cluster.
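To be concrete, this is the sort of thing pcs (or a CLI tool) would end
up calling - a minimal sketch; treat the prototype of the new call as
illustrative rather than final:

    #include <stdio.h>
    #include <corosync/corotypes.h>
    #include <corosync/votequorum.h>

    int main(void)
    {
        votequorum_handle_t handle;
        cs_error_t err;

        err = votequorum_initialize(&handle, NULL);
        if (err != CS_OK) {
            fprintf(stderr, "votequorum_initialize failed: %d\n", err);
            return 1;
        }
        /* The new call added by the patch; name/prototype illustrative. */
        err = votequorum_cancel_wait_for_all(handle);
        if (err != CS_OK) {
            fprintf(stderr, "cancel_wait_for_all failed: %d\n", err);
        }
        votequorum_finalize(handle);
        return (err == CS_OK) ? 0 : 1;
    }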
If there were a key to watch for, it would be
runtime.votequorum.wait_for_all_status - I'll maybe investigate the
practicality of doing that. At the time I was wary of watching/changing
runtime.* keys from userspace.
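Reading (as opposed to changing) that key from userspace is easy enough
via the public cmap API - a minimal sketch, assuming the key is stored
as a u8:

    #include <stdio.h>
    #include <stdint.h>
    #include <corosync/corotypes.h>
    #include <corosync/cmap.h>

    int main(void)
    {
        cmap_handle_t handle;
        uint8_t wfa_status = 0;
        cs_error_t err;

        err = cmap_initialize(&handle);
        if (err != CS_OK) {
            fprintf(stderr, "cmap_initialize failed: %d\n", err);
            return 1;
        }
        /* Assumption: the key is a u8; use the matching getter if not. */
        err = cmap_get_uint8(handle, "runtime.votequorum.wait_for_all_status",
            &wfa_status);
        if (err == CS_OK) {
            printf("wait_for_all_status = %u\n", wfa_status);
        } else {
            fprintf(stderr, "cmap_get_uint8 failed: %d\n", err);
        }
        cmap_finalize(handle);
        return (err == CS_OK) ? 0 : 1;
    }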
But if we decide to go with a library call, a few things must be
fixed:
- the version can be 7.1.0. We are adding a call, not changing an
existing one (so it's backwards compatible)
- We need support in cfgtool/quorumtool/... Keep in mind that the main
user (pcs) does not call the corosync API directly; it uses the CLI
tools.
Ugh, I didn't realise that. Thanks
Chrissie
- There should be a check that wait_for_all is really activated.
All of these things would be solved for free by tracking
"quorum.wait_for_all".
Regards,
Honza
Christine Caulfield wrote:
It's possible in a two_node cluster (and others, but it's more likely
with just two) that a node could be booted up after downtime or failure
while the other node is not available for some reason. In this case it
would not be allowed to proceed because wait_for_all is enforced.
This patch provides an API call to clear this flag in the desperate
situation where that becomes necessary. It should only be used with
extreme caution and will be wrapped up in pcs, which should also check
that fencing has been run.
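For reference, the kind of configuration where this bites (two_node
automatically enables wait_for_all):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        # implied by two_node: 1, shown here for clarity
        wait_for_all: 1
    }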
Signed-off-by: Christine Caulfield <ccaulfie@xxxxxxxxxx>