Re: [PATCH] votequorum: add API to clear the wait_for_all status

On 11/08/14 08:59, Jan Friesse wrote:
Chrissie,


On 11/08/14 07:29, Jan Friesse wrote:
Chrissie,
the patch looks generally good, but is there a reason to add a new library
call instead of tracking "quorum.wait_for_all" and, if it is set to 0,
executing code very similar to
message_handler_req_lib_votequorum_cancel_wait_for_all?


Yes. The point is not to clear wait_for_all itself, that's a
configuration option and we are not changing it - just the runtime wait
state. The config option needs to remain enabled for the next time nodes

Yes. But the user can call corosync-cmapctl to change this variable. We
don't need to (or want to) change it via reload. A very similar thing
is happening with expected votes. Take a look at
ed63c812afc15fc68ebd3363845a63f5c945623e (this was actually the
inspiration for what I'm suggesting). wait_for_all is exactly the same.
Allow natural selection. Dynamic change of "config" (but not stored to
the config file).


No, it's not changing the config - even dynamically. It's changing a state inside corosync, not even a dynamic configuration parameter. wait_for_all_status is NOT the same thing as quorum.wait_for_all - not even slightly. wait_for_all needs to remain set after this command (whatever it turns out to be) for the next time a node goes down; we do not want to have to wait for a reload for that to happen.

If this is to be done using cmapctl (and I'm happy for that to be the case) then altering runtime.votequorum.wait_for_all_status is the thing to do.

Chrissie


Regards,
   Honza

are rebooted. This call is meant to be a temporary fix to a particular
node-outage, not a reconfiguration of the cluster.

If there was a key to watch for it would be
runtime.votequorum.wait_for_all_status - I'll investigate the
practicality of doing that maybe. At the time I was wary of
watching/changing runtime.* keys from userspace.
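
A rough sketch of what watching that key could look like, written as a plain change handler so it stands alone. Everything here is an assumption for illustration: wfa_state and wfa_on_key_change are hypothetical names, the handler is not wired to any real tracking API, and the policy of only accepting a write of 0 is a guess at what would be sensible, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WFA_STATUS_KEY "runtime.votequorum.wait_for_all_status"

/* Hypothetical state held by the handler, NOT corosync code. */
struct wfa_state {
    uint8_t config_wait_for_all; /* quorum.wait_for_all, never modified here */
    uint8_t wait_for_all_status; /* runtime wait state */
};

/*
 * Hypothetical handler for a key change coming from userspace.
 * Returns true only if the runtime wait state was actually cleared.
 */
bool wfa_on_key_change(struct wfa_state *st, const char *key, uint8_t new_value)
{
    assert(st != NULL && key != NULL);
    if (strcmp(key, WFA_STATUS_KEY) != 0) {
        return false; /* some other key: ignore */
    }
    if (new_value != 0) {
        return false; /* assumption: userspace may only clear the state */
    }
    if (!st->config_wait_for_all || !st->wait_for_all_status) {
        return false; /* wait_for_all not enabled or not currently waiting */
    }
    st->wait_for_all_status = 0;
    return true;
}
```

The check on config_wait_for_all is the same "is wait_for_all really activated" test that a library call would need.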



But if we decide to go with a library call, a few things must be
fixed:
- The version can be 7.1.0. We are adding a call, not changing an existing
one (so it's backwards compatible).
- We have to have support in cfgtool/quorumtool/... Keep in mind that the
main user (pcs) does not call the corosync API directly; it uses the
CLI tools.


Ugh, I didn't realise that. Thanks


Chrissie

- There should be a check that wait_for_all is really activated.

All these things would be solved for free by tracking
"quorum.wait_for_all".

Regards,
   Honza

Christine Caulfield wrote:
It's possible in a two_node cluster (and others but it's more likely
with just two) that a node could be booted up after downtime or failure
and the other node is not available for some reason. In this case it
would not be allowed to proceed because wait_for_all is enforced.

This patch provides an API call to clear this flag in the desperate
situation where that becomes necessary. It should only be used with
extreme caution and will be wrapped up in pcs which should also check
that fencing has been run.

Signed-Off-By: Christine Caulfield <ccaulfie@xxxxxxxxxx>



_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss




