Hi Honza, Sorry, my qdisk have not implemented yet, so I currently could not provide you any commit message. Attachment is the top level pseudo code of my qdisk, I think it can almost express my use case and how to adapt to your new change in the votequorum Please take a look. Thanks! B.R., Jason On Fri, Aug 8, 2014 at 2:32 PM, jason <huzhijiang@xxxxxxxxx> wrote: > Hi Honza, > > On Aug 7, 2014 2:55 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote: >> >> Jason >> >> >> >>> Hi Honza, >>> >>> I think I am clear with this patch after your explanation, thanks! >>> >>> But if there is a very simple qdevice implementation which voting is >>> not based on changed membership,but based on the current voting >>> result, can it simply ignore the feature of this patch by setting >> >> >> I'm not exactly sure what is such qdevice then good for... >> >> >>> timeout to 0 to keep on the old behavior? I'm afraid it can't any >> >> >> That's why I wrote *almost* same behavior >> >> >>> more. Because if set timeout to 0, qdevice_timer_fn() will be called >>> immediately, which result in us->flags being erased, thus, votequorum >>> lost the current voting result. Right? >> >> >> Right. >> >> But IF you want to use old result for some reason, you can keep default >> config as it is and in your qdevice just call poll with new ring id and old >> result right after receiving votequorum change. Timeout there should be >> almost minimal (from my test with testvotequorum2, it's really few >> milliseconds). > > I agree with you. This should be the way that my use case adaptes to this > patch to omit the waiting in sync process. So for my case, thers is no need > for you to make patch to reverts new functionality by setting sync_timeout > to 0. > >> >> Anyway. I can make patch which reverts new functionality completely if >> sync_timeout is set to 0 (that's easy). But can you please explain me a >> little more your use case (at least to have something for commit message)? > > My use case is a qdevice running in master_win mode. Normally, all qdevices > always poll to votequorum in a constant time cycle(for example, 1sec), but > only one master do cast vote. So after network partition or node down > occured, votequorum can immediately find out the primary component by seeing > which component includes the master. So now I mainly wonder what and how a > qdevice can benifit from deciding to vote or not vote based on changed > membership with the cost of letting sync process to wait the decision? > Althrough it looks my use case is not a good qdevice implemented, but at > least, it can totally omit the waiting in sync process. > > > > >> >> Honza >> >> >>> >>> On Wed, Aug 6, 2014 at 5:58 PM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote: >>>> >>>> Jason, >>>> >>>>> Hi Honza, >>>>> Could you please explain this more detaily ? Why should corosync block >>>>> until it got poll from qdevice(or timeout)? Why not simply using the >>>>> information provided by the latest poll? For example, in master win >>>>> mode, >>>> >>>> >>>> Because qdevice may decide to not vote based on changed membership. I >>>> will give you example: >>>> >>>> - Qdevice votes >>>> - membership change >>>> - sync >>>> - sync completed >>>> - Qdevice decides to not vote >>>> >>>> Now, between membership change and qdevice decision may be quite a long >>>> time when votequorum uses incorrect information. Patch simply prevents >>>> this race condition. >>>> >>>>> infomation about who is master is already on hand when network split >>>>> happen. So it can be used immediately to choose the primary component >>>>> without any delay. But the blocking method in this patch will >>>>> introduce >>>> >>>> >>>> It has not too much to do with master... >>>> >>>>> delay on deciding primay component, which may cause upper level >>>>> application >>>>> stop providing service in a longer time, doesn't it? >>>> >>>> >>>> It has again nothing to do with primary component. But yes, patch can >>>> cause upper level application to stop providing service for quite a long >>>> time (max 30 sec by default). That's why sync timeout is configurable >>>> (you can set 0 and you have almost old behavior) and why qdevice must be >>>> developed with this problem in mind. Meaning, qdevice must be as fast as >>>> possible. >>>> >>>> Did it explained your questions? >>>> >>>> Regards, >>>> Honza >>>> >>>>> On Aug 5, 2014 6:10 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote: >>>>> >>>>>> If qdevice is registered a alive, corosync waits in sync phase until >>>>>> timeout expires or qdevice votes with correct nodeid parameter. >>>>>> >>>>>> This gives qdevice time to decide to vote or not undisturbed and >>>>>> without >>>>>> time hazard. >>>>>> >>>>>> Signed-off-by: Jan Friesse <jfriesse@xxxxxxxxxx> >>>>>> --- >>>>>> exec/votequorum.c | 48 >>>>>> +++++++++++++++++++++++++++++++++++++++++++----- >>>>>> 1 files changed, 43 insertions(+), 5 deletions(-) >>>>>> >>>>>> diff --git a/exec/votequorum.c b/exec/votequorum.c >>>>>> index dd5bea7..7781077 100644 >>>>>> --- a/exec/votequorum.c >>>>>> +++ b/exec/votequorum.c >>>>>> @@ -234,6 +234,8 @@ static corosync_timer_handle_t qdevice_timer; >>>>>> static int qdevice_timer_set = 0; >>>>>> static corosync_timer_handle_t last_man_standing_timer; >>>>>> static int last_man_standing_timer_set = 0; >>>>>> +static int sync_nodeinfo_sent = 0; >>>>>> +static int sync_wait_for_poll_or_timeout = 0; >>>>>> >>>>>> /* >>>>>> * Service Interfaces required by service_message_handler struct >>>>>> @@ -310,6 +312,8 @@ static int quorum_lib_init_fn (void *conn); >>>>>> >>>>>> static int quorum_lib_exit_fn (void *conn); >>>>>> >>>>>> +static void qdevice_timer_fn(void *arg); >>>>>> + >>>>>> static void message_handler_req_lib_votequorum_getinfo (void *conn, >>>>>> const void >>>>>> *message); >>>>>> >>>>>> @@ -2182,6 +2186,8 @@ static void votequorum_sync_init ( >>>>>> ENTER(); >>>>>> >>>>>> sync_in_progress = 1; >>>>>> + sync_nodeinfo_sent = 0; >>>>>> + sync_wait_for_poll_or_timeout = 0; >>>>>> >>>>>> if (member_list_entries > 1) { >>>>>> us->flags &= ~NODE_FLAGS_FIRST; >>>>>> @@ -2231,17 +2237,46 @@ static void votequorum_sync_init ( >>>>>> quorum_members_entries = member_list_entries; >>>>>> memcpy(&quorum_ringid, ring_id, sizeof(*ring_id)); >>>>>> >>>>>> + if (us->flags & NODE_FLAGS_QDEVICE_REGISTERED && us->flags & >>>>>> NODE_FLAGS_QDEVICE_ALIVE) { >>>>>> + /* >>>>>> + * Reset poll timer. Sync waiting is interrupted on >>>>>> valid >>>>>> qdevice poll or after timeout >>>>>> + */ >>>>>> + if (qdevice_timer_set) { >>>>>> + corosync_api->timer_delete(qdevice_timer); >>>>>> + } >>>>>> + corosync_api->timer_add_duration((unsigned long >>>>>> long)qdevice_timeout*1000000, qdevice, >>>>>> + qdevice_timer_fn, >>>>>> &qdevice_timer); >>>>>> + qdevice_timer_set = 1; >>>>>> + sync_wait_for_poll_or_timeout = 1; >>>>>> + >>>>>> + log_printf(LOGSYS_LEVEL_INFO, "waiting for quorum >>>>>> device >>>>>> %s poll (but maximum for %u ms)", >>>>>> + qdevice_name, qdevice_timeout); >>>>>> + } >>>>>> + >>>>>> LEAVE(); >>>>>> } >>>>>> >>>>>> static int votequorum_sync_process (void) >>>>>> { >>>>>> - votequorum_exec_send_nodeinfo(us->node_id); >>>>>> - votequorum_exec_send_nodeinfo(VOTEQUORUM_QDEVICE_NODEID); >>>>>> - if (strlen(qdevice_name)) { >>>>>> - >>>>>> >>>>>> votequorum_exec_send_qdevice_reg(VOTEQUORUM_QDEVICE_OPERATION_REGISTER, >>>>>> - qdevice_name); >>>>>> + >>>>>> + if (!sync_nodeinfo_sent) { >>>>>> + votequorum_exec_send_nodeinfo(us->node_id); >>>>>> + >>>>>> votequorum_exec_send_nodeinfo(VOTEQUORUM_QDEVICE_NODEID); >>>>>> + if (strlen(qdevice_name)) { >>>>>> + >>>>>> >>>>>> votequorum_exec_send_qdevice_reg(VOTEQUORUM_QDEVICE_OPERATION_REGISTER, >>>>>> + >>>>>> qdevice_name); >>>>>> + } >>>>>> + sync_nodeinfo_sent = 1; >>>>>> } >>>>>> + >>>>>> + if (us->flags & NODE_FLAGS_QDEVICE_REGISTERED && >>>>>> sync_wait_for_poll_or_timeout) { >>>>>> + /* >>>>>> + * Waiting for qdevice to poll with new ringid or >>>>>> timeout >>>>>> + */ >>>>>> + >>>>>> + return (-1); >>>>>> + } >>>>>> + >>>>>> return 0; >>>>>> } >>>>>> >>>>>> @@ -2336,6 +2371,7 @@ static void qdevice_timer_fn(void *arg) >>>>>> votequorum_exec_send_nodeinfo(us->node_id); >>>>>> >>>>>> qdevice_timer_set = 0; >>>>>> + sync_wait_for_poll_or_timeout = 0; >>>>>> >>>>>> LEAVE(); >>>>>> } >>>>>> @@ -2675,6 +2711,7 @@ static void >>>>>> message_handler_req_lib_votequorum_qdevice_unregister (void *conn, >>>>>> if (qdevice_timer_set) { >>>>>> corosync_api->timer_delete(qdevice_timer); >>>>>> qdevice_timer_set = 0; >>>>>> + sync_wait_for_poll_or_timeout = 0; >>>>>> } >>>>>> us->flags &= ~NODE_FLAGS_QDEVICE_REGISTERED; >>>>>> us->flags &= ~NODE_FLAGS_QDEVICE_ALIVE; >>>>>> @@ -2777,6 +2814,7 @@ static void >>>>>> message_handler_req_lib_votequorum_qdevice_poll (void *conn, >>>>>> corosync_api->timer_add_duration((unsigned long >>>>>> long)qdevice_timeout*1000000, qdevice, >>>>>> qdevice_timer_fn, >>>>>> &qdevice_timer); >>>>>> qdevice_timer_set = 1; >>>>>> + sync_wait_for_poll_or_timeout = 0; >>>>>> } else { >>>>>> error = CS_ERR_NOT_EXIST; >>>>>> } >>>>>> -- >>>>>> 1.7.1 >>>>>> >>>>>> _______________________________________________ >>>>>> discuss mailing list >>>>>> discuss@xxxxxxxxxxxx >>>>>> http://lists.corosync.org/mailman/listinfo/discuss >>>>>> >>>>> >>>> >>> >>> >>> >> -- Yours, Jason
#include <corosync/corotypes.h> #include <corosync/votequorum.h> #define QDEVICE_NAME "simple_qdisk" votequorum_handle_t g_handle; votequorum_ring_id_t g_current_ring_id; int g_last_cast_vote = 0; static void unregister_qdisk(void) { votequorum_qdevice_unregister(g_handle, QDEVICE_NAME); votequorum_finalize(g_handle); } static void poll_votequorum(int cast_vote) { votequorum_qdevice_poll(g_handle, QDEVICE_NAME, cast_vote, g_current_ring_id); } static void votequorum_notification_fn( votequorum_handle_t handle, uint64_t context, uint32_t quorate, votequorum_ring_id_t ring_id, uint32_t node_list_entries, votequorum_node_t node_list[] ) { g_current_ring_id = ring_id; poll_votequorum(g_last_cast_vote); } static int register_qdisk(void) { cs_error_t result; votequorum_handle_t handle; votequorum_callbacks_t callbacks; int master_win = 1; callbacks.votequorum_notify_fn = votequorum_notification_fn; callbacks.votequorum_expectedvotes_notify_fn = NULL; result = votequorum_initialize(&handle, &callbacks); if (result != CS_OK) return -1; g_handle = handle; result = votequorum_qdevice_register(g_handle, QDEVICE_NAME); if (result != CS_OK) { votequorum_finalize(g_handle); return -1; } result = votequorum_qdevice_master_wins(g_handle, QDEVICE_NAME, master_win); if (result != CS_OK) { votequorum_qdevice_unregister(g_handle, QDEVICE_NAME); votequorum_finalize(g_handle); return -1; } return 0; } static void qdisk_loop(void) { while (1) { read_all_nodes_status_from_qdisk(); if (!master_exists()) bid_for_qdisk_master(); else if (this_node_is_master()) { g_last_cast_vote = 1; poll_votequorum(g_last_cast_vote); } else if (this_node_is_alive()) { g_last_cast_vote = 0; poll_votequorum(g_last_cast_vote); } else break; // error occured write_this_node_status_to_qdisk(); sleep(1); // poll votequorum every one second } } int main(int argc, char *argv[]) { int ret; ret = register_qdisk(); if (ret != 0) return ret; qdisk_loop(); unregister_qdisk(); return 0; }
_______________________________________________ discuss mailing list discuss@xxxxxxxxxxxx http://lists.corosync.org/mailman/listinfo/discuss