Re: [PATCH 4/6] votequorum: Block sync until qdevice poll

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Honza,

Sorry, my qdisk have not implemented yet, so I currently could not
provide you any commit message. Attachment is the top level pseudo
code of my qdisk, I think it can almost express my use case and how to
adapt to your new change in the votequorum Please take a look. Thanks!

B.R.,
Jason

On Fri, Aug 8, 2014 at 2:32 PM, jason <huzhijiang@xxxxxxxxx> wrote:
> Hi Honza,
>
> On Aug 7, 2014 2:55 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote:
>>
>> Jason
>>
>>
>>
>>> Hi Honza,
>>>
>>> I think I am clear with this patch after your explanation, thanks!
>>>
>>> But if there is a very simple qdevice implementation which voting is
>>> not based on changed membership,but based on the current voting
>>> result, can it simply ignore the feature of this patch by setting
>>
>>
>> I'm not exactly sure what is such qdevice then good for...
>>
>>
>>> timeout to 0 to keep on the old behavior?  I'm afraid it can't any
>>
>>
>> That's why I wrote *almost* same behavior
>>
>>
>>> more. Because if set timeout to 0, qdevice_timer_fn() will be called
>>> immediately, which result in us->flags being erased, thus, votequorum
>>> lost the current voting result.  Right?
>>
>>
>> Right.
>>
>> But IF you want to use old result for some reason, you can keep default
>> config as it is and in your qdevice just call poll with new ring id and old
>> result right after receiving votequorum change. Timeout there should be
>> almost minimal (from my test with testvotequorum2, it's really few
>> milliseconds).
>
> I agree with you. This should be the way that my use case adaptes to this
> patch to omit the waiting in sync process. So for my case, thers is no need
> for you to make patch to reverts new functionality by setting sync_timeout
> to 0.
>
>>
>> Anyway. I can make patch which reverts new functionality completely if
>> sync_timeout is set to 0 (that's easy). But can you please explain me a
>> little more your use case (at least to have something for commit message)?
>
> My use case is a qdevice running in master_win mode. Normally, all qdevices
> always poll to votequorum in a constant time cycle(for example, 1sec), but
> only one master do cast vote. So after network partition or node down
> occured, votequorum can immediately find out the primary component by seeing
> which component includes the master. So now I mainly wonder what and how a
> qdevice can benifit from deciding to vote or not vote based on changed
> membership with the cost of letting sync process to wait the decision?
> Althrough it looks my use case is not a good qdevice implemented, but at
> least, it can totally omit the waiting in sync process.
>
>
>
>
>>
>> Honza
>>
>>
>>>
>>> On Wed, Aug 6, 2014 at 5:58 PM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
>>>>
>>>> Jason,
>>>>
>>>>> Hi Honza,
>>>>> Could you please explain this more detaily ? Why should corosync block
>>>>> until it got poll from qdevice(or timeout)? Why not simply using the
>>>>> information provided by the latest poll? For example, in master win
>>>>> mode,
>>>>
>>>>
>>>> Because qdevice may decide to not vote based on changed membership. I
>>>> will give you example:
>>>>
>>>> - Qdevice votes
>>>> - membership change
>>>> - sync
>>>> - sync completed
>>>> - Qdevice decides to not vote
>>>>
>>>> Now, between membership change and qdevice decision may be quite a long
>>>> time when votequorum uses incorrect information. Patch simply prevents
>>>> this race condition.
>>>>
>>>>> infomation about who is master is already on hand when network split
>>>>> happen. So it can be used immediately to choose the primary component
>>>>> without any delay.  But the blocking method in this patch will
>>>>> introduce
>>>>
>>>>
>>>> It has not too much to do with master...
>>>>
>>>>> delay on deciding primay component, which may cause upper level
>>>>> application
>>>>> stop providing service in a longer time, doesn't it?
>>>>
>>>>
>>>> It has again nothing to do with primary component. But yes, patch can
>>>> cause upper level application to stop providing service for quite a long
>>>> time (max 30 sec by default). That's why sync timeout is configurable
>>>> (you can set 0 and you have almost old behavior) and why qdevice must be
>>>> developed with this problem in mind. Meaning, qdevice must be as fast as
>>>> possible.
>>>>
>>>> Did it explained your questions?
>>>>
>>>> Regards,
>>>>    Honza
>>>>
>>>>> On Aug 5, 2014 6:10 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote:
>>>>>
>>>>>> If qdevice is registered a alive, corosync waits in sync phase until
>>>>>> timeout expires or qdevice votes with correct nodeid parameter.
>>>>>>
>>>>>> This gives qdevice time to decide to vote or not undisturbed and
>>>>>> without
>>>>>> time hazard.
>>>>>>
>>>>>> Signed-off-by: Jan Friesse <jfriesse@xxxxxxxxxx>
>>>>>> ---
>>>>>>   exec/votequorum.c |   48
>>>>>> +++++++++++++++++++++++++++++++++++++++++++-----
>>>>>>   1 files changed, 43 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/exec/votequorum.c b/exec/votequorum.c
>>>>>> index dd5bea7..7781077 100644
>>>>>> --- a/exec/votequorum.c
>>>>>> +++ b/exec/votequorum.c
>>>>>> @@ -234,6 +234,8 @@ static corosync_timer_handle_t qdevice_timer;
>>>>>>   static int qdevice_timer_set = 0;
>>>>>>   static corosync_timer_handle_t last_man_standing_timer;
>>>>>>   static int last_man_standing_timer_set = 0;
>>>>>> +static int sync_nodeinfo_sent = 0;
>>>>>> +static int sync_wait_for_poll_or_timeout = 0;
>>>>>>
>>>>>>   /*
>>>>>>    * Service Interfaces required by service_message_handler struct
>>>>>> @@ -310,6 +312,8 @@ static int quorum_lib_init_fn (void *conn);
>>>>>>
>>>>>>   static int quorum_lib_exit_fn (void *conn);
>>>>>>
>>>>>> +static void qdevice_timer_fn(void *arg);
>>>>>> +
>>>>>>   static void message_handler_req_lib_votequorum_getinfo (void *conn,
>>>>>>                                                          const void
>>>>>> *message);
>>>>>>
>>>>>> @@ -2182,6 +2186,8 @@ static void votequorum_sync_init (
>>>>>>          ENTER();
>>>>>>
>>>>>>          sync_in_progress = 1;
>>>>>> +       sync_nodeinfo_sent = 0;
>>>>>> +       sync_wait_for_poll_or_timeout = 0;
>>>>>>
>>>>>>          if (member_list_entries > 1) {
>>>>>>                  us->flags &= ~NODE_FLAGS_FIRST;
>>>>>> @@ -2231,17 +2237,46 @@ static void votequorum_sync_init (
>>>>>>          quorum_members_entries = member_list_entries;
>>>>>>          memcpy(&quorum_ringid, ring_id, sizeof(*ring_id));
>>>>>>
>>>>>> +       if (us->flags & NODE_FLAGS_QDEVICE_REGISTERED && us->flags &
>>>>>> NODE_FLAGS_QDEVICE_ALIVE) {
>>>>>> +               /*
>>>>>> +                * Reset poll timer. Sync waiting is interrupted on
>>>>>> valid
>>>>>> qdevice poll or after timeout
>>>>>> +                */
>>>>>> +               if (qdevice_timer_set) {
>>>>>> +                       corosync_api->timer_delete(qdevice_timer);
>>>>>> +               }
>>>>>> +               corosync_api->timer_add_duration((unsigned long
>>>>>> long)qdevice_timeout*1000000, qdevice,
>>>>>> +                                                qdevice_timer_fn,
>>>>>> &qdevice_timer);
>>>>>> +               qdevice_timer_set = 1;
>>>>>> +               sync_wait_for_poll_or_timeout = 1;
>>>>>> +
>>>>>> +               log_printf(LOGSYS_LEVEL_INFO, "waiting for quorum
>>>>>> device
>>>>>> %s poll (but maximum for %u ms)",
>>>>>> +                       qdevice_name, qdevice_timeout);
>>>>>> +       }
>>>>>> +
>>>>>>          LEAVE();
>>>>>>   }
>>>>>>
>>>>>>   static int votequorum_sync_process (void)
>>>>>>   {
>>>>>> -       votequorum_exec_send_nodeinfo(us->node_id);
>>>>>> -       votequorum_exec_send_nodeinfo(VOTEQUORUM_QDEVICE_NODEID);
>>>>>> -       if (strlen(qdevice_name)) {
>>>>>> -
>>>>>>
>>>>>> votequorum_exec_send_qdevice_reg(VOTEQUORUM_QDEVICE_OPERATION_REGISTER,
>>>>>> -                                                qdevice_name);
>>>>>> +
>>>>>> +       if (!sync_nodeinfo_sent) {
>>>>>> +               votequorum_exec_send_nodeinfo(us->node_id);
>>>>>> +
>>>>>> votequorum_exec_send_nodeinfo(VOTEQUORUM_QDEVICE_NODEID);
>>>>>> +               if (strlen(qdevice_name)) {
>>>>>> +
>>>>>>
>>>>>> votequorum_exec_send_qdevice_reg(VOTEQUORUM_QDEVICE_OPERATION_REGISTER,
>>>>>> +
>>>>>> qdevice_name);
>>>>>> +               }
>>>>>> +               sync_nodeinfo_sent = 1;
>>>>>>          }
>>>>>> +
>>>>>> +       if (us->flags & NODE_FLAGS_QDEVICE_REGISTERED &&
>>>>>> sync_wait_for_poll_or_timeout) {
>>>>>> +               /*
>>>>>> +                * Waiting for qdevice to poll with new ringid or
>>>>>> timeout
>>>>>> +                */
>>>>>> +
>>>>>> +               return (-1);
>>>>>> +       }
>>>>>> +
>>>>>>          return 0;
>>>>>>   }
>>>>>>
>>>>>> @@ -2336,6 +2371,7 @@ static void qdevice_timer_fn(void *arg)
>>>>>>          votequorum_exec_send_nodeinfo(us->node_id);
>>>>>>
>>>>>>          qdevice_timer_set = 0;
>>>>>> +       sync_wait_for_poll_or_timeout = 0;
>>>>>>
>>>>>>          LEAVE();
>>>>>>   }
>>>>>> @@ -2675,6 +2711,7 @@ static void
>>>>>> message_handler_req_lib_votequorum_qdevice_unregister (void *conn,
>>>>>>                  if (qdevice_timer_set) {
>>>>>>                          corosync_api->timer_delete(qdevice_timer);
>>>>>>                          qdevice_timer_set = 0;
>>>>>> +                       sync_wait_for_poll_or_timeout = 0;
>>>>>>                  }
>>>>>>                  us->flags &= ~NODE_FLAGS_QDEVICE_REGISTERED;
>>>>>>                  us->flags &= ~NODE_FLAGS_QDEVICE_ALIVE;
>>>>>> @@ -2777,6 +2814,7 @@ static void
>>>>>> message_handler_req_lib_votequorum_qdevice_poll (void *conn,
>>>>>>                  corosync_api->timer_add_duration((unsigned long
>>>>>> long)qdevice_timeout*1000000, qdevice,
>>>>>>                                                   qdevice_timer_fn,
>>>>>> &qdevice_timer);
>>>>>>                  qdevice_timer_set = 1;
>>>>>> +               sync_wait_for_poll_or_timeout = 0;
>>>>>>          } else {
>>>>>>                  error = CS_ERR_NOT_EXIST;
>>>>>>          }
>>>>>> --
>>>>>> 1.7.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> discuss mailing list
>>>>>> discuss@xxxxxxxxxxxx
>>>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>



-- 
Yours,
Jason
#include <corosync/corotypes.h>
#include <corosync/votequorum.h>

#define QDEVICE_NAME "simple_qdisk"

votequorum_handle_t g_handle;
votequorum_ring_id_t g_current_ring_id;
int g_last_cast_vote = 0;

static void unregister_qdisk(void)
{
	votequorum_qdevice_unregister(g_handle, QDEVICE_NAME);
	votequorum_finalize(g_handle);
}

static void poll_votequorum(int cast_vote)
{
	votequorum_qdevice_poll(g_handle, QDEVICE_NAME, cast_vote, g_current_ring_id);
}

static void votequorum_notification_fn(
	votequorum_handle_t handle,
	uint64_t context,
	uint32_t quorate,
	votequorum_ring_id_t ring_id,
	uint32_t node_list_entries,
	votequorum_node_t node_list[]
	)
{
	g_current_ring_id = ring_id;
	poll_votequorum(g_last_cast_vote);
}

static int register_qdisk(void)
{
	cs_error_t result;
	votequorum_handle_t handle;
	votequorum_callbacks_t callbacks;
	int master_win = 1;

	callbacks.votequorum_notify_fn = votequorum_notification_fn;
	callbacks.votequorum_expectedvotes_notify_fn = NULL;

	result = votequorum_initialize(&handle, &callbacks);
	if (result != CS_OK)
		return -1;

	g_handle = handle;

	result = votequorum_qdevice_register(g_handle, QDEVICE_NAME);
	if (result != CS_OK) {
		votequorum_finalize(g_handle);
		return -1;
	}

	result = votequorum_qdevice_master_wins(g_handle, QDEVICE_NAME, master_win);
	if (result != CS_OK) {
		votequorum_qdevice_unregister(g_handle, QDEVICE_NAME);
		votequorum_finalize(g_handle);
		return -1;
	}

	return 0;
}

static void qdisk_loop(void)
{	
	while (1) {
		read_all_nodes_status_from_qdisk();

		if (!master_exists())
			bid_for_qdisk_master();
		else if (this_node_is_master()) {
			g_last_cast_vote = 1;
			poll_votequorum(g_last_cast_vote);
		} else if (this_node_is_alive()) {
			g_last_cast_vote = 0;
			poll_votequorum(g_last_cast_vote);
		}
		else
			break; // error occured

		write_this_node_status_to_qdisk();

		sleep(1); // poll votequorum every one second
	}
}

int main(int argc, char *argv[])
{
	int ret;
	
	ret = register_qdisk();
	if (ret != 0)
		return ret;

	qdisk_loop();
	unregister_qdisk();
	return 0;
}

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

[Index of Archives]     [Linux Clusters]     [Corosync Project]     [Linux USB Devel]     [Linux Audio Users]     [Photo]     [Yosemite News]    [Yosemite Photos]    [Linux Kernel]     [Linux SCSI]     [X.Org]

  Powered by Linux