Re: [PATCH] CFG: Prevent CFG from orignating messages during SYNC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Jul 2, 2015 10:47 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote:
>
> jason napsal(a):
>
>> Hi Honza,
>> Yes, my patch will break cfg API behavior which is not good. I've also
>> considered trying to resolve it in a more general way instead of simply cut
>> off all API during SYNC. The root of this issue is that SYNC mistakenly
>> believed that his last barrier message was the last message generated
>> during SYNC, or say, the right message to finish using
>> assembly_list_*_trans. So if we try to solve the problem dundamentally, we
>> may need  to find a way to find the right message. For example, SYNC
>> firstly tell Totem about it has done, then Totem waits until
>> assembly_list_*_trans back to initial state to finally call
>> sync_synchronization_complete(). But in very extreme cases, Totem may be
>> not able to wait such a message coming. Besides, this way causes SYNC and
>> Totem(which are allready coupled) to be more tightly coupled. So I
>> currently find it not easy to have a general solution. What do you think?
>
>
> Actually, in Flatiron is 5e6e1a000db8b066aff08b0649886e4ce07b843c . I didn't forwardported patch to Needle, simply because original bug didn't appeared in Needle, but it may be solution and it makes corosync use twice amount of memory.
>

I learned about the 5e6e1a000db8b066aff08b0649886e4ce07b843c patch in flatiron, and it seems it is not related with this issue? Because it can not guarantee a message which generated by totempg_context_array[1] to be received by assembly_list_*_trans.

I also patched my simulate_a_cfg_message_during_sync_which_breaks_defrag.patch into current flatiron's syncv2.c and set "compatibility: none"(just make the prove patch easy to apply) to see if it can pass. It did not passed.

>
>>
>> Besides, the upper layer such as pacemaker do not use CFG shutdown API. And
>> I personally always using "service corosync stop" and know little about CFG
>> shutdown API. If with my patch, someone calls corosync-cfgtool -H (locally)
>> and barred by SYNC and timed out, how about switch to "service corosync
>> stop" instead?
>
>
> Yep, true.
>
> Let's go ahead with your patch for master and we will see how much problems will appear.
>
> Regards,
>   Honza
>
>
>>   On Jul 1, 2015 11:06 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote:
>>
>>> Jason,
>>> thanks for the patch. I was thinking a little deeper about this problem,
>>> and I think there are few problems for practical use point.
>>>
>>> Reload config is no brainier, no need to do it during sync, but kill node
>>> (local one), and try shutdown (again only for local host) may be wanted
>>> action even during sync. For example lets say there is problem with
>>> configuration and corosync is constantly in sync state, then there is no
>>> possibility how to turn it off via corosync-cfgtool.
>>>
>>> What is your opinion?
>>>
>>> Regards,
>>>    Honza
>>>
>>> Jason napsal(a):
>>>
>>>> From: Jason HU <huzhijiang@xxxxxxxxx>
>>>>
>>>> During SYNC, corosync-cfgtool -R/-H commands can pass through IPC then
>>>> send totem messages. This may corrupts
>>>> assembly_list_inuse/assembly_list_free if those messages are recedived
>>>> after SYNC is done.
>>>>
>>>> The solution is marking related CFG APIs as CS_LIB_FLOW_CONTROL_REQUIRED.
>>>>
>>>> Signed-off-by: Jason HU <huzhijiang@xxxxxxxxx>
>>>> ---
>>>>    exec/cfg.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/exec/cfg.c b/exec/cfg.c
>>>> index b061bd4..b209331 100644
>>>> --- a/exec/cfg.c
>>>> +++ b/exec/cfg.c
>>>> @@ -177,15 +177,15 @@ static struct corosync_lib_handler cfg_lib_engine[]
>>>> =
>>>>          },
>>>>          { /* 2 */
>>>>                  .lib_handler_fn         =
>>>> message_handler_req_lib_cfg_killnode,
>>>> -               .flow_control           = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
>>>> +               .flow_control           = CS_LIB_FLOW_CONTROL_REQUIRED
>>>>          },
>>>>          { /* 3 */
>>>>                  .lib_handler_fn         =
>>>> message_handler_req_lib_cfg_tryshutdown,
>>>> -               .flow_control           = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
>>>> +               .flow_control           = CS_LIB_FLOW_CONTROL_REQUIRED
>>>>          },
>>>>          { /* 4 */
>>>>                  .lib_handler_fn         =
>>>> message_handler_req_lib_cfg_replytoshutdown,
>>>> -               .flow_control           = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
>>>> +               .flow_control           = CS_LIB_FLOW_CONTROL_REQUIRED
>>>>          },
>>>>          { /* 5 */
>>>>                  .lib_handler_fn         =
>>>> message_handler_req_lib_cfg_get_node_addrs,
>>>> @@ -197,7 +197,7 @@ static struct corosync_lib_handler cfg_lib_engine[] =
>>>>          },
>>>>          { /* 7 */
>>>>                  .lib_handler_fn         =
>>>> message_handler_req_lib_cfg_reload_config,
>>>> -               .flow_control           = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
>>>> +               .flow_control           = CS_LIB_FLOW_CONTROL_REQUIRED
>>>>          }
>>>>    };
>>>>
>>>>
>>>>
>>>
>>
>

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

[Index of Archives]     [Linux Clusters]     [Corosync Project]     [Linux USB Devel]     [Linux Audio Users]     [Photo]     [Yosemite News]    [Yosemite Photos]    [Linux Kernel]     [Linux SCSI]     [X.Org]

  Powered by Linux