jason napsal(a):
Hi Honza,
Yes, my patch will break cfg API behavior which is not good. I've also
considered trying to resolve it in a more general way instead of simply cut
off all API during SYNC. The root of this issue is that SYNC mistakenly
believed that his last barrier message was the last message generated
during SYNC, or say, the right message to finish using
assembly_list_*_trans. So if we try to solve the problem dundamentally, we
may need to find a way to find the right message. For example, SYNC
firstly tell Totem about it has done, then Totem waits until
assembly_list_*_trans back to initial state to finally call
sync_synchronization_complete(). But in very extreme cases, Totem may be
not able to wait such a message coming. Besides, this way causes SYNC and
Totem(which are allready coupled) to be more tightly coupled. So I
currently find it not easy to have a general solution. What do you think?
Actually, in Flatiron is 5e6e1a000db8b066aff08b0649886e4ce07b843c . I
didn't forwardported patch to Needle, simply because original bug didn't
appeared in Needle, but it may be solution and it makes corosync use
twice amount of memory.
Besides, the upper layer such as pacemaker do not use CFG shutdown API. And
I personally always using "service corosync stop" and know little about CFG
shutdown API. If with my patch, someone calls corosync-cfgtool -H (locally)
and barred by SYNC and timed out, how about switch to "service corosync
stop" instead?
Yep, true.
Let's go ahead with your patch for master and we will see how much
problems will appear.
Regards,
Honza
On Jul 1, 2015 11:06 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote:
Jason,
thanks for the patch. I was thinking a little deeper about this problem,
and I think there are few problems for practical use point.
Reload config is no brainier, no need to do it during sync, but kill node
(local one), and try shutdown (again only for local host) may be wanted
action even during sync. For example lets say there is problem with
configuration and corosync is constantly in sync state, then there is no
possibility how to turn it off via corosync-cfgtool.
What is your opinion?
Regards,
Honza
Jason napsal(a):
From: Jason HU <huzhijiang@xxxxxxxxx>
During SYNC, corosync-cfgtool -R/-H commands can pass through IPC then
send totem messages. This may corrupts
assembly_list_inuse/assembly_list_free if those messages are recedived
after SYNC is done.
The solution is marking related CFG APIs as CS_LIB_FLOW_CONTROL_REQUIRED.
Signed-off-by: Jason HU <huzhijiang@xxxxxxxxx>
---
exec/cfg.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/exec/cfg.c b/exec/cfg.c
index b061bd4..b209331 100644
--- a/exec/cfg.c
+++ b/exec/cfg.c
@@ -177,15 +177,15 @@ static struct corosync_lib_handler cfg_lib_engine[]
=
},
{ /* 2 */
.lib_handler_fn =
message_handler_req_lib_cfg_killnode,
- .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
+ .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED
},
{ /* 3 */
.lib_handler_fn =
message_handler_req_lib_cfg_tryshutdown,
- .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
+ .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED
},
{ /* 4 */
.lib_handler_fn =
message_handler_req_lib_cfg_replytoshutdown,
- .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
+ .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED
},
{ /* 5 */
.lib_handler_fn =
message_handler_req_lib_cfg_get_node_addrs,
@@ -197,7 +197,7 @@ static struct corosync_lib_handler cfg_lib_engine[] =
},
{ /* 7 */
.lib_handler_fn =
message_handler_req_lib_cfg_reload_config,
- .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
+ .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED
}
};
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss