Hi Honza,
Yes, my patch will break cfg API behavior which is not good. I've also considered trying to resolve it in a more general way instead of simply cut off all API during SYNC. The root of this issue is that SYNC mistakenly believed that his last barrier message was the last message generated during SYNC, or say, the right message to finish using assembly_list_*_trans. So if we try to solve the problem dundamentally, we may need to find a way to find the right message. For example, SYNC firstly tell Totem about it has done, then Totem waits until assembly_list_*_trans back to initial state to finally call sync_synchronization_complete(). But in very extreme cases, Totem may be not able to wait such a message coming. Besides, this way causes SYNC and Totem(which are allready coupled) to be more tightly coupled. So I currently find it not easy to have a general solution. What do you think?
Besides, the upper layer such as pacemaker do not use CFG shutdown API. And I personally always using "service corosync stop" and know little about CFG shutdown API. If with my patch, someone calls corosync-cfgtool -H (locally) and barred by SYNC and timed out, how about switch to "service corosync stop" instead?
Jason,
thanks for the patch. I was thinking a little deeper about this problem, and I think there are few problems for practical use point.
Reload config is no brainier, no need to do it during sync, but kill node (local one), and try shutdown (again only for local host) may be wanted action even during sync. For example lets say there is problem with configuration and corosync is constantly in sync state, then there is no possibility how to turn it off via corosync-cfgtool.
What is your opinion?
Regards,
Honza
Jason napsal(a):
From: Jason HU <huzhijiang@xxxxxxxxx>
During SYNC, corosync-cfgtool -R/-H commands can pass through IPC then send totem messages. This may corrupts assembly_list_inuse/assembly_list_free if those messages are recedived after SYNC is done.
The solution is marking related CFG APIs as CS_LIB_FLOW_CONTROL_REQUIRED.
Signed-off-by: Jason HU <huzhijiang@xxxxxxxxx>
---
exec/cfg.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/exec/cfg.c b/exec/cfg.c
index b061bd4..b209331 100644
--- a/exec/cfg.c
+++ b/exec/cfg.c
@@ -177,15 +177,15 @@ static struct corosync_lib_handler cfg_lib_engine[] =
},
{ /* 2 */
.lib_handler_fn = message_handler_req_lib_cfg_killnode,
- .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
+ .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED
},
{ /* 3 */
.lib_handler_fn = message_handler_req_lib_cfg_tryshutdown,
- .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
+ .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED
},
{ /* 4 */
.lib_handler_fn = message_handler_req_lib_cfg_replytoshutdown,
- .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
+ .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED
},
{ /* 5 */
.lib_handler_fn = message_handler_req_lib_cfg_get_node_addrs,
@@ -197,7 +197,7 @@ static struct corosync_lib_handler cfg_lib_engine[] =
},
{ /* 7 */
.lib_handler_fn = message_handler_req_lib_cfg_reload_config,
- .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED
+ .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED
}
};
_______________________________________________ discuss mailing list discuss@xxxxxxxxxxxx http://lists.corosync.org/mailman/listinfo/discuss