Update.
According to the AMF log about a timeout, I can confirm that the node which had this issue could not receive multicast messages at that time, not even those sent by itself. But I do not understand why it could still receive the JOIN message that resulted in pause detection.
Hi All,
I recently encountered a problem with corosync-1.4.4: kill -TERM does not stop the corosync daemon. What I can confirm is:
1) The corosync_exit_thread_handler() thread has finished and disappeared (confirmed with gdb info threads). So the hooks into sched_work(), which get fired on token_send, may not have had a chance to run (no token to send?).
2) I did not have a firewall running when this occurred.
3) There was no consensus timeout log before this problem happened.
4) I attached gdb to corosync, which took some seconds, and when I let it continue, I saw the pause detection timer trigger (according to the log). About 20 seconds later, the log shows a new confchg and the service unload happening simultaneously, and corosync finally exited normally. I think it was the new token created by the new ring that finally let corosync exit, but I cannot tell whether the creation of the new ring was influenced by my running gdb or not.
This issue has not been reproduced yet, but I am trying to. Could you please help me look into this issue?
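To make my suspicion in point 1) concrete, here is a toy model in Python. It is not corosync code: ToyRing, schedule(), request_exit() and unload_services() are names I made up for illustration, and only token_send comes from the real code. It only shows the pattern I think I am seeing, where the exit thread queues the shutdown work and returns, and the work then waits until a token is actually sent.

```python
from collections import deque


class ToyRing:
    """Toy model (hypothetical): deferred work runs only when a token is sent."""

    def __init__(self):
        self.pending = deque()   # work queued by other threads
        self.exited = False      # set once the shutdown work has run

    def schedule(self, fn):
        # Queue a unit of work; nothing is executed here.
        self.pending.append(fn)

    def token_send(self):
        # Hook point: run one unit of deferred work per token send.
        if self.pending:
            self.pending.popleft()(self)


def unload_services(ring):
    # Stands in for the service unload that precedes a normal exit.
    ring.exited = True


def request_exit(ring):
    # Stands in for the exit thread: it queues the work and returns at once.
    ring.schedule(unload_services)


ring = ToyRing()
request_exit(ring)
stalled = ring.exited        # no token has been sent yet
ring.token_send()            # e.g. the first token of a newly formed ring
after_token = ring.exited
print(stalled, after_token)
```

In this model the exit request stays pending for as long as no token circulates, which would match the exit thread being gone while the daemon keeps running. Whether the real scheduler behaves this way in 1.4.4 is exactly what I would like to have confirmed.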
Many thanks!