Did you happen to have pacemaker or cman running at the time?

On Mon, Nov 26, 2012 at 2:07 PM, jason <huzhijiang@xxxxxxxxx> wrote:
> Update.
> According to the AMF log about a timeout, I can confirm that the node which
> had this issue could not receive mcast messages, even those sent by itself,
> at that time. But I do not understand why it could still receive the JOIN
> message which resulted in pause detection.
>
> On 2012-11-25 at 9:39 PM, "jason" <huzhijiang@xxxxxxxxx> wrote:
>
>> Hi All,
>> I recently encountered a problem with corosync-1.4.4: kill -TERM does
>> not stop the corosync daemon. What I can confirm:
>> 1) The corosync_exit_thread_handler() thread has finished and disappeared
>> (confirmed with gdb info threads). So the hooks into sched_work(), which
>> get fired on token_send, may not have had a chance to run (no token to
>> send?).
>> 2) I did not have a firewall running when this occurred.
>> 3) There was no consensus timeout log before this problem happened.
>> 4) I attached gdb to corosync, which took a few seconds, and when I
>> continued the process, I saw the pause detection timer triggered (by
>> checking the log). After about 20 seconds, the log showed a new confchg
>> and the service unload happening simultaneously, and finally corosync
>> exited normally. I think it was the new token created by the new ring
>> that finally made corosync exit, but I cannot tell whether the creation
>> of the new ring was influenced by my running gdb or not.
>>
>> This issue has not been reproduced yet, but I am trying to. Could you
>> help me look into this issue, please?
>>
>> Many thanks!
>>
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss