Are you using corosync with pacemaker when this happens?

On Thu, Feb 21, 2013 at 2:27 PM, jason <huzhijiang@xxxxxxxxx> wrote:
> Hi Steven,
>
> Do you have a plan to port the new shutdown method in corosync-2.x back to
> corosync-1.4.x? With corosync-1.4.5, shutting down corosync with kill -3
> has failed for us several times. The latest failure happened because, when
> kill -3 was issued, corosync_exit_sem had not yet been initialized by
> sem_init(), so the sem_post() in corosync_shutdown_request() failed to
> trigger corosync_exit_thread_handler(). I think the fix is simply to call
> sem_init() before we install the signal handler. But as you say, if
> corosync-2.x has a stronger shutdown mechanism, why not port it back to
> 1.4.x?
>
> On Feb 15, 2013 7:03 AM, "Steven Dake" <steven.dake@xxxxxxxxx> wrote:
>>
>> On Thu, Feb 14, 2013 at 1:23 PM, Brian J. Murrell
>> <brian.murrell@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On EL6, at least, trying to stop corosync (kill -TERM) seems to fail
>>> quite frequently with corosync seemingly just not wanting to take heed
>>> of the signal and exit. corosync-cfgtool -H doesn't seem to work either
>>> and I just end up killing it with a SIGKILL.
>>>
>> Shutdown has been a never-ending source of frustration for corosync, now
>> solved with the 2.x series :)
>>
>> The reason the TERM is not honored immediately is that Corosync wants to
>> shut down in an orderly fashion on a TERM by quiescing services and shutting
>> down cleanly with no pending messages. Sometimes this is not possible
>> quickly because the network is flaky or blocked in some way (such as
>> iptables).
>>
>> I had thought we had sorted all this out for the 1.4 series though, so if
>> you could provide more information on your corosync rpm version, that might
>> be helpful.
>>
>>>
>>> Is a SIGKILL really the only way to deal with this problem? Should this
>>> need be codified into the initscript? i.e. try SIGTERM and then SIGKILL
>>> after a timeout? What's a reasonable timeout for SIGTERM to have
>>> worked?
>>>
>>
>> sigterm should be honored by the corosync process rather than hacking
>> around with a sigkill.
>>
>>>
>>> Cheers,
>>> b.
>>>
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss