jason wrote:
> Hi Steven,
>
> Do you have a plan to port the new shutdown method in corosync-2.x back to
> corosync-1.4.x?

It's almost impossible. Actually, the shutdown sequence itself didn't change
much. What did change is the use of threads (or, more correctly, there are no
threads in 2.x).

> When using corosync-1.4.5, shutting down corosync with kill -3 failed
> several times. The latest failure happened because, when kill -3 was
> issued, corosync_exit_sem had not yet been initialized by sem_init(), so
> the sem_post() in corosync_shutdown_request() failed to trigger
> corosync_exit_thread_handler(). The resolution, I think, is simply to
> call sem_init() before we install the signal handler.

Can you send a patch?

> But as you say, if corosync-2.x has a stronger mechanism for shutdown,
> why not port it back to 1.4.x?

Honza

> On Feb 15, 2013 7:03 AM, "Steven Dake" <steven.dake@xxxxxxxxx> wrote:
>
>> On Thu, Feb 14, 2013 at 1:23 PM, Brian J. Murrell <
>> brian.murrell@xxxxxxxxxxxxxxx> wrote:
>>
>>> On EL6, at least, trying to stop corosync (kill -TERM) seems to fail
>>> quite frequently with corosync seemingly just not wanting to take heed
>>> of the signal and exit. corosync-cfgtool -H doesn't seem to work either
>>> and I just end up killing it with a SIGKILL.
>>
>> Shutdown has been a never-ending source of frustration for corosync, now
>> solved with the 2.x series :)
>>
>> The reason the TERM is not honored immediately is that Corosync wants to
>> shut down in an orderly fashion on a TERM by quiescing services and
>> shutting down cleanly with no pending messages. Sometimes this is not
>> possible quickly because the network is flaky or blocked in some way (such
>> as iptables).
>>
>> I had thought we had sorted all this out for the 1.4 series though, so if
>> you could provide more information on your corosync rpm version, that
>> might be helpful.
>>
>>> Is a SIGKILL really the only way to deal with this problem? Should this
>>> need be codified into the initscript? i.e. try SIGTERM and then SIGKILL
>>> after a timeout? What's a reasonable timeout for SIGTERM to have
>>> worked?
>>
>> SIGTERM should be honored by the corosync process rather than hacking
>> around with a SIGKILL.
>>
>>> Cheers,
>>> b.
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss@xxxxxxxxxxxx
>>> http://lists.corosync.org/mailman/listinfo/discuss