Thanks for the patch! Ack + I've pushed it.

Regards,
  Honza

jason wrote:
> Hi Jan,
>
> Here is my patch against corosync-1.4.5.
>
> diff -ruNp corosync-1.4.5-orig/exec/main.c corosync-1.4.5/exec/main.c
> --- corosync-1.4.5-orig/exec/main.c	2012-12-12 18:47:52.000000000 +0800
> +++ corosync-1.4.5/exec/main.c	2013-02-26 20:48:48.937500000 +0800
> @@ -1620,6 +1620,14 @@ int main (int argc, char **argv, char **
>  	log_printf (LOGSYS_LEVEL_NOTICE, "Corosync Cluster Engine ('%s'): started and ready to provide service.\n", VERSION);
>  	log_printf (LOGSYS_LEVEL_INFO, "Corosync built-in features:" PACKAGE_FEATURES "\n");
>
> +	/*
> +	 * Create exit sempahore.
> +	 */
> +	res = sem_init (&corosync_exit_sem, 0, 0);
> +	if (res != 0) {
> +		log_printf (LOGSYS_LEVEL_ERROR, "Corosync Executive couldn't create exit sempahore.\n");
> +		corosync_exit_error (AIS_DONE_FATAL_ERR);
> +	}
>
>  	(void)signal (SIGINT, sigintr_handler);
>  	(void)signal (SIGUSR2, sigusr2_handler);
> @@ -1803,14 +1811,8 @@ int main (int argc, char **argv, char **
>  	// TODO what is this hack for? usleep(totem_config.token_timeout * 2000);
>
>  	/*
> -	 * Create semaphore and start "exit" thread
> +	 * Start "exit" thread
>  	 */
> -	res = sem_init (&corosync_exit_sem, 0, 0);
> -	if (res != 0) {
> -		log_printf (LOGSYS_LEVEL_ERROR, "Corosync Executive couldn't create exit thread.\n");
> -		corosync_exit_error (AIS_DONE_FATAL_ERR);
> -	}
> -
>  	res = pthread_create (&corosync_exit_thread, NULL, corosync_exit_thread_handler, NULL);
>  	if (res != 0) {
>  		log_printf (LOGSYS_LEVEL_ERROR, "Corosync Executive couldn't create exit thread.\n");
>
>
> On Mon, Feb 25, 2013 at 5:14 PM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
>> jason wrote:
>>> Hi Steven,
>>>
>>> Do you have a plan to port the new shutdown method in corosync-2.x back to
>>> corosync-1.4.x? When using corosync-1.4.5, we found that shutting down
>>
>> It's almost impossible. Actually, the shutdown sequence itself didn't
>> change much.
>> What changed is the use of threads (or, more precisely, the absence of
>> threads in 2.x).
>>
>>> corosync by using kill -3 failed several times. The latest failure occurred
>>> because, when kill -3 was issued, corosync_exit_sem had not yet been
>>> initialized by sem_init(), so sem_post() in corosync_shutdown_request()
>>> failed to wake corosync_exit_thread_handler(). The fix, I think, is simply
>>> to call sem_init() before we install the signal handlers. But as you say, if
>>
>> Can you send a patch?
>>
>>> corosync-2.x has a stronger mechanism for shutdown, why not port it back
>>> to 1.4.x?
>>> On Feb 15, 2013 7:03 AM, "Steven Dake" <steven.dake@xxxxxxxxx> wrote:
>>>
>>
>> Honza
>>
>>>>
>>>> On Thu, Feb 14, 2013 at 1:23 PM, Brian J. Murrell <
>>>> brian.murrell@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>>> On EL6, at least, trying to stop corosync (kill -TERM) seems to fail
>>>>> quite frequently, with corosync seemingly just not wanting to take heed
>>>>> of the signal and exit. corosync-cfgtool -H doesn't seem to work either,
>>>>> and I just end up killing it with a SIGKILL.
>>>>
>>>> Shutdown has been a never-ending source of frustration for corosync, now
>>>> solved with the 2.x series :)
>>>>
>>>> The reason the TERM is not honored immediately is that Corosync wants to
>>>> shut down in an orderly fashion on a TERM by quiescing services and
>>>> shutting down cleanly with no pending messages. Sometimes this is not
>>>> possible quickly because the network is flaky or blocked in some way (such
>>>> as by iptables).
>>>>
>>>> I had thought we had sorted all this out for the 1.4 series though, so if
>>>> you could provide more information on your corosync rpm version, that
>>>> might be helpful.
>>>>
>>>>> Is a SIGKILL really the only way to deal with this problem? Should this
>>>>> need be codified into the initscript? i.e. try SIGTERM and then SIGKILL
>>>>> after a timeout? What's a reasonable timeout for SIGTERM to have
>>>>> worked?
>>>>>
>>>> SIGTERM should be honored by the corosync process rather than hacking
>>>> around with a SIGKILL.
>>>>
>>>>> Cheers,
>>>>> b.
>>>>

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
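
For readers following the thread, the ordering issue behind the patch (initialize the semaphore before any handler that posts to it can run) can be sketched in a few lines of C. This is a minimal, standalone illustration and not the corosync code: exit_sem, shutdown_request_handler, exit_thread_handler and demo_orderly_shutdown are hypothetical names, and raise(SIGQUIT) merely simulates a kill -3 arriving before the exit thread has been started. It assumes Linux with unnamed POSIX semaphores and should be built with -pthread.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>

static sem_t exit_sem;          /* hypothetical stand-in for corosync_exit_sem */
static int exit_handled = 0;    /* set by the exit thread, read after join */

/* Signal handler: only calls sem_post(), which is async-signal-safe. */
static void shutdown_request_handler(int signum)
{
    (void)signum;
    sem_post(&exit_sem);
}

/* "Exit" thread: blocks until a shutdown request has been posted. */
static void *exit_thread_handler(void *arg)
{
    (void)arg;
    while (sem_wait(&exit_sem) == -1 && errno == EINTR) {
        /* interrupted by a signal, retry */
    }
    exit_handled = 1;
    return NULL;
}

/* Returns 0 if an early shutdown request was honored, -1 otherwise. */
int demo_orderly_shutdown(void)
{
    pthread_t exit_thread;
    int pending = 0;

    /* 1. Initialize the semaphore first, so the handler never posts
     *    to an uninitialized object. */
    if (sem_init(&exit_sem, 0, 0) != 0) {
        return -1;
    }

    /* 2. Only now install the handler that posts to it. */
    if (signal(SIGQUIT, shutdown_request_handler) == SIG_ERR) {
        return -1;
    }

    /* 3. Simulate kill -3 arriving before the exit thread exists.
     *    The request is recorded in the semaphore count. */
    raise(SIGQUIT);
    sem_getvalue(&exit_sem, &pending);
    assert(pending == 1);

    /* 4. Start the exit thread; it finds the pending request at once. */
    if (pthread_create(&exit_thread, NULL, exit_thread_handler, NULL) != 0) {
        return -1;
    }
    pthread_join(exit_thread, NULL);
    sem_destroy(&exit_sem);

    return exit_handled ? 0 : -1;
}
```

Calling demo_orderly_shutdown() from a main() should return 0 on Linux. If step 1 were moved after step 3, as in the pre-patch ordering, the early request could be lost, which matches the failure described above.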