Re: shutdown seems to get hung up quite frequently

Hi Andrew,

No, I just run openais-1.1.4 and corosync-1.4.4, and I start and stop the corosync daemon frequently from a shell script.

On Feb 21, 2013 11:57 AM, "Andrew Beekhof" <andrew@xxxxxxxxxxx> wrote:
Are you using corosync with pacemaker when this happens?

On Thu, Feb 21, 2013 at 2:27 PM, jason <huzhijiang@xxxxxxxxx> wrote:
> Hi Steven,
>
> Do you have a plan to port the new shutdown method in corosync-2.x back to
> corosync-1.4.x? With corosync-1.4.5, shutting down corosync with kill -3
> has failed for us several times. In the latest case, when kill -3 was
> issued, corosync_exit_sem had not yet been initialized by sem_init(), so
> the sem_post() in corosync_shutdown_request() failed to wake
> corosync_exit_thread_handler(). The fix, I think, is simply to call
> sem_init() before we install the signal handler. But as you say, if
> corosync-2.x has a stronger shutdown mechanism, why not port it back to
> 1.4.x?
>
> On Feb 15, 2013 7:03 AM, "Steven Dake" <steven.dake@xxxxxxxxx> wrote:
>>
>>
>>
>> On Thu, Feb 14, 2013 at 1:23 PM, Brian J. Murrell
>> <brian.murrell@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On EL6, at least, trying to stop corosync (kill -TERM) seems to fail
>>> quite frequently with corosync seemingly just not wanting to take heed
>>> of the signal and exit.  corosync-cfgtool -H doesn't seem to work either
>>> and I just end up killing it with a SIGKILL.
>>>
>> Shutdown has been a never-ending source of frustration for corosync, now
>> solved with the 2.x series :)
>>
>> The reason the TERM is not honored immediately is that Corosync wants to
>> shut down in an orderly fashion on a TERM by quiescing services and shutting
>> down cleanly with no pending messages.  Sometimes this is not possible
>> quickly because the network is flaky or blocked in some way (such as
>> iptables).
>>
>> I had thought we had sorted all this out for the 1.4 series though, so if
>> you could provide more information on your corosync rpm version, that
>> might be helpful.
>>
>>
>>>
>>> Is a SIGKILL really the only way to deal with this problem?  Should this
>>> need be codified into the initscript?  i.e. try SIGTERM and then SIGKILL
>>> after a timeout?  What's a reasonable timeout for SIGTERM to have
>>> worked?
>>>
>>
>> SIGTERM should be honored by the corosync process rather than hacking
>> around it with a SIGKILL.
>>
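
For reference, the SIGTERM-then-SIGKILL fallback asked about above could look roughly like the C sketch below. It is only an illustration of the pattern, not a recommendation to use it in place of fixing the shutdown path. stop_with_timeout and spawn_child are hypothetical names, the timeout is whatever the caller picks, and the sketch assumes the target is a child of the caller so that waitpid() can detect its exit. An initscript is not the daemon's parent and would have to detect exit some other way.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Send SIGTERM, wait up to timeout_sec for the child to exit, then fall
 * back to SIGKILL. Returns 0 if SIGTERM was enough, 1 if SIGKILL was
 * needed, -1 on error.
 */
int stop_with_timeout(pid_t pid, int timeout_sec)
{
    const struct timespec step = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    int waited_ms = 0;
    int status;

    if (kill(pid, SIGTERM) != 0)
        return -1;

    while (waited_ms < timeout_sec * 1000) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;                   /* exited after SIGTERM */
        if (r == -1 && errno != EINTR)
            return -1;
        nanosleep(&step, NULL);
        waited_ms += 100;
    }

    if (kill(pid, SIGKILL) != 0)        /* last resort */
        return -1;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 1;
}

/*
 * Demo helper (hypothetical): fork a child that just sleeps. If
 * ignore_term is non-zero the child ignores SIGTERM. The disposition is
 * set before fork() so the child inherits it without a race.
 */
pid_t spawn_child(int ignore_term)
{
    void (*old)(int) = SIG_DFL;
    pid_t pid;

    if (ignore_term)
        old = signal(SIGTERM, SIG_IGN);
    pid = fork();
    if (pid == 0) {
        for (;;)
            pause();                    /* child: wait for signals */
    }
    if (ignore_term)
        signal(SIGTERM, old);           /* parent: restore disposition */
    return pid;
}
```

With a child from spawn_child(0), stop_with_timeout() returns 0 because SIGTERM ends it. With spawn_child(1) the child ignores SIGTERM, so the call falls through to SIGKILL once the timeout expires and returns 1.
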
>>>
>>> Cheers,
>>> b.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss@xxxxxxxxxxxx
>>> http://lists.corosync.org/mailman/listinfo/discuss
