More specifically, stopping corosync-notifyd results in all of Pacemaker's connections to Corosync being terminated.

Andreas: Did you test this on Linux or only on Solaris?

On Thu, Oct 11, 2012 at 11:45 PM, Grüninger, Andreas (LGL Extern) <Andreas.Grueninger@xxxxxxxxxx> wrote:
> When I start
> corosync-notifyd -f -l -s -m <MONITORINGSERVER>
> and close it with CTRL-C, pacemaker shuts down.
> Please see below for the details.
>
> I compiled the current master of corosync (tag 2.1.0) and the current master of pacemaker.
> The OS is Solaris 11U7.
>
> Is this a feature or a bug?
> On Solaris, libqb must be patched to avoid errors.
> Please see
> https://lists.fedorahosted.org/pipermail/quarterback-devel/2012-September/000921.html "[PATCH] -ENOTCONN handled as error when client disconnects"
> Maybe this patch should not deliver -ESHUTDOWN when a client disconnects.
> IMHO this is the adequate result.
>
> Andreas
>
>
> On Thu, Oct 4, 2012 at 5:57 PM, Grüninger, Andreas (LGL Extern) <Andreas.Grueninger@xxxxxxxxxx> wrote:
>>>> Is this an error or the desired result?
>>
>>> Based on the logs, pacemaker thinks corosync died. Did that happen?
>>> If so there is not much pacemaker can do :-(
>>
>> And that is absolutely ok when corosync dies.
>> But corosync does not die; it is still healthy.
>> It is corosync-notifyd, which is started in addition to corosync as a separate process and which is terminated with kill when running as a daemon, or with CTRL-C when running in the foreground.
>> The job of corosync-notifyd is sending SNMP traps.
>> This corresponds to the functionality of crm_mon -C .. -S ... for pacemaker.
>>
>> So either corosync-notifyd sends the wrong signal or pacemaker does a little bit too much.
>> Pacemaker should just ignore this ending connection.
>
> All the Pacemaker daemons are being told, by Corosync itself, that their connections to Corosync are dead.
> It's a little difficult to ignore that.
>
>> Is there a chance to fix this in pacemaker, or should this better be solved in corosync/corosync-notifyd?
>
> It needs to be addressed in corosync/corosync-notifyd.
> Corosync's CPG library is the one invoking our
> cpg_connection_destroy() callback.
>
>>
>> Andreas
>>
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:andrew@xxxxxxxxxxx]
>> Sent: Wednesday, 3 October 2012 01:09
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Exiting corosync-notifyd results in shutting down of pacemakerd
>>
>> On Wed, Oct 3, 2012 at 2:51 AM, Grüninger, Andreas (LGL Extern) <Andreas.Grueninger@xxxxxxxxxx> wrote:
>>> I am currently investigating the monitoring of corosync/pacemaker with SNMP.
>>> crm_mon used with the OCF resource ClusterMon works as it should.
>>>
>>> But corosync-notifyd can't be used in our case.
>>> I start corosync-notifyd in the foreground as follows:
>>> corosync-notifyd -f -l -s -m 10.50.235.1
>>>
>>> When I stop the running corosync-notifyd with CTRL-C, pacemaker shuts down with the following entries in the logfile.
>>> Is this an error or the desired result?
>>
>> Based on the logs, pacemaker thinks corosync died. Did that happen?
>> If so there is not much pacemaker can do :-(
>>
>>>
>>> ....
>>> Oct 02 18:42:19 [27126] pacemakerd: error: cfg_connection_destroy: Connection destroyed
>>> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
>>> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 27177
>>> Oct 02 18:42:19 [27126] pacemakerd: error: cpg_connection_destroy: Connection destroyed
>>> Oct 02 18:42:19 [27177] crmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> Oct 02 18:42:19 [27177] crmd: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
>>> Oct 02 18:42:19 [27128] stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
>>> Oct 02 18:42:19 [27177] crmd: info: do_shutdown_req: Sending shutdown request to zd-sol-s1-v61
>>> Oct 02 18:42:19 [27128] stonith-ng: error: stonith_peer_ais_destroy: AIS connection terminated
>>> Oct 02 18:42:19 [27128] stonith-ng: info: stonith_shutdown: Terminating with 1 clients
>>> Oct 02 18:42:19 [27130] attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
>>> Oct 02 18:42:19 [27130] attrd: crit: attrd_ais_destroy: Lost connection to Corosync service!
>>> Oct 02 18:42:19 [27130] attrd: notice: main: Exiting...
>>> Oct 02 18:42:19 [27130] attrd: notice: main: Disconnecting client 81ffc38, pid=27177...
>>> Oct 02 18:42:19 [27128] stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets
>>> Oct 02 18:42:19 [27128] stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
>>> Oct 02 18:42:19 [27130] attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
>>> Oct 02 18:42:19 [27127] cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
>>> Oct 02 18:42:19 [27127] cib: error: cib_ais_destroy: Corosync connection lost! Exiting.
>>> Oct 02 18:42:19 [27129] lrmd: info: lrmd_ipc_destroy: LRMD client disconnecting 807e768 - name: crmd id: 1d659f61-d6e2-4ef3-f674-b9a8ba8029e8
>>> Oct 02 18:42:19 [27127] cib: info: terminate_cib: cib_ais_destroy: Exiting fast...
>>> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
>>> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
>>> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
>>> Oct 02 18:42:19 [27126] pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=27130, rc=1)
>>> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> Oct 02 18:42:19 [27126] pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=27127, rc=64)
>>> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=27177, core=0)
>>> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 27131
>>> Oct 02 18:42:19 [27126] pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=27131, rc=0)
>>> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 27129
>>> Oct 02 18:42:19 [27129] lrmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> Oct 02 18:42:19 [27129] lrmd: info: lrmd_shutdown: Terminating with 0 clients
>>> Oct 02 18:42:19 [27129] lrmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
>>> Oct 02 18:42:19 [27126] pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=27129, rc=0)
>>> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping stonith-ng: Sent -15 to process 27128
>>> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_child_exit: Child process stonith-ng terminated with signal 11 (pid=27128, core=128)
>>> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
>>> Oct 02 18:42:19 [27126] pacemakerd: info: qb_ipcs_us_withdraw: withdrawing server sockets
>>> Oct 02 18:42:19 [27126] pacemakerd: info: main: Exiting pacemakerd
>>>
>>> Andreas
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@xxxxxxxxxxxxxxxxxxx
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss