Here is the error trace from when it happened. All I had been doing was playing with the nodes: placing them in standby and then bringing them back online. When it happened I was not working on them at all; I was away from my desk, so there was no interaction with the cluster at the moment of the crash.
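
By "placing them in standby" I mean the ordinary standby/online toggle, e.g. via the crm shell (cluster_02 is used here only as an example of the kind of commands involved):

    # put a node into standby, then bring it back online (crm shell)
    crm node standby cluster_02
    crm node online cluster_02

The crash happened some time after the last of these, while the cluster was sitting idle.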
Here is my corosync.log:
Jun 07 10:47:57 cluster_02 cib: [29308]: debug: cib_process_xpath:
Processing cib_query op for
//cib/configuration/nodes//node[@id='cluster_02']//instance_attributes//nvpair[@name='standby']
(/cib/configuration/nodes/node[2]/instance_attributes/nvpair)
Jun 07 10:47:57 corosync [TOTEM ] mcasted message added to pending queue
Jun 07 10:47:57 corosync [TOTEM ] Delivering 62 to 63
Jun 07 10:47:57 corosync [TOTEM ] Delivering MCAST message with seq 63
to pending delivery queue
Jun 07 10:47:57 corosync [TOTEM ] Received ringid(10.10.8.17:1684) seq 63
Jun 07 10:47:57 corosync [TOTEM ] Received ringid(10.10.8.17:1684) seq 64
Jun 07 10:47:57 corosync [TOTEM ] Delivering 63 to 64
Jun 07 10:47:57 corosync [TOTEM ] Delivering MCAST message with seq 64
to pending delivery queue
Jun 07 10:47:57 corosync [TOTEM ] Received ringid(10.10.8.17:1684) seq 65
Jun 07 10:47:57 corosync [TOTEM ] Delivering 64 to 65
Jun 07 10:47:57 corosync [TOTEM ] Delivering MCAST message with seq 65
to pending delivery queue
Jun 07 10:47:57 corosync [TOTEM ] releasing messages up to and including 63
Jun 07 10:47:57 corosync [TOTEM ] releasing messages up to and including 65
Jun 07 10:53:12 cluster_02 cib: [29308]: info: cib_stats: Processed 5
operations (0.00us average, 0% utilization) in the last 10min
Jun 07 10:53:12 cluster_02 cib: [29308]: debug: cib_stats: Detail: 69
operations (0ms total) (63 local, 31 updates, 0 failures, 0 timeouts, 0
bad connects)
Jun 07 11:00:42 cluster_02 cib: [29308]: debug: cib_process_xpath:
Processing cib_query op for
//cib/configuration/nodes//node[@id='cluster_02']//instance_attributes//nvpair[@name='standby']
(/cib/configuration/nodes/node[2]/instance_attributes/nvpair)
Jun 07 11:00:42 corosync [TOTEM ] mcasted message added to pending queue
Jun 07 11:00:42 corosync [TOTEM ] Delivering 65 to 66
Jun 07 11:00:42 corosync [TOTEM ] Delivering MCAST message with seq 66
to pending delivery queue
Jun 07 11:00:42 corosync [TOTEM ] Received ringid(10.10.8.17:1684) seq 66
Jun 07 11:00:42 corosync [TOTEM ] releasing messages up to and including 66
Jun 07 11:00:42 corosync [TOTEM ] Received ringid(10.10.8.17:1684) seq 67
Jun 07 11:00:42 corosync [TOTEM ] Delivering 66 to 67
Jun 07 11:00:42 corosync [TOTEM ] Delivering MCAST message with seq 67
to pending delivery queue
Jun 07 11:00:42 corosync [TOTEM ] Received ringid(10.10.8.17:1684) seq 68
Jun 07 11:00:42 corosync [TOTEM ] Delivering 67 to 68
Jun 07 11:00:42 corosync [TOTEM ] Delivering MCAST message with seq 68
to pending delivery queue
Jun 07 11:00:42 corosync [TOTEM ] releasing messages up to and including 68
Jun 07 11:01:01 cluster_02 stonith-ng: [29307]: ERROR:
stonith_peer_ais_destroy: AIS connection terminated
Jun 07 11:01:01 cluster_02 attrd: [29311]: CRIT: attrd_ais_destroy: Lost
connection to OpenAIS service!
Jun 07 11:01:01 cluster_02 attrd: [29311]: notice: main: Exiting...
Jun 07 11:01:01 cluster_02 attrd: [29311]: debug: cib_native_signoff:
Signing out of the CIB Service
Jun 07 11:01:01 cluster_02 attrd: [29311]: ERROR:
attrd_cib_connection_destroy: Connection to the CIB terminated...
Jun 07 11:01:01 cluster_02 cib: [29308]: ERROR: cib_ais_destroy: AIS
connection terminated
Jun 07 11:01:01 corosync [CPG ] exit_fn for conn=0x8d01e60
Jun 07 11:01:01 corosync [pcmk ] info: pcmk_ipc_exit: Client stonith-ng
(conn=0x8d06038, async-conn=0x8d06038) left
Jun 07 11:01:01 corosync [pcmk ] info: pcmk_ipc_exit: Client attrd
(conn=0x8d0a210, async-conn=0x8d0a210) left
Jun 07 11:01:01 corosync [TOTEM ] mcasted message added to pending queue
Jun 07 11:01:01 corosync [TOTEM ] Delivering 68 to 69
Jun 07 11:01:01 corosync [TOTEM ] Delivering MCAST message with seq 69
to pending delivery queue
Jun 07 11:01:01 corosync [CPG ] got procleave message from cluster
node 302516746
Jun 07 11:01:01 corosync [TOTEM ] Received ringid(10.10.8.17:1684) seq 69
Jun 07 11:01:01 corosync [pcmk ] info: pcmk_ipc_exit: Client cib
(conn=0x8d0e3e8, async-conn=0x8d0e3e8) left
Jun 07 11:01:01 corosync [TOTEM ] Received ringid(10.10.8.17:1684) seq 6a
Jun 07 11:01:01 corosync [TOTEM ] Delivering 69 to 6a
Jun 07 11:01:01 corosync [TOTEM ] Delivering MCAST message with seq 6a
to pending delivery queue
Jun 07 11:01:01 corosync [TOTEM ] releasing messages up to and including 69
Jun 07 11:01:01 corosync [TOTEM ] releasing messages up to and including 6a
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: xmlfromIPC: Peer
disconnected
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: cib_native_msgready:
Lost connection to the CIB service [29308].
Jun 07 11:01:01 cluster_02 crmd: [29313]: CRIT: cib_native_dispatch:
Lost connection to the CIB service [29308/callback].
Jun 07 11:01:01 cluster_02 crmd: [29313]: CRIT: cib_native_dispatch:
Lost connection to the CIB service [29308/command].
Jun 07 11:01:01 cluster_02 crmd: [29313]: ERROR:
crmd_cib_connection_destroy: Connection to the CIB terminated...
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: crmd_ais_destroy:
connection closed
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: xmlfromIPC: Peer
disconnected
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: stonith_msgready: Lost
connection to the STONITH service [29307].
Jun 07 11:01:01 cluster_02 crmd: [29313]: CRIT:
stonith_dispatch_internal: Lost connection to the STONITH service
[29307/callback].
Jun 07 11:01:01 cluster_02 crmd: [29313]: CRIT:
stonith_dispatch_internal: Lost connection to the STONITH service
[29307/command].
Jun 07 11:01:01 cluster_02 crmd: [29313]: CRIT:
tengine_stonith_connection_destroy: Fencing daemon connection failed
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: s_crmd_fsa: Processing
I_ERROR: [ state=S_NOT_DC cause=C_FSA_INTERNAL
origin=crmd_cib_connection_destroy ]
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_ERROR
Jun 07 11:01:01 cluster_02 crmd: [29313]: ERROR: do_log: FSA: Input
I_ERROR from crmd_cib_connection_destroy() received in state S_NOT_DC
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: do_state_transition:
State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR
cause=C_FSA_INTERNAL origin=crmd_cib_connection_destroy ]
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_DC_TIMER_STOP
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_INTEGRATE_TIMER_STOP
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_FINALIZE_TIMER_STOP
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_RECOVER
Jun 07 11:01:01 cluster_02 crmd: [29313]: ERROR: do_recover: Action
A_RECOVER (0000000001000000) not supported
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: s_crmd_fsa: Processing
I_TERMINATE: [ state=S_RECOVERY cause=C_FSA_INTERNAL origin=do_recover ]
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_ERROR
Jun 07 11:01:01 cluster_02 crmd: [29313]: ERROR: do_log: FSA: Input
I_TERMINATE from do_recover() received in state S_RECOVERY
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: do_state_transition:
State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE
cause=C_FSA_INTERNAL origin=do_recover ]
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_DC_TIMER_STOP
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_INTEGRATE_TIMER_STOP
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_FINALIZE_TIMER_STOP
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_SHUTDOWN
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: do_shutdown:
Disconnecting STONITH...
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: stonith_api_signoff:
Signing out of the STONITH Service
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_LRM_DISCONNECT
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: verify_stopped:
Checking for active resources before exit
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: do_lrm_control:
Disconnected from the LRM
Jun 07 11:01:01 cluster_02 lrmd: [29310]: debug: on_receive_cmd: the IPC
to client [pid:29313] disconnected.
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_CCM_DISCONNECT
Jun 07 11:01:01 cluster_02 lrmd: [29310]: debug: unregister_client:
client crmd [pid:29313] is unregistered
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_HA_DISCONNECT
Jun 07 11:01:01 cluster_02 crmd: [29313]: notice:
terminate_ais_connection: Disconnecting from AIS
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: do_ha_control:
Disconnected from OpenAIS
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_CIB_STOP
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: do_cib_control:
Disconnecting CIB
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug:
cib_client_del_notify_callback: Removing callback for cib_diff_notify events
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_STOP
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: do_fsa_action:
actions:trace: // A_EXIT_0
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: verify_stopped:
Checking for active resources before exit
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: do_exit: Performing
A_EXIT_0 - gracefully exiting the CRMd
Jun 07 11:01:01 cluster_02 crmd: [29313]: ERROR: do_exit: Could not
recover from internal error
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: free_mem: Dropping
I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: free_mem: Number of
connected clients: 0
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: free_mem: Partial
destroy: TE
Jun 07 11:01:01 cluster_02 crmd: [29313]: debug: free_mem: Partial
destroy: PE
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Jun 07 11:01:01 cluster_02 crmd: [29313]: info: do_exit: [crmd] stopped (2)
I am aware that stonith is not configured at this point. The reason is that I have not put in any configuration yet; I am simply running two servers with no load and no resources, just to test the cluster itself. It is the slave machine that dies.
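
(For completeness: on a bare test cluster like this, the only property I would expect to need is the one telling Pacemaker not to require fencing until stonith is actually set up, something like:

    # example only: do not require fencing while no stonith devices are configured
    crm configure property stonith-enabled=false

but as I said, right now there is no configuration at all.)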
This is my corosync.conf:
compatibility: whitetank

totem {
    version: 2
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 3600
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    secauth: off
    threads: 0
    rrp_mode: none
    interface {
        ringnumber: 0
        bindnetaddr: 10.10.0.0
        mcastaddr: 226.18.1.1
        mcastport: 6006
    }
}

service {
    ver: 1
    name: pacemaker
}

aisexec {
    user: root
    group: root
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: on
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: on
    }
}

amf {
    mode: disabled
}
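
Since the pacemaker service block has ver: 1, corosync does not spawn the Pacemaker daemons itself; Pacemaker runs as its own init service, so I bring the stack up in two steps, roughly like this (the exact init script names depend on the distribution):

    # with ver: 1, corosync and pacemaker are started separately
    service corosync start
    service pacemaker start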
Thanks in advance!