Hello,

I have two nodes running RHEL 6 beta 2 with corosync, configured as shown below. Both nodes have access to a SAN disk, which is partitioned into /dev/sdb1 for SBD STONITH and /dev/sdb2 for data. /dev/sdb2 carries a GFS2 filesystem on LVM (vg01/lv00). For the configuration I followed the Clusters from Scratch PDF from clusterlabs.

As soon as I start the two nodes, one of them immediately gets fenced and shut down. I can see in the logs that the fenced node is trying to mount the filesystem when it gets shot down. I have no clue why this happens. Can anyone give me a hint on how to fix my cluster?

Configuration:

[root@pcmknode-1 ~]# crm configure show
node pcmknode-1
node pcmknode-2
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/vg01/lv00" directory="/data_1" fstype="gfs2"
primitive dlm ocf:pacemaker:controld \
        params configdir="/config" \
        op monitor interval="120s"
primitive gfs-control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="120s"
primitive resSBD stonith:external/sbd \
        params sbd_device="/dev/sdb1"
clone WebFSClone WebFS
clone dlm-clone dlm \
        meta interleave="true" target-role="Started"
clone gfs-clone gfs-control \
        meta interleave="true" target-role="Started"
location cli-prefer-WebFS WebFSClone \
        rule $id="cli-prefer-rule-WebFS" inf: #uname eq pcmknode-1 and date lt "2010-07-27 21:53:10Z"
colocation WebFS-with-gfs-control inf: WebFSClone gfs-clone
colocation gfs-with-dlm inf: gfs-clone dlm-clone
order start-WebFS-after-gfs-control inf: gfs-clone WebFSClone
order start-gfs-after-dlm inf: dlm-clone gfs-clone
property $id="cib-bootstrap-options" \
        dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        stonith-timeout="30s" \
        no-quorum-policy="ignore"
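One thing I have not been able to rule out is a mismatch between the shared storage and the cluster itself. A check along these lines should show whether the SBD header and the GFS2 locktable look sane (just a sketch; it assumes the standard sbd and gfs2-utils command line tools, and the cluster/filesystem names are taken from the fsid in the log below):

[root@pcmknode-1 ~]# sbd -d /dev/sdb1 dump                # print the SBD header (timeouts, slot count)
[root@pcmknode-1 ~]# sbd -d /dev/sdb1 list                # both nodes should own a slot on the device
[root@pcmknode-1 ~]# gfs2_tool sb /dev/vg01/lv00 table    # locktable should read pcmknode:data1s and match the corosync cluster name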
These are the logs:

pcmknode-1: /var/log/messages
~ snip ~
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: native_color: Resource WebFS:0 cannot run anywhere
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: WARN: custom_action: Action dlm:0_stop_0 on pcmknode-2 is unrunnable (offline)
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: WARN: custom_action: Marking node pcmknode-2 unclean
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: WARN: custom_action: Action gfs-control:0_stop_0 on pcmknode-2 is unrunnable (offline)
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: WARN: custom_action: Marking node pcmknode-2 unclean
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: WARN: stage6: Scheduling Node pcmknode-2 for STONITH
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: native_stop_constraints: dlm:0_stop_0 is implicit after pcmknode-2 is fenced
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: native_stop_constraints: gfs-control:0_stop_0 is implicit after pcmknode-2 is fenced
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: find_compatible_child: Colocating gfs-control:1 with dlm:1 on pcmknode-1
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: clone_rsc_order_lh: Interleaving dlm:1 with gfs-control:1
Jul 28 00:46:32 pcmknode-1 crmd: [2629]: info: te_fence_node: Executing reboot fencing operation (30) on pcmknode-2 (timeout=30000)
Jul 28 00:46:32 pcmknode-1 crmd: [2629]: info: te_rsc_command: Initiating action 22: stop WebFS:1_stop_0 on pcmknode-1 (local)
Jul 28 00:46:32 pcmknode-1 crmd: [2629]: info: do_lrm_rsc_op: Performing key=22:3:0:2419bb70-dce6-4a0e-b649-fae2b0f21b8d op=WebFS:1_stop_0 )
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: find_compatible_child: Colocating dlm:0 with gfs-control:0 on pcmknode-2
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: clone_rsc_order_lh: Interleaving gfs-control:0 with dlm:0
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: find_compatible_child: Colocating dlm:1 with gfs-control:1 on pcmknode-1
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: clone_rsc_order_lh: Interleaving gfs-control:1 with dlm:1
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: find_compatible_child: Colocating WebFS:1 with gfs-control:1 on pcmknode-1
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: clone_rsc_order_lh: Interleaving gfs-control:1 with WebFS:1
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: LogActions: Leave resource resSBD (Started pcmknode-1)
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: LogActions: Stop resource dlm:0 (pcmknode-2)
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: LogActions: Leave resource dlm:1 (Started pcmknode-1)
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: LogActions: Stop resource gfs-control:0 (pcmknode-2)
Jul 28 00:46:32 pcmknode-1 cib: [2900]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-23.raw
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: LogActions: Leave resource gfs-control:1 (Started pcmknode-1)
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: LogActions: Leave resource WebFS:0 (Stopped)
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: notice: LogActions: Restart resource WebFS:1 (Started pcmknode-1)
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: WARN: process_pe_message: Transition 3: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-4.bz2
Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
Jul 28 00:46:32 pcmknode-1 cib: [2900]: info: write_cib_contents: Wrote version 0.127.0 of the CIB to disk (digest: af7f98fa70bd2ef644e8e70d6f2ceea9)
Jul 28 00:46:32 pcmknode-1 cib: [2900]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.vx6ONO (digest: /var/lib/heartbeat/crm/cib.kGkBp3)
Jul 28 00:46:32 pcmknode-1 Filesystem[2902]: INFO: Running stop for /dev/vg01/lv00 on /data_1
Jul 28 00:46:32 pcmknode-1 Filesystem[2902]: INFO: Trying to unmount /data_1
Jul 28 00:46:35 pcmknode-1 stonith-ng: [2624]: ERROR: remote_op_query_timeout: Query 8f1eeecf-4832-4430-b8e8-41a645675c58 for pcmknode-2 timed out
Jul 28 00:46:35 pcmknode-1 stonith-ng: [2624]: ERROR: remote_op_timeout: Action reboot (8f1eeecf-4832-4430-b8e8-41a645675c58) for pcmknode-2 timed out
Jul 28 00:46:35 pcmknode-1 stonith-ng: [2624]: info: remote_op_done: Notifing clients of 8f1eeecf-4832-4430-b8e8-41a645675c58 (reboot of pcmknode-2 from 540c61a4-d351-40c7-aa60-efd445097180 by (null)): 0, rc=-7
Jul 28 00:46:35 pcmknode-1 stonith-ng: [2624]: info: stonith_notify_client: Sending st_fence-notification to client 2629/cc57856c-5357-4343-95a9-712771f711ae
Jul 28 00:46:35 pcmknode-1 crmd: [2629]: info: log_data_element: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="pcmknode-2" st_op="reboot" />
Jul 28 00:46:35 pcmknode-1 crmd: [2629]: info: tengine_stonith_callback: Stonith operation 2/30:3:0:2419bb70-dce6-4a0e-b649-fae2b0f21b8d: Operation timed out (-7)
Jul 28 00:46:35 pcmknode-1 crmd: [2629]: ERROR: tengine_stonith_callback: Stonith of pcmknode-2 failed (-7)... aborting transition.
Jul 28 00:46:35 pcmknode-1 crmd: [2629]: info: abort_transition_graph: tengine_stonith_callback:402 - Triggered transition abort (complete=0) : Stonith failed
Jul 28 00:46:35 pcmknode-1 crmd: [2629]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Jul 28 00:46:35 pcmknode-1 crmd: [2629]: info: update_abort_priority: Abort action done superceeded by restart
Jul 28 00:46:35 pcmknode-1 crmd: [2629]: info: tengine_stonith_notify: Peer pcmknode-2 was terminated (reboot) by (null) for pcmknode-1 (ref=8f1eeecf-4832-4430-b8e8-41a645675c58): Operation timed out
~ snip ~
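What puzzles me most in this log is that the stonith reboot of pcmknode-2 times out after only a few seconds. To test the SBD fencing path by hand, outside of the cluster, I would try something like the following (again only a sketch; node and device names as above, and it assumes the sbd and stonith_admin tools that ship with this Pacemaker/stonith-ng stack):

[root@pcmknode-1 ~]# sbd -d /dev/sdb1 message pcmknode-2 test    # write a harmless test message into pcmknode-2's slot
[root@pcmknode-1 ~]# sbd -d /dev/sdb1 list                       # show the slot contents afterwards
[root@pcmknode-1 ~]# stonith_admin --reboot pcmknode-2           # ask stonith-ng directly to reboot pcmknode-2 via external/sbd

If that manual reboot also times out, then at least I know the problem is in the fencing setup itself and not in the resource ordering.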
pcmknode-2: /var/log/messages
~ snip ~
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Received ringid(192.168.1.186:620) seq 91
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering 90 to 91
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering MCAST message with seq 91 to pending delivery queue
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Received ringid(192.168.1.186:620) seq 92
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering 91 to 92
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering MCAST message with seq 92 to pending delivery queue
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] mcasted message added to pending queue
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering 92 to 93
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering MCAST message with seq 93 to pending delivery queue
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [CPG ] got procjoin message from cluster node -1147763583
Jul 28 00:46:29 pcmknode-2 cib: [2542]: debug: cib_process_xpath: cib_query: //nvpar[@name='terminate'] does not exist
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Received ringid(192.168.1.186:620) seq 93
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] releasing messages up to and including 92
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [CPG ] got mcast request on 0x1b072a0
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Received ringid(192.168.1.186:620) seq 94
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering 93 to 94
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering MCAST message with seq 94 to pending delivery queue
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] mcasted message added to pending queue
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] releasing messages up to and including 93
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering 94 to 95
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Delivering MCAST message with seq 95 to pending delivery queue
Jul 28 00:46:29 pcmknode-2 kernel: : dlm: got connection from -1164540799
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] Received ringid(192.168.1.186:620) seq 95
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] releasing messages up to and including 94
Jul 28 00:46:29 pcmknode-2 corosync[2535]: [TOTEM ] releasing messages up to and including 95
Jul 28 00:46:29 pcmknode-2 kernel: GFS2: fsid=pcmknode:data1s.0: Joined cluster. Now mounting FS...

That was the last message in the log.

So, how can I fix my cluster? What exactly is the problem?

Thanks,
Benedikt

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster