Hi all,

I have a very simple two-node cluster, but every time I restart a node the cluster falls apart and clvmd doesn't start. I get the error message:

$ sudo /etc/init.d/clvm restart
Deactivating VG ::.
Stopping Cluster LVM Daemon: clvm.
Starting Cluster LVM Daemon: clvmclvmd startup timed out

And life stops here: I don't get the prompt back, even SIGINT doesn't work; however, I can put the script into the background. All this after a fresh restart of cman:

$ sudo /etc/init.d/cman restart
Stopping cluster:
   Leaving fence domain...                                 [  OK  ]
   Stopping gfs_controld...                                [  OK  ]
   Stopping dlm_controld...                                [  OK  ]
   Stopping fenced...                                      [  OK  ]
   Stopping cman...                                        [  OK  ]
   Unloading kernel modules...                             [  OK  ]
   Unmounting configfs...                                  [  OK  ]
Starting cluster:
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]

Everything looks okay here; however, the return status of the init script is 1. Do you have any idea what the problem could be?

Last lines of syslog:

Apr 21 17:15:17 iscsigw2 corosync[1828]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 21 17:15:17 iscsigw2 corosync[1828]: [CMAN ] quorum regained, resuming activity
Apr 21 17:15:17 iscsigw2 corosync[1828]: [QUORUM] This node is within the primary component and will provide service.
Apr 21 17:15:17 iscsigw2 corosync[1828]: [QUORUM] Members[2]: 1 2
Apr 21 17:15:17 iscsigw2 corosync[1828]: [QUORUM] Members[2]: 1 2
Apr 21 17:15:17 iscsigw2 corosync[1828]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 21 17:15:19 iscsigw2 fenced[1880]: fenced 3.0.12 started
Apr 21 17:15:19 iscsigw2 dlm_controld[1905]: dlm_controld 3.0.12 started
Apr 21 17:15:20 iscsigw2 gfs_controld[1950]: gfs_controld 3.0.12 started
Apr 21 17:15:35 iscsigw2 kernel: [   52.774694] dlm: Using TCP for communications

Additional info, gathered while the clvm init script is backgrounded:

$ sudo cman_tool status
Version: 6.2.0
Config Version: 6
Cluster Name: iscsigw
Cluster Id: 13649
Cluster Member: Yes
Cluster Generation: 288
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 8
Flags:
Ports Bound: 0 11
Node name: iscsigw2
Node ID: 2
Multicast addresses: 239.192.53.134
Node addresses: 10.0.0.2

$ sudo cman_tool services
fence domain
member count  2
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2

dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000015 need_plock,kern_stop,join
change        member 0 joined 0 remove 0 failed 0 seq 0,0
members
new change    member 2 joined 1 remove 0 failed 0 seq 1,1
new status    wait_messages 1 wait_condition 0
new members   1 2

The only way I found to get out of this situation is to reboot the node. The shutdown process stops when it tries to shut the VGs down; from there on, only a hard reset helps.

How could I stabilize this cluster so that I can reboot a node without worrying whether the cluster suite will start up correctly or not?

Thanks,
--
cc
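
P.S. In case the configuration is relevant: the cluster.conf is nothing exotic. I can't paste the exact file right now, so please treat the snippet below only as a rough sketch of a generic two-node cman layout of this kind; the cluster name, config version, second node name and vote counts match the cman_tool output above, while the first node's name, the fence method and the fence device/agent names are placeholders, not values copied from my configuration.

<?xml version="1.0"?>
<!-- Rough sketch of a generic two-node cman configuration, not my exact file.
     iscsigw1, the fence method and the fence device/agent names are placeholders. -->
<cluster name="iscsigw" config_version="6">
  <clusternodes>
    <clusternode name="iscsigw1" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="fence-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="iscsigw2" nodeid="2" votes="1">
      <fence>
        <method name="single">
          <device name="fence-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <!-- Matches "Expected votes: 2" reported by cman_tool status. -->
  <cman expected_votes="2"/>
  <fencedevices>
    <!-- Placeholder fence devices; the real agent and its parameters are omitted here. -->
    <fencedevice name="fence-node1" agent="fence_placeholder"/>
    <fencedevice name="fence-node2" agent="fence_placeholder"/>
  </fencedevices>
</cluster>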