Shaun Mccullagh wrote:
> Hi,
>
> I tried to add another node to our 3 node cluster this morning.
>
> Initially things went well, but I wanted to check the new node booted
> correctly.
>
> After the second reboot clvmd failed to start up on the new node (called
> pan4):
>
> [root@pan4 ~]# clvmd -d1 -T20
> CLVMD[8e1e8300]: Dec  3 14:24:09 CLVMD started
> CLVMD[8e1e8300]: Dec  3 14:24:09 Connected to CMAN
> CLVMD[8e1e8300]: Dec  3 14:24:12 CMAN initialisation complete
>
> group_tool reports this output for clvmd on all four nodes in the
> cluster:
>
> dlm   1   clvmd   00010005 FAIL_START_WAIT
> dlm   1   clvmd   00010005 FAIL_ALL_STOPPED
> dlm   1   clvmd   00010005 FAIL_ALL_STOPPED
> dlm   1   clvmd   00000000 JOIN_STOP_WAIT
>
> Otherwise the cluster is OK:
>
> [root@brik3 ~]# clustat
> Cluster Status for mtv_gfs @ Wed Dec  3 14:38:26 2008
> Member Status: Quorate
>
>  Member Name      ID   Status
>  ------ ----      ---- ------
>  pan4                4 Online
>  pan5                5 Online
>  nfs-pan             6 Online
>  brik3-gfs           7 Online, Local
>
> [root@brik3 ~]# cman_tool status
> Version: 6.1.0
> Config Version: 4
> Cluster Name: mtv_gfs
> Cluster Id: 14067
> Cluster Member: Yes
> Cluster Generation: 172
> Membership state: Cluster-Member
> Nodes: 4
> Expected votes: 4
> Total votes: 4
> Quorum: 3
> Active subsystems: 8
> Flags: Dirty
> Ports Bound: 0 11
> Node name: brik3-gfs
> Node ID: 7
> Multicast addresses: 239.192.54.42
> Node addresses: 172.16.1.60
>
> It seems I have created a deadlock; what is the best way to fix this?
>
> TIA

The first thing is to check the fencing status, via group_tool and
syslog. If fencing hasn't completed then the DLM can't recover.

--
Chrissie

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
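A rough sketch of the checks Chrissie suggests, for anyone hitting the same FAIL_START_WAIT state. These are the standard RHCS/cman commands; the node name pan4 is taken from the report above, and whether a manual fence is appropriate depends on your fence configuration (fence_node will power-cycle or reboot the target, so only use it if the node really does need fencing):

```shell
# Dump the state of all groups (fence domain, dlm, gfs) on a member.
# A fence-domain entry stuck mid-transition is what blocks DLM
# recovery for the clvmd lockspace.
group_tool

# List the fence domain members and any node still being fenced:
fence_tool ls

# Check syslog for fenced activity (did a fence attempt fail?):
grep fenced /var/log/messages

# If fencing of the new node never completed and the node genuinely
# needs fencing, kicking it manually can unblock recovery.
# WARNING: this power-cycles/reboots pan4 via its fence agent.
fence_node pan4
```

If fenced logs show a failed fence method, fix the fence device configuration in cluster.conf before retrying; the DLM will stay wedged until a fence attempt succeeds (or is acknowledged).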