Re: Unexpected problems with clvmd

Shaun Mccullagh wrote:
> Hi,
> 
> I tried to add another node to our 3 node cluster this morning.
> 
> Initially things went well, but I wanted to check that the new node
> booted correctly.
> 
> After the second reboot clvmd failed to start up on the new node (called
> pan4):
> 
> [root@pan4 ~]# clvmd -d1 -T20
> CLVMD[8e1e8300]: Dec  3 14:24:09 CLVMD started
> CLVMD[8e1e8300]: Dec  3 14:24:09 Connected to CMAN
> CLVMD[8e1e8300]: Dec  3 14:24:12 CMAN initialisation complete  
> 
> group_tool reports this output for clvmd on all four nodes in the
> cluster:
> 
> dlm              1     clvmd       00010005 FAIL_START_WAIT
> dlm              1     clvmd       00010005 FAIL_ALL_STOPPED
> dlm              1     clvmd       00010005 FAIL_ALL_STOPPED
> dlm              1     clvmd       00000000 JOIN_STOP_WAIT
> 
> Otherwise the cluster is OK:
> 
> [root@brik3 ~]# clustat
> Cluster Status for mtv_gfs @ Wed Dec  3 14:38:26 2008
> Member Status: Quorate
> 
>  Member Name                                               ID   Status
>  ------ ----                                               ---- ------
>  pan4                                                          4 Online
>  pan5                                                          5 Online
>  nfs-pan                                                       6 Online
>  brik3-gfs                                                     7 Online, Local
> 
> [root@brik3 ~]# cman_tool status
> Version: 6.1.0
> Config Version: 4
> Cluster Name: mtv_gfs
> Cluster Id: 14067
> Cluster Member: Yes
> Cluster Generation: 172
> Membership state: Cluster-Member
> Nodes: 4
> Expected votes: 4
> Total votes: 4
> Quorum: 3  
> Active subsystems: 8
> Flags: Dirty 
> Ports Bound: 0 11  
> Node name: brik3-gfs
> Node ID: 7
> Multicast addresses: 239.192.54.42 
> Node addresses: 172.16.1.60 
> 
> It seems I have created a deadlock; what is the best way to fix this?
> 
> TIA
> 
>
The first thing to do is check the fencing status via group_tool and
syslog. If fencing hasn't completed, the DLM can't recover.
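
For example, something along these lines (a rough sketch; run it on one of
the existing members and adjust the node name to your setup) should show
whether a fence operation is still pending and for which node:

# list the fence, dlm and gfs groups and their current state
group_tool ls

# dump fenced's internal log for detail on any pending fence operation
group_tool dump fence

# membership as cman sees it
cman_tool nodes

# any fencing activity recorded in syslog
grep -i fenc /var/log/messages

If a fence against pan4 is stuck, getting it to complete (for instance by
running fence_node against that node, or fence_ack_manual if you rely on
manual fencing) should let the DLM recovery carry on.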
-- 

Chrissie

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
