Re: clvmd hangs

Sebastian Walter <sebastian.walter@xxxxxxxxxxxx> · Wed, 16 May 2007 17:30:31 +0200

Thanks for your help. My answers are inline below the quote.

David Teigland wrote:
On Thu, May 03, 2007 at 11:27:08AM +0200, Sebastian Walter wrote:

Sebastian Walter wrote:

Thanks for your help. Here is /proc/cluster/services from each node:

### master:
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           6   2 run       -
[3 2 1]

DLM Lock Space:  "clvmd"                             5   3 join      S-6,20,3
[3 2 1]

### node1:
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           6   2 run       -
[3 2 1]

DLM Lock Space:  "clvmd"                             5   3 update    U-4,1,1
[2 3 1]

This says that the dlm is stuck in recovery on all the nodes.
Which version of the code are you using?

ccsd 1.07
cman_tool 1.0.11
fenced 1.32.25
clvmd 2.02.06, protocol 0.2.1
Has this happened more than once?

This happens every time.
Does the cluster have quorum? (cman_tool status)

Yes:
[root@xx ~]# cman_tool status
Protocol version: 5.0.1
Config version: 28
Cluster name: xx
Cluster ID: 338
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1  
Active subsystems: 3
Node name: xx.xx.xx.xx
Node ID: 1
Node addresses: xx.xx.xx.xx

What does /proc/cluster/dlm_debug show from all nodes?

[root@master ~]# cat /proc/cluster/dlm_debug
clvmd move flags 0,1,0 ids 0,3,0
clvmd move use event 3
clvmd recover event 3 (first)
clvmd add nodes

[root@compute-0-2 ~]# cat /proc/cluster/dlm_debug
clvmd move flags 0,1,0 ids 0,2,0
clvmd move use event 2
clvmd recover event 2 (first)
clvmd add nodes
clvmd total nodes 1
clvmd rebuild resource directory
clvmd rebuilt 0 resources
clvmd recover event 2 done
clvmd move flags 0,0,1 ids 0,2,2
clvmd process held requests
clvmd processed 0 requests
clvmd recover event 2 finished
clvmd move flags 1,0,0 ids 2,2,2
clvmd move flags 0,1,0 ids 2,3,2
clvmd move use event 3
clvmd recover event 3
clvmd add node 1

(I narrowed the cluster down to 2 nodes; the problem is the same.)
What are the dlm threads waiting on? (ps ax -o pid,stat,wchan,cmd | grep dlm)

[root@xx ~]# ps ax -o pid,stat,wchan,cmd|grep dlm
28397 S<   dlm_as [dlm_astd]
28398 S<   dlm_re [dlm_recvd]
28399 S<   dlm_se [dlm_sendd]
28400 S<   dlm_wa [dlm_recoverd]

[root@compute-0-2 ~]# ps ax -o pid,stat,wchan,cmd|grep dlm
4930 S<   dlm_as [dlm_astd]
4931 S<   dlm_re [dlm_recvd]
4932 S<   dlm_se [dlm_sendd]
4933 S<   dlm_wa [dlm_recoverd]

Sebastian

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster