Thanks for your help. Details are below the quoted text.
David Teigland wrote:
> On Thu, May 03, 2007 at 11:27:08AM +0200, Sebastian Walter wrote:
> > Thanks for your help. These are /proc/cluster/services:
> >
> > ### master:
> > Service Name GID LID State Code
> > Fence Domain: "default" 6 2 run -
> > [3 2 1]
> > DLM Lock Space: "clvmd" 5 3 join S-6,20,3
> > [3 2 1]
> >
> > ### node1:
> > Service Name GID LID State Code
> > Fence Domain: "default" 6 2 run -
> > [3 2 1]
> > DLM Lock Space: "clvmd" 5 3 update U-4,1,1
> > [2 3 1]
>
> This says that the dlm is stuck in recovery on all the nodes.
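
For anyone reading along, here is a minimal Python sketch of the check David is doing by eye: scan the /proc/cluster/services text and list every service whose State column is not "run". The function name, the regular expression, and the idea that "run" is the only settled state are my assumptions, based only on the output pasted above. It also assumes the Code value sits on the same line as the service.

```python
import re

# Matches a service line such as:
#   DLM Lock Space: "clvmd" 5 3 join S-6,20,3
# Layout is inferred from the output pasted in this thread (assumption).
SERVICE_LINE = re.compile(
    r'^(?P<kind>[^:]+):\s+"(?P<name>[^"]*)"\s+'
    r'(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>\S+)'
)

def unsettled_services(text):
    """Return (kind, name, state) for every service whose state is not 'run'."""
    stuck = []
    for line in text.splitlines():
        m = SERVICE_LINE.match(line.strip())
        # Header lines and member lists like "[3 2 1]" do not match and are skipped.
        if m and m.group("state") != "run":
            stuck.append((m.group("kind"), m.group("name"), m.group("state")))
    return stuck

sample = '''Service Name GID LID State Code
Fence Domain: "default" 6 2 run -
[3 2 1]
DLM Lock Space: "clvmd" 5 3 join S-6,20,3
[3 2 1]
'''

print(unsettled_services(sample))
```

On the master output above this reports the "clvmd" lock space in state "join"; on node1 it would report "update".
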
> Which version of the code are you using?
ccsd 1.07
cman_tool 1.0.11
fenced 1.32.25
clvmd 2.02.06, protocol 0.2.1
> Has this happened more than once?
This happens every time.
> Does the cluster have quorum? (cman_tool status)
Yes:
[root@xx ~]# cman_tool status
Protocol version: 5.0.1
Config version: 28
Cluster name: xx
Cluster ID: 338
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 3
Node name: xx.xx.xx.xx
Node ID: 1
Node addresses: xx.xx.xx.xx
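
The "Yes" above comes from comparing Total_votes with Quorum in that output. A rough Python sketch of the same comparison, assuming the "Key: value" layout shown here and assuming the cluster counts as quorate when Total_votes is at least Quorum (both function names are made up for illustration):

```python
def parse_cman_status(text):
    """Parse 'Key: value' lines from cman_tool status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. the shell prompt
            fields[key.strip()] = value.strip()
    return fields

def has_quorum(fields):
    # Assumption: quorate means Total_votes >= Quorum.
    return int(fields["Total_votes"]) >= int(fields["Quorum"])

sample = """Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
"""

status = parse_cman_status(sample)
print(has_quorum(status))
```

With the values pasted above (Total_votes 2, Quorum 1) the check passes.
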
> What does /proc/cluster/dlm_debug show from all nodes?
[root@master ~]# cat /proc/cluster/dlm_debug
clvmd move flags 0,1,0 ids 0,3,0
clvmd move use event 3
clvmd recover event 3 (first)
clvmd add nodes
[root@compute-0-2 ~]# cat /proc/cluster/dlm_debug
clvmd move flags 0,1,0 ids 0,2,0
clvmd move use event 2
clvmd recover event 2 (first)
clvmd add nodes
clvmd total nodes 1
clvmd rebuild resource directory
clvmd rebuilt 0 resources
clvmd recover event 2 done
clvmd move flags 0,0,1 ids 0,2,2
clvmd process held requests
clvmd processed 0 requests
clvmd recover event 2 finished
clvmd move flags 1,0,0 ids 2,2,2
clvmd move flags 0,1,0 ids 2,3,2
clvmd move use event 3
clvmd recover event 3
clvmd add node 1
(I narrowed the cluster down to 2 nodes; the problem is the same.)
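
To make the two logs above easier to compare, here is a small Python sketch that lists recovery events which were started but never reached a "finished" line. The reading of the log format ("recover event N", optionally followed by "(first)", "done" or "finished") is inferred only from the output pasted above and is an assumption, as is the function name.

```python
import re

# "recover event <N>" optionally followed by "(first)", "done" or "finished".
EVENT = re.compile(r"recover event (\d+)\s*(.*)$")

def unfinished_recovery_events(log):
    """Return event numbers that were started but have no 'finished' line."""
    started = []
    finished = set()
    for line in log.splitlines():
        m = EVENT.search(line.strip())
        if not m:
            continue
        event, suffix = int(m.group(1)), m.group(2)
        if suffix == "finished":
            finished.add(event)
        elif suffix != "done" and event not in started:
            # A bare "recover event N" or "recover event N (first)" marks a start.
            started.append(event)
    return [e for e in started if e not in finished]

master_log = """clvmd move flags 0,1,0 ids 0,3,0
clvmd move use event 3
clvmd recover event 3 (first)
clvmd add nodes
"""

compute_log = """clvmd recover event 2 (first)
clvmd recover event 2 done
clvmd recover event 2 finished
clvmd move use event 3
clvmd recover event 3
clvmd add node 1
"""

print(unfinished_recovery_events(master_log))
print(unfinished_recovery_events(compute_log))
```

Under that reading, both nodes show event 3 as started and not finished, while event 2 on compute-0-2 completed normally.
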
> What are the dlm threads waiting on? (ps ax -o pid,stat,wchan,cmd | grep dlm)
[root@xx ~]# ps ax -o pid,stat,wchan,cmd|grep dlm
28397 S< dlm_as [dlm_astd]
28398 S< dlm_re [dlm_recvd]
28399 S< dlm_se [dlm_sendd]
28400 S< dlm_wa [dlm_recoverd]
[root@compute-0-2 ~]# ps ax -o pid,stat,wchan,cmd|grep dlm
4930 S< dlm_as [dlm_astd]
4931 S< dlm_re [dlm_recvd]
4932 S< dlm_se [dlm_sendd]
4933 S< dlm_wa [dlm_recoverd]
Sebastian
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster