On 18/04/11 15:49, Terry wrote:
On Mon, Apr 18, 2011 at 9:26 AM, Christine Caulfield
<ccaulfie@xxxxxxxxxx> wrote:
On 18/04/11 15:11, Terry wrote:
On Mon, Apr 18, 2011 at 8:57 AM, Christine Caulfield
<ccaulfie@xxxxxxxxxx> wrote:
On 18/04/11 14:38, Terry wrote:
On Mon, Apr 18, 2011 at 3:48 AM, Christine Caulfield
<ccaulfie@xxxxxxxxxx> wrote:
On 17/04/11 21:52, Terry wrote:
As a result of a strange situation where our licensing for storage
dropped off, I need to join a CentOS 5.6 node to a now single-node
cluster. I got it joined to the cluster, but I am having issues with
clvmd: any LVM operations on both boxes hang (vgscan, for example).
I have increased debugging and I don't see any logs. The VGs aren't
being populated in /dev/mapper. This WAS working right after I joined
it to the cluster, and now it's not for some unknown reason. I'm not sure
where to take this at this point. I did find one weird startup log
that I am not sure what it means yet:
[root@omadvnfs01a ~]# dmesg | grep dlm
dlm: no local IP address has been set
dlm: cannot start dlm lowcomms -107
dlm: Using TCP for communications
dlm: connecting to 2
That message usually means that dlm_controld has failed to start. Try
starting the cman daemons (groupd, dlm_controld) manually with the -D
switch and read the output, which might give some clues as to why it's
not working.
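For example, something like this (a sketch, assuming the stock CentOS 5
paths; run each in its own terminal so the foreground debug output stays
visible):

# groupd must be running before dlm_controld can register with it
/sbin/groupd -D
/sbin/dlm_controld -D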
Chrissie
Hi Chrissie,
I thought of that but I see dlm started on both nodes. See right below.
[root@omadvnfs01a ~]# ps xauwwww | grep dlm
root      5476  0.0  0.0  24736   760 ?   Ss   15:34   0:00 /sbin/dlm_controld
root 5502 0.0 0.0 0 0 ? S< 15:34 0:00
Well, that's encouraging in a way! But it's evidently not started fully,
or the DLM itself would be working. So I still recommend starting it
with -D to see how far it gets.
Chrissie
I think our posts crossed. Here's my latest:
OK, I started all the cman daemons manually as you suggested, in the
same order as the init script (sketched below, after the output). Here's
the only error that I see; I can post the other debug messages if you
think they'd be useful, but this is the only one that stuck out to me.
[root@omadvnfs01a ~]# /sbin/dlm_controld -D
1303134840 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1303134840 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1303134840 set_ccs_options 480
1303134840 cman: node 2 added
1303134840 set_configfs_node 2 10.198.1.111 local 0
1303134840 cman: node 3 added
1303134840 set_configfs_node 3 10.198.1.110 local 1
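For reference, the order I started them in, mirroring the cman init
script (a sketch; exact flags may vary between releases):

/sbin/ccsd -n              # cluster configuration daemon, foreground
/sbin/cman_tool join -w    # join the cluster and wait for membership
/sbin/groupd -D            # group daemon
/sbin/fenced -D            # fence daemon
/sbin/dlm_controld -D      # dlm control daemon
/sbin/gfs_controld -D      # gfs control daemon
/sbin/fence_tool join      # join the fence domain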
Can I see the whole set, please? It looks like dlm_controld might be
stalled registering with groupd.
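(For what it's worth, groupd's view of which daemons have registered can
be checked with the stock tools, e.g.:

/sbin/group_tool ls      # one line per group: fence, dlm, gfs
/sbin/group_tool dump    # groupd's internal debug buffer

though the output format varies by version.)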
Chrissie
Here you go, and thank you very much for the help. The output of each
daemon I started is below.
[root@omadvnfs01a log]# /sbin/ccsd -n
Starting ccsd 2.0.115:
Built: Mar 6 2011 00:47:03
Copyright (C) Red Hat, Inc. 2004 All rights reserved.
No Daemon:: SET
cluster.conf (cluster name = omadvnfs01, version = 71) found.
Remote copy of cluster.conf is from quorate node.
Local version # : 71
Remote version #: 71
Remote copy of cluster.conf is from quorate node.
Local version # : 71
Remote version #: 71
Remote copy of cluster.conf is from quorate node.
Local version # : 71
Remote version #: 71
Remote copy of cluster.conf is from quorate node.
Local version # : 71
Remote version #: 71
Initial status:: Quorate
[root@omadvnfs01a ~]# /sbin/fenced -D
1303134822 cman: node 2 added
1303134822 cman: node 3 added
1303134822 our_nodeid 3 our_name omadvnfs01a.sec.jel.lc
1303134822 listen 4 member 5 groupd 7
1303134861 client 3: join default
1303134861 delay post_join 3s post_fail 0s
1303134861 added 2 nodes from ccs
1303134861 setid default 65537
1303134861 start default 1 members 2 3
1303134861 do_recovery stop 0 start 1 finish 0
1303134861 finish default 1
[root@omadvnfs01a ~]# /sbin/dlm_controld -D
1303134840 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1303134840 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1303134840 set_ccs_options 480
1303134840 cman: node 2 added
1303134840 set_configfs_node 2 10.198.1.111 local 0
1303134840 cman: node 3 added
1303134840 set_configfs_node 3 10.198.1.110 local 1
[root@omadvnfs01a ~]# /sbin/groupd -D
1303134809 cman: our nodeid 3 name omadvnfs01a.sec.jel.lc quorum 1
1303134809 setup_cpg groupd_handle 6b8b456700000000
1303134809 groupd confchg total 2 left 0 joined 1
1303134809 send_version nodeid 3 cluster 2 mode 2 compat 1
1303134822 client connection 3
1303134822 got client 3 setup
1303134822 setup fence 0
1303134840 client connection 4
1303134840 got client 4 setup
1303134840 setup dlm 1
1303134853 client connection 5
1303134853 got client 5 setup
1303134853 setup gfs 2
1303134861 got client 3 join
1303134861 0:default got join
1303134861 0:default is cpg client 6 name 0_default handle 6633487300000001
1303134861 0:default cpg_join ok
1303134861 0:default waiting for first cpg event
1303134861 client connection 7
1303134861 0:default waiting for first cpg event
1303134861 got client 7 get_group
1303134861 0:default waiting for first cpg event
1303134861 0:default waiting for first cpg event
1303134861 0:default confchg left 0 joined 1 total 2
1303134861 0:default process_node_join 3
1303134861 0:default cpg add node 2 total 1
1303134861 0:default cpg add node 3 total 2
1303134861 0:default make_event_id 300020001 nodeid 3 memb_count 2 type 1
1303134861 0:default queue join event for nodeid 3
1303134861 0:default process_current_event 300020001 3 JOIN_BEGIN
1303134861 0:default app node init: add 3 total 1
1303134861 0:default app node init: add 2 total 2
1303134861 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED
That looks like a service error. Is fencing started and working? Check
the output of cman_tool services or group_tool.
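For example (a sketch; your group names and ids will differ):

/sbin/cman_tool services
/sbin/group_tool

A settled group normally shows state "none"; one stuck in something like
JOIN_STOP_WAIT is still waiting on another member, which would fit the
"waiting for 1 more stopped messages" line in your groupd output.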
Chrissie
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster