On Tue, Apr 19, 2011 at 4:59 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
> On 18/04/11 15:49, Terry wrote:
>>
>> On Mon, Apr 18, 2011 at 9:26 AM, Christine Caulfield
>> <ccaulfie@xxxxxxxxxx> wrote:
>>>
>>> On 18/04/11 15:11, Terry wrote:
>>>>
>>>> On Mon, Apr 18, 2011 at 8:57 AM, Christine Caulfield
>>>> <ccaulfie@xxxxxxxxxx> wrote:
>>>>>
>>>>> On 18/04/11 14:38, Terry wrote:
>>>>>>
>>>>>> On Mon, Apr 18, 2011 at 3:48 AM, Christine Caulfield
>>>>>> <ccaulfie@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On 17/04/11 21:52, Terry wrote:
>>>>>>>>
>>>>>>>> As a result of a strange situation where our licensing for storage
>>>>>>>> dropped off, I need to join a centos 5.6 node to a now single node
>>>>>>>> cluster. I got it joined to the cluster but I am having issues with
>>>>>>>> CLVMD. Any lvm operations on both boxes hang, for example vgscan.
>>>>>>>> I have increased debugging and I don't see any logs. The VGs aren't
>>>>>>>> being populated in /dev/mapper. This WAS working right after I
>>>>>>>> joined it to the cluster and now it's not for some unknown reason.
>>>>>>>> Not sure where to take this at this point. I did find one weird
>>>>>>>> startup log that I am not sure what it means yet:
>>>>>>>>
>>>>>>>> [root@omadvnfs01a ~]# dmesg | grep dlm
>>>>>>>> dlm: no local IP address has been set
>>>>>>>> dlm: cannot start dlm lowcomms -107
>>>>>>>> dlm: Using TCP for communications
>>>>>>>> dlm: connecting to 2
>>>>>>>
>>>>>>> That message usually means that dlm_controld has failed to start.
>>>>>>> Try starting the cman daemons (groupd, dlm_controld) manually with
>>>>>>> the -D switch and read the output, which might give some clues to
>>>>>>> why it's not working.
>>>>>>>
>>>>>>> Chrissie
>>>>>>
>>>>>> Hi Chrissie,
>>>>>>
>>>>>> I thought of that but I see dlm started on both nodes. See right below.
>>>>>>
>>>>>>>> [root@omadvnfs01a ~]# ps xauwwww | grep dlm
>>>>>>>> root      5476  0.0  0.0  24736   760 ?  Ss  15:34  0:00 /sbin/dlm_controld
>>>>>>>> root      5502  0.0  0.0      0     0 ?  S<  15:34  0:00
>>>>>
>>>>> Well, that's encouraging in a way! But it's evidently not started fully
>>>>> or the DLM itself would be working. So I still recommend starting it
>>>>> with -D to see how far it gets.
>>>>>
>>>>> Chrissie
>>>>>
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> Linux-cluster@xxxxxxxxxx
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>> I think we had posts cross. Here's my latest:
>>>>
>>>> Ok, started all the CMAN elements manually as you suggested. I
>>>> started them in order as in the init script. Here's the only error
>>>> that I see. I can post the other debug messages if you think they'd
>>>> be useful but this is the only one that stuck out to me.
>>>>
>>>> [root@omadvnfs01a ~]# /sbin/dlm_controld -D
>>>> 1303134840 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
>>>> 1303134840 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
>>>> 1303134840 set_ccs_options 480
>>>> 1303134840 cman: node 2 added
>>>> 1303134840 set_configfs_node 2 10.198.1.111 local 0
>>>> 1303134840 cman: node 3 added
>>>> 1303134840 set_configfs_node 3 10.198.1.110 local 1
>>>
>>> Can I see the whole set please? It looks like dlm_controld might be
>>> stalled registering with groupd.
>>>
>>> Chrissie
>>>
>>> --
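(A minimal sketch of the manual start-up being discussed here, i.e. running
each of the cman daemons in the foreground with debug output. The daemon
names are the ones that appear in this thread; the exact ordering is my
assumption, based on how the RHEL 5 cman init script brings things up:)

# load the cluster configuration and join the cluster first
/sbin/ccsd -n          # -n keeps ccsd in the foreground (no daemon)
/sbin/cman_tool join   # join the cluster

# then start each control daemon with -D so it stays in the foreground
# and prints its debug log; run each one in its own terminal
/sbin/groupd -D
/sbin/fenced -D
/sbin/dlm_controld -D
/sbin/gfs_controld -D  # only relevant if GFS is in use

# finally join the fence domain
/sbin/fence_tool join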
>>
>> Here you go. Thank you very much for the help. Each daemon's output
>> that I started is below.
>>
>> [root@omadvnfs01a log]# /sbin/ccsd -n
>> Starting ccsd 2.0.115:
>>  Built: Mar 6 2011 00:47:03
>>  Copyright (C) Red Hat, Inc. 2004 All rights reserved.
>>  No Daemon:: SET
>>
>> cluster.conf (cluster name = omadvnfs01, version = 71) found.
>> Remote copy of cluster.conf is from quorate node.
>> Local version # : 71
>> Remote version #: 71
>> Remote copy of cluster.conf is from quorate node.
>> Local version # : 71
>> Remote version #: 71
>> Remote copy of cluster.conf is from quorate node.
>> Local version # : 71
>> Remote version #: 71
>> Remote copy of cluster.conf is from quorate node.
>> Local version # : 71
>> Remote version #: 71
>> Initial status:: Quorate
>>
>> [root@omadvnfs01a ~]# /sbin/fenced -D
>> 1303134822 cman: node 2 added
>> 1303134822 cman: node 3 added
>> 1303134822 our_nodeid 3 our_name omadvnfs01a.sec.jel.lc
>> 1303134822 listen 4 member 5 groupd 7
>> 1303134861 client 3: join default
>> 1303134861 delay post_join 3s post_fail 0s
>> 1303134861 added 2 nodes from ccs
>> 1303134861 setid default 65537
>> 1303134861 start default 1 members 2 3
>> 1303134861 do_recovery stop 0 start 1 finish 0
>> 1303134861 finish default 1
>>
>> [root@omadvnfs01a ~]# /sbin/dlm_controld -D
>> 1303134840 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
>> 1303134840 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
>> 1303134840 set_ccs_options 480
>> 1303134840 cman: node 2 added
>> 1303134840 set_configfs_node 2 10.198.1.111 local 0
>> 1303134840 cman: node 3 added
>> 1303134840 set_configfs_node 3 10.198.1.110 local 1
>>
>> [root@omadvnfs01a ~]# /sbin/groupd -D
>> 1303134809 cman: our nodeid 3 name omadvnfs01a.sec.jel.lc quorum 1
>> 1303134809 setup_cpg groupd_handle 6b8b456700000000
>> 1303134809 groupd confchg total 2 left 0 joined 1
>> 1303134809 send_version nodeid 3 cluster 2 mode 2 compat 1
>> 1303134822 client connection 3
>> 1303134822 got client 3 setup
>> 1303134822 setup fence 0
>> 1303134840 client connection 4
>> 1303134840 got client 4 setup
>> 1303134840 setup dlm 1
>> 1303134853 client connection 5
>> 1303134853 got client 5 setup
>> 1303134853 setup gfs 2
>> 1303134861 got client 3 join
>> 1303134861 0:default got join
>> 1303134861 0:default is cpg client 6 name 0_default handle 6633487300000001
>> 1303134861 0:default cpg_join ok
>> 1303134861 0:default waiting for first cpg event
>> 1303134861 client connection 7
>> 1303134861 0:default waiting for first cpg event
>> 1303134861 got client 7 get_group
>> 1303134861 0:default waiting for first cpg event
>> 1303134861 0:default waiting for first cpg event
>> 1303134861 0:default confchg left 0 joined 1 total 2
>> 1303134861 0:default process_node_join 3
>> 1303134861 0:default cpg add node 2 total 1
>> 1303134861 0:default cpg add node 3 total 2
>> 1303134861 0:default make_event_id 300020001 nodeid 3 memb_count 2 type 1
>> 1303134861 0:default queue join event for nodeid 3
>> 1303134861 0:default process_current_event 300020001 3 JOIN_BEGIN
>> 1303134861 0:default app node init: add 3 total 1
>> 1303134861 0:default app node init: add 2 total 2
>> 1303134861 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED
>
> That looks like a service error. Is fencing started and working? Check
> the output of cman_tool services or group_tool
>
> Chrissie
>
> --
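(A quick sketch of the check being suggested, run on both nodes; these are
the same tools already used elsewhere in this thread:)

# show the fence, dlm and gfs groups this node is part of,
# their state, and the member node ids
cman_tool services
group_tool -v ls

A state of "none" generally means no transition is in progress; anything
else suggests that group is stuck, and a missing fence domain on either
node would be the first thing to fix.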
Another point that I saw: the output of clustat looks good on the centos
node, but the centos node appears offline to the rhel node. Here's that
clustat as well as the group_tool and cman_tool output from both nodes:

centos:

[root@omadvnfs01a ~]# clustat
Cluster Status for omadvnfs01 @ Mon Apr 18 18:25:58 2011
Member Status: Quorate

 Member Name                         ID   Status
 ------ ----                         ---- ------
 omadvnfs01b.sec.jel.lc                 2 Online, rgmanager
 omadvnfs01a.sec.jel.lc                 3 Online, Local, rgmanager
...

[root@omadvnfs01a ~]# group_tool -v ls
type             level name        id       state node id local_done
fence            0     default     00010001 none
[2 3]
dlm              1     clvmd       00040002 none
[2 3]
dlm              1     rgmanager   00030002 none
[2 3]

[root@omadvnfs01a ~]# cman_tool status
Version: 6.2.0
Config Version: 72
Cluster Name: omadvnfs01
Cluster Id: 44973
Cluster Member: Yes
Cluster Generation: 1976
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 9
Flags: 2node Dirty
Ports Bound: 0 11 177
Node name: omadvnfs01a.sec.jel.lc
Node ID: 3
Multicast addresses: 239.192.175.93
Node addresses: 10.198.1.110

rhel:

[root@omadvnfs01b ~]# clustat
Cluster Status for omadvnfs01 @ Tue Apr 19 08:29:07 2011
Member Status: Quorate

 Member Name                         ID   Status
 ------ ----                         ---- ------
 omadvnfs01b.sec.jel.lc                 2 Online, Local, rgmanager
 omadvnfs01a.sec.jel.lc                 3 Offline, rgmanager
...

[root@omadvnfs01b ~]# group_tool -v ls
type             level name        id       state node id local_done
fence            0     default     00010001 none
[2 3]
dlm              1     gfs_data00  00020002 none
[2]
dlm              1     rgmanager   00030002 none
[2 3]
dlm              1     clvmd       00040002 none
[2 3]
gfs              2     gfs_data00  00010002 none
[2]

[root@omadvnfs01b ~]# cman_tool status
Version: 6.2.0
Config Version: 72
Cluster Name: omadvnfs01
Cluster Id: 44973
Cluster Member: Yes
Cluster Generation: 1976
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 9
Flags: 2node Dirty
Ports Bound: 0 11 177
Node name: omadvnfs01b.sec.jel.lc
Node ID: 2
Multicast addresses: 239.192.175.93
Node addresses: 10.198.1.111

Thanks!

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
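(A small follow-up sketch for the mismatch shown above: running the same
membership checks on both nodes side by side makes the disagreement easy
to see. The grep pattern matches the cman_tool status output already
posted in this thread:)

# run on each node and compare: both nodes should report each other
# as cluster members
cman_tool nodes
clustat

# confirm both nodes are running the same configuration version
cman_tool status | grep "Config Version"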