Hi
I'm trying to get GFS over GNBD running on Debian/Sarge.
ccsd is running fine (using either IPv4 or IPv6), but lock_gulmd
hangs when it's started. I have enabled IPv6 in my kernel, but didn't
configure any IPv6 addresses. There are, however, link-local IPv6
addresses configured for each interface (Linux seems to add them
automatically; see the quick check below). I'm running lock_gulmd with
the following options:
"-n cluster-ws-sx --use_ccs --name master.ws-sx.cluster.solution-x.com
-v ReallyAll".
Any tips & ideas? Is there any debugging I could do to track this down?
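For reference, the automatic link-local addresses can be seen with ip
(eth0 is just an example interface name here):

  ip -6 addr show dev eth0   # shows the automatic fe80::/64 scope-link address
  ip -4 addr show dev eth0   # shows 10.100.20.1, which gulm logs as ::ffff:10.100.20.1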
This is what shows up in syslog:
Jun 27 05:15:22 elrond ccsd[795]: Starting ccsd DEVEL.1119711496:
Jun 27 05:15:22 elrond ccsd[795]: Built: Jun 25 2005 16:59:43
Jun 27 05:15:22 elrond ccsd[795]: Copyright (C) Red Hat, Inc. 2004
All rights reserved.
Jun 27 05:15:22 elrond ccsd[795]: IP Protocol:: IPv6 only Multicast
(default):: SET
Jun 27 05:15:28 elrond ccsd[795]: cluster.conf (cluster name =
cluster-ws-sx, version = 1) found.
Jun 27 05:15:32 elrond lock_gulmd_main[814]: Forked lock_gulmd_core.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Starting lock_gulmd_core
DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red
Hat, Inc. All rights reserved.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I am running in Standard mode.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I am
(master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:32 elrond lock_gulmd_core[826]: This is cluster cluster-ws-sx
Jun 27 05:15:32 elrond lock_gulmd_core[826]: In state: Pending
Jun 27 05:15:32 elrond lock_gulmd_core[826]: In state: Master
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I see no Masters, So I am
becoming the Master.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Sending Quorum update to
slave master.ws-sx.cluster.solution-x.com
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Could not send quorum
update to slave master.ws-sx.cluster.solution-x.com
Jun 27 05:15:32 elrond lock_gulmd_core[826]: New generation of server
state. (1119842132653336)
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Got heartbeat from
master.ws-sx.cluster.solution-x.com at 1119842132653434
(last:1119842132653434 max:0 avg:0)
Jun 27 05:15:33 elrond lock_gulmd_main[814]: Forked lock_gulmd_LT.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: Starting lock_gulmd_LT
DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red
Hat, Inc. All rights reserved.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: I am running in Standard mode.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: I am
(master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: This is cluster cluster-ws-sx
Jun 27 05:15:33 elrond lock_gulmd_LT000[828]: Locktable 0 started.
Jun 27 05:15:34 elrond lock_gulmd_main[814]: Forked lock_gulmd_LTPX.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: Starting lock_gulmd_LTPX
DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red
Hat, Inc. All rights reserved.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: I am running in Standard mode.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: I am
(master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: This is cluster cluster-ws-sx
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: ltpx started.
ps auxwww | grep gulm gives:
root 826 0.0 0.1 2008 840 ? S<s 05:15 0:00
lock_gulmd_core --cluster_name cluster-ws-sx --servers
::ffff:10.100.20.1 --name master.ws-sx.cluster.solution-x.com
--verbosity ReallyAll
root 828 0.0 0.1 2008 820 ? S<s 05:15 0:00
lock_gulmd_LT --cluster_name cluster-ws-sx --servers ::ffff:10.100.20.1
--name master.ws-sx.cluster.solution-x.com --verbosity ReallyAll
root 831 0.0 0.1 2008 820 ? S<s 05:15 0:00
lock_gulmd_LTPX --cluster_name cluster-ws-sx --servers
::ffff:10.100.20.1 --name master.ws-sx.cluster.solution-x.com
--verbosity ReallyAll
And finally, strace shows all three PIDs stuck in a recv() call on fd 6.
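In case it helps, this is how I'd pin down what fd 6 actually is
(PID 826 is lock_gulmd_core from the ps output below):

  ls -l /proc/826/fd/6                         # prints socket:[inode] for the fd it blocks on
  netstat -anp 2>/dev/null | grep lock_gulmd   # maps that socket to its local/remote address and state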
Here is my cluster.conf:
<cluster name="cluster-ws-sx" config_version="1">
  <gulm>
    <lockserver name="master.ws-sx.cluster.solution-x.com"/>
  </gulm>
  <clusternodes>
    <clusternode name="master.ws-sx.cluster.solution-x.com">
      <method name="single">
        <device name="gnbd" nodename="master.ws-sx.cluster.solution-x.com"/>
      </method>
    </clusternode>
    <clusternode name="s1.ws-sx.cluster.solution-x.com">
      <method name="single">
        <device name="gnbd" nodename="s1.ws-sx.cluster.solution-x.com"/>
      </method>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="gnbd" agent="fence_gnbd" servers="10.100.20.1"/>
  </fencedevices>
</cluster>
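In case name resolution plays a role here, I can double-check it with
the standard tools, e.g.:

  getent hosts master.ws-sx.cluster.solution-x.com   # should give 10.100.20.1, the address gulm reports
  hostname -f                                        # should match the --name passed to lock_gulmd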
greetings, Florian Pflug