lock_gulmd hanging on startup (STABLE, as of 24th running on Debian/Sarge)

Hi

I'm trying to get GFS over GNBD running on Debian/Sarge.
ccsd is running fine (using either IPv4 or IPv6), but lock_gulmd
hangs when it's started. I have IPv6 enabled in my kernel, but didn't
configure any IPv6 addresses. There are, however, link-local IPv6
addresses configured for each interface (Linux seems to add them
automatically). I'm running lock_gulmd with the following options:
"-n cluster-ws-sx --use_ccs --name master.ws-sx.cluster.solution-x.com -v ReallyAll".
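
For what it's worth, the name resolution and address setup can be
inspected with something like the following (just the commands, not
actual output from my box):

  # how the lockserver name resolves (IPv4 vs. IPv6)
  getent hosts master.ws-sx.cluster.solution-x.com
  # the auto-configured link-local IPv6 addresses on each interface
  ip -6 addr show
  # the plain IPv4 addresses
  ip -4 addr show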

Any tips or ideas? Any debugging I could do to track this down?

This is what it syslogs:
Jun 27 05:15:22 elrond ccsd[795]: Starting ccsd DEVEL.1119711496:
Jun 27 05:15:22 elrond ccsd[795]:  Built: Jun 25 2005 16:59:43
Jun 27 05:15:22 elrond ccsd[795]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jun 27 05:15:22 elrond ccsd[795]: IP Protocol:: IPv6 only
Jun 27 05:15:22 elrond ccsd[795]: Multicast (default):: SET
Jun 27 05:15:28 elrond ccsd[795]: cluster.conf (cluster name = cluster-ws-sx, version = 1) found.
Jun 27 05:15:32 elrond lock_gulmd_main[814]: Forked lock_gulmd_core.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Starting lock_gulmd_core DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I am running in Standard mode.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I am (master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:32 elrond lock_gulmd_core[826]: This is cluster cluster-ws-sx
Jun 27 05:15:32 elrond lock_gulmd_core[826]: In state: Pending
Jun 27 05:15:32 elrond lock_gulmd_core[826]: In state: Master
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I see no Masters, So I am becoming the Master.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Sending Quorum update to slave master.ws-sx.cluster.solution-x.com
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Could not send quorum update to slave master.ws-sx.cluster.solution-x.com
Jun 27 05:15:32 elrond lock_gulmd_core[826]: New generation of server state. (1119842132653336)
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Got heartbeat from master.ws-sx.cluster.solution-x.com at 1119842132653434 (last:1119842132653434 max:0 avg:0)
Jun 27 05:15:33 elrond lock_gulmd_main[814]: Forked lock_gulmd_LT.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: Starting lock_gulmd_LT DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: I am running in Standard mode.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: I am (master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: This is cluster cluster-ws-sx
Jun 27 05:15:33 elrond lock_gulmd_LT000[828]: Locktable 0 started.
Jun 27 05:15:34 elrond lock_gulmd_main[814]: Forked lock_gulmd_LTPX.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: Starting lock_gulmd_LTPX DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: I am running in Standard mode.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: I am (master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: This is cluster cluster-ws-sx
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: ltpx started.

ps auxwww | grep gulm gives:
root 826 0.0 0.1 2008 840 ? S<s 05:15 0:00 lock_gulmd_core --cluster_name cluster-ws-sx --servers ::ffff:10.100.20.1 --name master.ws-sx.cluster.solution-x.com --verbosity ReallyAll
root 828 0.0 0.1 2008 820 ? S<s 05:15 0:00 lock_gulmd_LT --cluster_name cluster-ws-sx --servers ::ffff:10.100.20.1 --name master.ws-sx.cluster.solution-x.com --verbosity ReallyAll
root 831 0.0 0.1 2008 820 ? S<s 05:15 0:00 lock_gulmd_LTPX --cluster_name cluster-ws-sx --servers ::ffff:10.100.20.1 --name master.ws-sx.cluster.solution-x.com --verbosity ReallyAll

And finally, strace shows all 3 pids stuck in a recv call on fd 6.
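
The socket behind that fd 6 can presumably be identified along these
lines (826 being the lock_gulmd_core pid from the ps output above):

  # what fd 6 of lock_gulmd_core actually points at
  ls -l /proc/826/fd/6
  # local/remote endpoints of the gulm daemons' open TCP sockets
  netstat -tnp | grep lock_gulmd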

Here is my cluster.conf:
<cluster name="cluster-ws-sx" config_version="1">
        <gulm>
                <lockserver name="master.ws-sx.cluster.solution-x.com"/>
        </gulm>
        <clusternodes>
                <clusternode name="master.ws-sx.cluster.solution-x.com">
                        <method name="single">
<device name="gnbd" nodename="master.ws-sx.cluster.solution-x.com"/>
                        </method>
                </clusternode>

                <clusternode name="s1.ws-sx.cluster.solution-x.com">
                        <method name="single">
<device name="gnbd" nodename="s1.ws-sx.cluster.solution-x.com"/>
                        </method>
                </clusternode>
        </clusternodes>

        <fencedevices>
<fencedevice name="gnbd" agent="fence_gnbd" servers="10.100.20.1"/>
        </fencedevices>
</cluster>
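
One more thing I could try (pure guesswork on my part) is pinning the
node name to the plain IPv4 address in /etc/hosts, so that lock_gulmd
doesn't end up binding to the IPv4-mapped form ::ffff:10.100.20.1:

  # pin the lockserver name to the IPv4 address (guesswork, not tested yet)
  echo "10.100.20.1 master.ws-sx.cluster.solution-x.com" >> /etc/hosts
  # verify the name now resolves to the IPv4 address
  getent hosts master.ws-sx.cluster.solution-x.com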

greetings, Florian Pflug

