Re: [Linux-cluster] having problems trying to setup a two node cluster

Rick Stevens <rstevens@xxxxxxxxxxxxxxx> · Wed, 01 Dec 2004 10:39:49 -0800

vahram wrote:
Rick Stevens wrote:

I had a similar issue.  The problem was with the multicast routing.
I was using two NICs on each node: one public (eth0) and one private
(eth1), with the default gateway going out eth0.

The multicast traffic (224.x.x.x) was being routed out the default
gateway and never reached the other machine.  Once I added a static
route for multicast:

    route add -net 224.0.0.0/8 dev eth1

everything started working.  That was my fix; it may not work for you.
Also, I use the CVS code from http://sources.redhat.com/cluster rather
than the source RPMs from the location you specified.
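For anyone hitting the same thing, here is a minimal sketch of the check and fix above. It assumes iproute2's `ip` command is installed, that the private NIC is eth1, and that the cluster uses 224.0.0.1 (the address in the cluster.conf posted in this thread); the variable and function names are made up for illustration. It asks the kernel which interface it would use for the multicast address and, only if that is not the private NIC, adds the equivalent of the `route add` command using `ip route add`.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the original post): iproute2's "ip"
# is installed, the private NIC is eth1, the cluster multicast address
# is 224.0.0.1. Override via the MCAST_ADDR / PRIV_IF environment vars.

MCAST_ADDR=${MCAST_ADDR:-224.0.0.1}
PRIV_IF=${PRIV_IF:-eth1}

# route_dev: read "ip route get" output on stdin and print the word that
# follows "dev", i.e. the interface the kernel would send out of.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# check_mcast_route: report the current outgoing interface for the
# multicast address and add a static route via the private NIC if the
# kernel would use some other interface. Adding the route needs root.
check_mcast_route() {
    dev=$(ip route get "$MCAST_ADDR" | route_dev)
    echo "$MCAST_ADDR currently goes out: ${dev:-unknown}"
    if [ "$dev" != "$PRIV_IF" ]; then
        # Same effect as: route add -net 224.0.0.0/8 dev eth1
        ip route add 224.0.0.0/8 dev "$PRIV_IF"
    fi
}

# Only touch the routing table when explicitly asked to.
if [ "${1:-}" = "--apply" ]; then
    check_mcast_route
fi
```

Run it as root with `sh mcast-route.sh --apply` (the file name is arbitrary); without `--apply` it only defines the functions. The route does not survive a reboot, so it still has to go into the distribution's network startup scripts.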
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens@xxxxxxxxxxxxxxx -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-     Veni, Vidi, VISA:  I came, I saw, I did a little shopping.     -
----------------------------------------------------------------------

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster

Yep, both boxes have two NICs.  eth0 is public, and eth1 is private
(192.168.2.x).  I tried adding the route, and that didn't fix it.  I had
also tried earlier disabling the private NIC and running with just the
public NIC, and that didn't fix it either.  One other interesting thing
I noticed: when I run cman_tool join on nodeA, netstat shows ccsd trying
to do this:

tcp        0      0 127.0.0.1:50006             127.0.0.1:739               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:738               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:737               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:736               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:743               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:742               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:741               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:740               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:727               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:731               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:730               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:729               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:728               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:735               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:734               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:733               TIME_WAIT   -
tcp        0      0 127.0.0.1:50006             127.0.0.1:732               TIME_WAIT   -
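Rather than pasting every socket, a count is usually enough to show the pattern. A minimal sketch, assuming `netstat -tanp`-style output with the local address in field 4 and the state in field 6, as in the lines above; port 50006 is taken from that output, and the function name is hypothetical:

```shell
#!/bin/sh
# Sketch only. Assumes "netstat -tanp" column layout:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
# Port 50006 comes from the netstat output in this thread.

CCSD_PORT=${CCSD_PORT:-50006}

# count_ccsd_time_wait: read netstat output on stdin and print how many
# TCP sockets on 127.0.0.1:$CCSD_PORT are in the TIME_WAIT state.
count_ccsd_time_wait() {
    awk -v addr="127.0.0.1:$CCSD_PORT" \
        '$1 == "tcp" && $4 == addr && $6 == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Only query the live system when explicitly asked to.
if [ "${1:-}" = "--live" ]; then
    netstat -tanp 2>/dev/null | count_ccsd_time_wait
fi
```

On a node, `sh ccsd-tw.sh --live` prints the current count (the file name is arbitrary).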

Looking back at your cluster.conf, I see you're using broadcast.  I used
multicast because, in the first CVS checkout I did, broadcast didn't
work properly.  It's possible your SRPMs have the same flaw.  Why not
try multicast and see if that works?  Add the route I mentioned; here's
my cluster.conf, which you can crib:

<?xml version="1.0"?>
<cluster name="test" config_version="1">

    <cman two_node="1" expected_votes="1">
        <multicast addr="224.0.0.1"/>
    </cman>

    <nodes>
        <node name="gfs-01-001" votes="1">
            <multicast addr="224.0.0.1" interface="eth1"/>
            <fence>
                <method name="single">
                    <device name="human" ipaddr="gfs-01-001"/>
                </method>
            </fence>
        </node>

        <node name="gfs-01-002" votes="1">
            <multicast addr="224.0.0.1" interface="eth1"/>
            <fence>
                <method name="single">
                    <device name="human" ipaddr="gfs-01-002"/>
                </method>
            </fence>
        </node>
    </nodes>

    <fence_devices>
        <device name="human" agent="fence_manual"/>
    </fence_devices>
</cluster>

----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens@xxxxxxxxxxxxxxx -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-  What's small, yellow and very, VERY dangerous?  The root canary!  -
----------------------------------------------------------------------