Re: [Linux-cluster] having problems trying to setup a two node cluster

Brynnen R Owen <owen@xxxxxxxxxxxxx> · Wed, 1 Dec 2004 12:49:20 -0600

  One possibility is that the hostnames used in your cluster.conf
file resolve to 127.0.0.1 in /etc/hosts.  In that case the system will
try to broadcast on the lo device.
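
  A quick way to check this is sketched below.  It is only a rough
sketch, not part of the cluster tools: the helper names
(loopback_names, resolves_to_loopback) are made up for illustration,
and the node names in the usage note are the ones from the
cluster.conf quoted further down.  It scans /etc/hosts-style text for
node names that sit on a loopback line, and separately asks the system
resolver where a name points.

```python
import ipaddress
import socket


def loopback_names(hosts_text, names):
    """Return the entries of `names` that hosts_text maps to a loopback address.

    hosts_text is the content of an /etc/hosts-style file; names is the
    list of node names taken from cluster.conf (assumed, not read here).
    """
    bad = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, aliases = fields[0], fields[1:]
        try:
            is_loop = ipaddress.ip_address(addr).is_loopback
        except ValueError:
            continue  # not a parsable address, skip the line
        if is_loop:
            bad.extend(n for n in names if n in aliases and n not in bad)
    return bad


def resolves_to_loopback(name):
    """Ask the system resolver (which may also consult DNS) about `name`."""
    return ipaddress.ip_address(socket.gethostbyname(name)).is_loopback
```

  For example, loopback_names(open("/etc/hosts").read(),
["gfs-01-001", "gfs-01-002"]) would list any node name that shares a
line with 127.0.0.1.  If it returns something, moving that name to the
line for the node's real address is the usual fix.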

  By the way, we're using the CVS code from Nov 21 with broadcast on
dual NICs.

On Wed, Dec 01, 2004 at 10:39:49AM -0800, Rick Stevens wrote:
> vahram wrote:
> >Rick Stevens wrote:
> >
> >>
> >>I had a similar issue.  The problem was with the multicast routing.
> >>I was using two NICs on each node...one public (eth0) and one private
> >>(eth1), with the default gateway going out eth0.
> >>
> >>The route for the multicast (224.x.x.x) was going out the default
> >>gateway and not reaching the other machine.  By putting in a fixed route
> >>for multicast:
> >>
> >>    route add -net 224.0.0.0/8 dev eth1
> >>
> >>it all started working.  This was my fix, it may not work for you.
> >>Also, I use the CVS code from http://sources.redhat.com/cluster and
> >>not the source RPMs from where you specified.
> >>----------------------------------------------------------------------
> >>- Rick Stevens, Senior Systems Engineer     rstevens@xxxxxxxxxxxxxxx -
> >>- VitalStream, Inc.                       http://www.vitalstream.com -
> >>-                                                                    -
> >>-     Veni, Vidi, VISA:  I came, I saw, I did a little shopping.     -
> >>----------------------------------------------------------------------
> >>
> >>-- 
> >>
> >>Linux-cluster@xxxxxxxxxx
> >>http://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> >Yep, both boxes have two NICs.  eth0 is public, and eth1 is private 
> >(192.168.2.x).  I tried adding the route, and that didn't fix it.  I've 
> >also tried disabling the private NIC before and running with one public 
> >NIC, and that didn't fix it either.  One other interesting thing I 
> >noticed...when I run cman_tool join on nodeA, netstat shows ccsd trying 
> >to do this:
> >
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:739     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:738     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:737     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:736     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:743     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:742     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:741     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:740     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:727     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:731     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:730     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:729     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:728     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:735     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:734     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:733     TIME_WAIT   -
> >tcp        0      0 127.0.0.1:50006             127.0.0.1:732     TIME_WAIT   -
> >
> 
> Looking back at your cluster.conf, I see you're using broadcast.  I used
> multicast because, in the first CVS checkout I did, broadcast didn't
> work properly.  It's possible your SRPMs also have that flaw.  Why not
> try multicast and see if that works?  Add the route I mentioned; here's
> my cluster.conf, which you can crib:
> 
> <?xml version="1.0"?>
> <cluster name="test" config_version="1">
> 
> 
>     <cman two_node="1" expected_votes="1">
>         <multicast addr="224.0.0.1"/>
>     </cman>
> 
> 
>     <nodes>
>         <node name="gfs-01-001" votes="1">
>             <multicast addr="224.0.0.1" interface="eth1"/>
>             <fence>
>                 <method name="single">
>                     <device name="human" ipaddr="gfs-01-001"/>
>                 </method>
>             </fence>
>         </node>
> 
> 
>         <node name="gfs-01-002" votes="1">
>             <multicast addr="224.0.0.1" interface="eth1"/>
>             <fence>
>                 <method name="single">
>                     <device name="human" ipaddr="gfs-01-002"/>
>                 </method>
>             </fence>
>         </node>
>     </nodes>
> 
> 
>     <fence_devices>
>         <device name="human" agent="fence_manual"/>
>     </fence_devices>
> </cluster>
> 
> ----------------------------------------------------------------------
> - Rick Stevens, Senior Systems Engineer     rstevens@xxxxxxxxxxxxxxx -
> - VitalStream, Inc.                       http://www.vitalstream.com -
> -                                                                    -
> -  What's small, yellow and very, VERY dangerous?  The root canary!  -
> ----------------------------------------------------------------------
> 

-- 
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>  Brynnen Owen            (     this space for rent                      )<>
<>  owen@xxxxxxxx           (                                              )<>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>