Hmm, well, that's odd. Stop qdiskd on both nodes, then start it on one and
watch /var/log/messages to see what it spits out. If it doesn't spit out any
obvious errors, do the same on the next node. If it still doesn't work, copy
the relevant parts of the logs into something like http://pastebin.com and
post the URL so I can take a look.

Josef

On Tue, Oct 17, 2006 at 08:42:59PM +0200, Katriel Traum wrote:
> qdisk is running:
> [root@n1 ~]# service qdiskd status
> qdiskd (pid 1199) is running...
> [root@n2 ~]# service qdiskd status
> qdiskd (pid 873) is running...
>
> /tmp/qdisk-status:
> [root@n1 ~]# cat /tmp/qdisk-status
> Node ID: 1
> Score (current / min req. / max allowed): 3 / 2 / 3
> Current state: Master
> Current disk state: None
> Visible Set: { 1 2 }
> Master Node ID: 1
> Quorate Set: { 1 2 }
>
> Both nodes see /dev/etherd/e0.0 and can access it (tcpdump shows both
> accessing it for timestamps, I suppose).
> /proc/cluster/nodes shows the same as "cman_tool nodes":
> [root@n1 ~]# cman_tool nodes
> Node  Votes  Exp  Sts  Name
>    1      1    2    M  n1
>    2      1    2    M  n2
>
> Everything looks OK, it's just not working.
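The stop/start-and-watch step Josef suggests can be sketched as a couple of
commands. The syslog lines in `sample` below are hypothetical, just to show
the filter; qdiskd's actual messages vary by version:

```shell
# On both nodes first:    service qdiskd stop
# Then on one node only:  service qdiskd start
# ...and pull anything qdiskd logged out of the syslog:
#   grep -i qdiskd /var/log/messages | tail -n 20

# Demonstrating the filter on hypothetical sample lines:
sample='Oct 17 20:45:01 n1 qdiskd[1199]: <info> Quorum Daemon Initializing
Oct 17 20:45:02 n1 kernel: unrelated message
Oct 17 20:45:03 n1 qdiskd[1199]: <info> Initial score 3/3'
printf '%s\n' "$sample" | grep -c qdiskd    # counts matching lines: 2
```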
>
> cluster.conf:
> <?xml version="1.0"?>
> <cluster config_version="9" name="alpha_cluster">
>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="n1" votes="1" nodeid="1">
>       <fence>
>         <method name="1">
>           <device name="man_fence" nodename="n1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="n2" votes="1" nodeid="2">
>       <fence>
>         <method name="1">
>           <device name="man_fence" nodename="n2"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman/>
>   <fencedevices>
>     <fencedevice agent="fence_manual" name="man_fence"/>
>   </fencedevices>
>   <rm log_level="7">
>     <failoverdomains/>
>     <resources>
>       <ip address="192.168.22.250" monitor_link="1"/>
>       <script file="/etc/init.d/httpd" name="httpd"/>
>     </resources>
>     <service autostart="1" name="apache" recovery="relocate">
>       <ip ref="192.168.22.250"/>
>       <script ref="httpd"/>
>     </service>
>   </rm>
>   <quorumd interval="1" tko="5" votes="3" log_level="7"
>            device="/dev/etherd/e0.0" status_file="/tmp/qdisk-status">
>     <heuristic program="ping 192.168.22.1 -c1 -t1" score="1" interval="2"/>
>     <heuristic program="ping 192.168.22.60 -c1 -t1" score="1" interval="2"/>
>     <heuristic program="ping 192.168.22.100 -c1 -t1" score="1" interval="2"/>
>   </quorumd>
> </cluster>
>
> Katriel
>
> Josef Whiter wrote:
> > What does your cluster.conf look like? What about /proc/cluster/nodes?
> > Are you sure qdiskd is starting? Your quorum stuff looks fine. Do both
> > nodes see /dev/etherd/e0.0 as the same disk?
> >
> > Josef
> >
> > On Tue, Oct 17, 2006 at 08:11:42PM +0200, Katriel Traum wrote:
> > Hello.
> >
> > I've seen this subject on the list, but no real solutions.
> > I'm using Cluster 4 update 4, with qdiskd and a shared disk.
> > I've understood from the documentation and the list that "cman_tool
> > status" should reflect the number of votes the quorum daemon holds.
> >
> > My setup is pretty straightforward: a 2-node cluster with shared
> > storage (AoE for testing).
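An aside on the three ping heuristics in the cluster.conf above: each carries
score="1", and qdiskd sums the scores of the heuristics that pass, comparing
the sum against a minimum. If I recall the qdisk(5) man page correctly, with
min_score unset it defaults to floor((total + 1) / 2). A sketch of that
arithmetic (variable names are mine, not qdiskd's):

```shell
# Three heuristics at score="1" each, min_score left unset in cluster.conf.
total=3
min_score=$(( (total + 1) / 2 ))   # floor((3+1)/2) = 2
echo "$min_score"                  # prints 2
```

This matches the "Score (current / min req. / max allowed): 3 / 2 / 3" line
in the /tmp/qdisk-status output earlier in the thread.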
> > qdiskd configuration:
> > <quorumd interval="1" tko="5" votes="3" log_level="7"
> >          device="/dev/etherd/e0.0" status_file="/tmp/qdisk-status">
> >   <heuristic program="ping 192.168.22.1 -c1 -t1" score="1" interval="2"/>
> >   <heuristic program="ping 192.168.22.60 -c1 -t1" score="1" interval="2"/>
> >   <heuristic program="ping 192.168.22.100 -c1 -t1" score="1" interval="2"/>
> > </quorumd>
> >
> > cman_tool status shows:
> > [root@n1 ~]# cman_tool status
> > Protocol version: 5.0.1
> > Config version: 8
> > Cluster name: alpha_cluster
> > Cluster ID: 50356
> > Cluster Member: Yes
> > Membership state: Cluster-Member
> > Nodes: 2
> > Expected_votes: 2
> > Total_votes: 2
> > Quorum: 2
> > Active subsystems: 4
> > Node name: n1
> > Node addresses: 192.168.22.201
> >
> > qdiskd is running, scoring a perfect 3 out of 3, but no votes...
> > When disconnecting one of the nodes, the other will lose quorum. Am I
> > missing something?
> >
> > Any insight appreciated.
>
> --
> Katriel Traum, PenguinIT
> RHCE, CLP
> Mobile: 054-6789953

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
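The vote arithmetic behind Katriel's complaint, for reference: once qdiskd
registers its votes with cman, Total_votes should be the sum of the per-node
votes and the quorumd votes. A sketch against the numbers in this thread
(variable names are mine):

```shell
# Votes as configured in the cluster.conf quoted above.
node_votes=2      # two <clusternode> entries with votes="1" each
qdisk_votes=3     # votes="3" on the <quorumd> element

# With qdiskd registered, cman should report this as Total_votes.
expected_total=$(( node_votes + qdisk_votes ))
echo "$expected_total"    # prints 5 -- the cman_tool status above shows 2
```

The gap between that 5 and the reported Total_votes of 2 is exactly the
"no votes" symptom being discussed.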