On Wed, Jul 23, 2008 at 06:56:40PM -0300, Tiago Cruz wrote:
> Hello,
>
> I have one machine (hotsite-bsb-la-1) exporting GNBD to two machines (hotsite-bsb-la-2 and "-3").
>
> The cluster with RHEL 5.2 x86_64 and GFS was working very well, until I rebooted hotsite-bsb-la-2:
>
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] New Configuration:
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.30)
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.33)
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Left:
> Jul 23 18:56:38 hotsite-bsb-la-1 kernel: dlm: closing connection to node 2
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.31)
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Joined:
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] New Configuration:
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: hotsite-bsb-la-2.com not a cluster member after 0 sec post_fail_delay
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.30)
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.33)
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com"
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Left:
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Joined:
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [SYNC ] This node is within the primary component and will provide service.
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [TOTEM] entering OPERATIONAL state.
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] got nodejoin message 10.65.13.30
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com" failed
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] got nodejoin message 10.65.13.33
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG ] got joinlist message from node 1
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG ] got joinlist message from node 3
> Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com.br"
> Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com.br" failed
> Jul 23 19:00:57 hotsite-bsb-la-1 last message repeated 50 times
>
> Why was fencing failing? Here is the cluster.conf:
>
> <?xml version="1.0"?>
> <cluster alias="hotsites" config_version="18" name="hotsites">
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
>       <fence/>
>     </clusternode>
>     <clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
>       <fence>
>         <method name="single">
>           <device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
>       <fence>
>         <method name="single">
>           <device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman/>
>   <fencedevices>
>     <fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains/>
>     <resources>
>       <clusterfs device="/dev/gnbd/hotsite" force_unmount="1" fsid="5666" fstype="gfs" mountpoint="/data" name="data" self_fence="1"/>
>     </resources>
>   </rm>
>   <totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
> </cluster>
>

There are two problems with your cluster.conf file that may be causing this.

1. In the clusternode <device> line for fencing devices, "name" must be the same as "name" in the corresponding <fencedevice> line. Your <device> lines use name="gnbd", but your <fencedevice> is named "hotsite".

2. In the <fencedevice> line, "servers" must list the GNBD servers using the "name" from their <clusternode> lines. Your servers="hotsite-1.com" does not match any clusternode name; the machine exporting GNBD is hotsite-bsb-la-1.com.

So, for your configuration, the <fencedevice> line should be

<fencedevice agent="fence_gnbd" name="gnbd" servers="hotsite-bsb-la-1.com"/>

See if this helps.

-Ben

>
>
> # cman_tool status
> Version: 6.1.0
> Config Version: 18
> Cluster Name: hotsites
> Cluster Id: 27589
> Cluster Member: Yes
> Cluster Generation: 184
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 3
> Total votes: 2
> Quorum: 2
> Active subsystems: 8
> Flags: Dirty
> Ports Bound: 0 177
> Node name: hotsite-bsb-la-1.com
> Node ID: 1
> Multicast addresses: 239.192.107.49
> Node addresses: 10.65.13.30
>
>
> Thanks
>
> --
> Tiago Cruz
> http://everlinux.com
> Linux User #282636
>
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
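The two naming rules for cluster.conf fencing (a clusternode's <device name> must match a <fencedevice name>, and a fence_gnbd "servers" entry must match a <clusternode name>) are easy to check mechanically before restarting cman. Below is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. The function name check_fencing and the trimmed sample configs are hypothetical illustrations, not part of the Red Hat cluster tools, and the sketch assumes the "servers" attribute is a whitespace-separated list of names.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_fencing(conf_xml: str) -> List[str]:
    """Return fencing-name problems found in a cluster.conf string (hypothetical helper)."""
    root = ET.fromstring(conf_xml)
    node_names = {node.get("name") for node in root.iter("clusternode")}
    fence_devices = {fd.get("name"): fd for fd in root.iter("fencedevice")}
    problems = []

    # Rule 1: every <device name="..."> under a <clusternode> must name a <fencedevice>.
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in fence_devices:
                problems.append(
                    f'{node.get("name")}: device "{dev.get("name")}" '
                    'has no matching <fencedevice name="...">'
                )

    # Rule 2: every entry in a fencedevice's "servers" attribute must be a
    # <clusternode> name. ASSUMPTION: entries are whitespace-separated.
    for fd_name, fd in fence_devices.items():
        for server in (fd.get("servers") or "").split():
            if server not in node_names:
                problems.append(
                    f'fencedevice "{fd_name}": server "{server}" '
                    'is not a <clusternode> name'
                )
    return problems


# Trimmed version of the posted cluster.conf (only the parts relevant to fencing).
ORIGINAL = """<cluster name="hotsites">
  <clusternodes>
    <clusternode name="hotsite-bsb-la-1.com" nodeid="1"><fence/></clusternode>
    <clusternode name="hotsite-bsb-la-2.com" nodeid="2">
      <fence><method name="single">
        <device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
  </fencedevices>
</cluster>"""

# Same config with the suggested <fencedevice> line applied.
FIXED = ORIGINAL.replace(
    'name="hotsite" servers="hotsite-1.com"',
    'name="gnbd" servers="hotsite-bsb-la-1.com"',
)

if __name__ == "__main__":
    for label, conf in (("original", ORIGINAL), ("fixed", FIXED)):
        print(label, check_fencing(conf) or "OK")
```

Run against the trimmed original config, this reports both the unmatched "gnbd" device and the "hotsite-1.com" server; with the corrected <fencedevice> line it reports nothing. It only checks names, so a clean result does not prove that fence_gnbd itself will succeed.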