Another problem: the clurgmgrd process won't die:
[root@yoda2 ~]# /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:
but it hangs at this point and nothing more happens...
[root@yoda2 ~]# ps -ef | grep clurgmgrd
root 6620 1 55 Jun03 ? 12-02:06:46 clurgmgrd
[root@yoda2 ~]# kill -9 6620
[root@yoda2 ~]# ps -ef | grep clurgmgrd
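(For the record, the gentler sequence before resorting to kill -9 would be to disable the services rgmanager is running first; a sketch only, the service name below is hypothetical since the real ones are not shown in this thread:)

[root@yoda2 ~]# clustat                          # list services rgmanager knows about
[root@yoda2 ~]# clusvcadm -d vm:some_domu        # "vm:some_domu" is a made-up name
[root@yoda2 ~]# /etc/init.d/rgmanager stop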
And the same with the clvmd process:
[root@yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2
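("dead but subsys locked" usually just means the daemon exited without its init script removing its lock file under /var/lock/subsys; a quick check, assuming the standard RHEL lock-file convention -- note this only clears the status message, it does not deactivate the listed volumes:)

[root@yoda2 ~]# ls -l /var/lock/subsys/clvmd     # stale lock file left behind
[root@yoda2 ~]# rm -f /var/lock/subsys/clvmd     # clears "subsys locked" status only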
Please help... I don't want to reboot yoda2...
bye
On Wed, Jun 25, 2008 at 10:55 AM, Gian Paolo Buono <gpbuono@xxxxxxxxx> wrote:
Hi,
if I try to restart cman on yoda2:
[root@yoda2 ~]# /etc/init.d/cman restart
Stopping cluster:
Stopping fencing... done
Stopping cman... done
Stopping ccsd... done
Unmounting configfs... done
[ OK ]
Starting cluster:
Enabling workaround for Xend bridged networking... done
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... failed
[FAILED]
[root@yoda2 ~]# tail -f /var/log/messages
Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] Members Joined:
Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] r(0) ip(172.20.0.174)
Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the primary component and will provide service.
Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] got nodejoin message 172.20.0.174
Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] got nodejoin message 172.20.0.175
Jun 25 10:50:42 yoda2 openais[18429]: [CPG ] got joinlist message from node 2
Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster infrastructure after 30 seconds.
Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32]:55090
On this server there are 3 Xen domUs and I can't reboot yoda2 :( ..
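(Given the "cman killed by node 1" and "cman_init error 111" messages above, it may help to capture the membership and group state on both nodes before retrying; a sketch using the standard RHEL5 cluster tools:)

[root@yoda2 ~]# cman_tool status     # quorum / membership as this node sees it
[root@yoda2 ~]# cman_tool nodes      # per-node state
[root@yoda2 ~]# group_tool ls        # fence/dlm/gfs group state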
best regards... and sorry for my English :)

2008/6/25 GS R <gsrlinux@xxxxxxxxx>:
On 6/24/08, Gian Paolo Buono <gpbuono@xxxxxxxxx> wrote:
Hi,
We have two RHEL5.1 boxes sharing a single iSCSI EMC2 SAN, without
fence devices. The system is configured as a high-availability setup
for Xen guests.
One of the most frequently recurring problems is fence_tool related.
# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... fence_tool: can't communicate with fenced -1
# fenced -D
1204556546 cman_init error 0 111
# clustat
CMAN is not running.
# cman_tool join
# clustat
msg_open: Connection refused
Member Status: Quorate
Member Name                  ID   Status
------ ----                  ---- ------
yoda1                        1    Online, Local
yoda2                        2    Offline
Sometimes this problem gets solved if the two machines are rebooted at
the same time. But in the current HA configuration, I cannot guarantee
that both systems can be rebooted at the same time for every problem we
face. This is my config file:
###################################cluster.conf####################################
<?xml version="1.0"?>
<cluster alias="yoda-cl" config_version="2" name="yoda-cl">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="yoda2" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="yoda1" nodeid="2" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <rm>
        <failoverdomains/>
        <resources/>
    </rm>
    <fencedevices/>
</cluster>
###################################cluster.conf####################################
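(Note that both <fence/> sections and <fencedevices/> are empty, so fenced has nothing to act with, which matches the fence_tool errors above. As a minimal sketch only: a manual-fencing stanza could look like the following; fence_manual is meant for testing, not production, and the method/device names "single"/"human" are made up for illustration:)

    <clusternode name="yoda2" nodeid="1" votes="1">
        <fence>
            <method name="single">
                <device name="human" nodename="yoda2"/>
            </method>
        </fence>
    </clusternode>
    ...
    <fencedevices>
        <fencedevice agent="fence_manual" name="human"/>
    </fencedevices>

(If you change cluster.conf, bump config_version and push it out with "ccs_tool update /etc/cluster/cluster.conf".)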
Regards.

Hi,
I configured a two node cluster with no fence device on RHEL5.1. The cluster started and stopped with no issues. The only difference that I see is that I have used the FQDN in my cluster.conf, i.e.:
<clusternode name="yoda2.gsr.com" nodeid="1" votes="1">
Check whether your /etc/hosts has the FQDN in it.
Thanks
Gowrishankar Rajaiyan
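(A quick way to cross-check the name resolution; the IP-to-host mapping below is an assumption based on the addresses in the openais log earlier in this thread:)

[root@yoda2 ~]# grep yoda /etc/hosts
172.20.0.175    yoda1.gsr.com   yoda1    # assumed mapping
172.20.0.174    yoda2.gsr.com   yoda2    # 172.20.0.174 is yoda2 per the log
[root@yoda2 ~]# cman_tool status | grep "Node name"   # should match cluster.conf exactly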
On 6/25/08, Gian Paolo Buono <gpbuono@xxxxxxxxx> wrote:
Hi,
the problem with my cluster is that it starts up well, but after two days the problem I have described appears, and it only gets solved if the two machines are rebooted at the same time.
Thanks
Gian Paolo

Hi Gian,
Could you please attach the logs?
Thanks
Gowrishankar Rajaiyan
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster