David Teigland wrote:
> On Thu, Jun 16, 2005 at 03:57:14PM +0200, Ion Alberdi wrote:
>> I tried to update my cluster (kernel space and userspace) to the latest
>> cvs version with the linux 2.6.11.12 kernel
> Use this: http://people.redhat.com/teigland/cluster-2.6.11.tar.bz2
> The cvs head isn't ready for general use yet, and the RHEL4/FC4 cvs
> branches don't work with 2.6.11.
>> using the patch method and I experienced some problems.
> We don't keep the patches updated, so that method doesn't work any longer.
> Build everything (including kernel modules) within the cluster directory.
> Dave
OK, thank you!
I installed this version and the cluster can be launched now.
But I still have the same problem with rgmanager, which I have not managed
to debug:
I launch the cluster on the two nodes, buba and gump (ccsd, cman, fence).
I launch rgmanager on the two nodes.
I activate/deactivate the simplest possible service (a script containing
only "#!/bin/sh" and "exit 0") on gump, and it works,
whereas when I try the same on buba I get the following error:
Jun 17 17:00:18 buba clurgmgrd[24643]: <notice> Starting disabled service datamover
Jun 17 17:00:18 buba clurgmgrd[24643]: <warning> #68: Failed to start datamover; return value: 1
Jun 17 17:00:18 buba clurgmgrd[24643]: <notice> Stopping service datamover
Jun 17 17:00:18 buba clurgmgrd[24643]: <crit> #12: RG datamover failed to stop; intervention required
Jun 17 17:00:18 buba clurgmgrd[24643]: <notice> Service datamover is failed
When I look at rgmanager/errors.txt, I find:
#68: Failed to start <name>; return value: <integer>
The resource group <name> failed to start and returned the value <integer>.
This could indicate missing resources on the node or an improperly
configured resource group. Check your resource group's configuration
against your hardware and software configuration and ensure that it is
correct.
What I don't understand is that buba and gump have the same cluster
components and the same installation,
so it's weird that it works on gump and not on buba.
Another thing that is really weird: here is the script launched
by rgmanager:
#!/bin/sh
exit 0
So it can never return 1, which contradicts message #68:
Failed to start datamover; return value: 1.
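Since the script body itself cannot return 1, one thing worth ruling out is that the failure happens before the script body ever runs, for example because the file is missing or not executable on buba. The following is only a minimal sketch of such a check, to be run by hand on each node. The helper name check_script is hypothetical (it is not part of rgmanager), and the assumption that the script resource is invoked with an action argument such as start, status, or stop is mine, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of rgmanager: verify that a service script
# exists, is executable, and exits 0 for the actions it is assumed to receive.
check_script() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "$script: missing"
        return 2
    fi
    if [ ! -x "$script" ]; then
        echo "$script: not executable"
        return 3
    fi
    # Assumption: the script is called with one action argument.
    for action in start status stop; do
        "$script" "$action"
        rc=$?
        echo "$script $action -> $rc"
        [ "$rc" -eq 0 ] || return 1
    done
    return 0
}

# Run the check only when a path is given on the command line.
if [ "$#" -gt 0 ]; then
    check_script "$1"
fi
```

Running this with /etc/init.d/simple as the argument on both gump and buba and comparing the output would show whether the two nodes really see the same file with the same permissions.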
Here is my cluster.conf:
<?xml version="1.0"?>
<cluster name="cluster1" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="buba_cluster" votes="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="192.168.0.1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="gump_cluster" votes="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="192.168.0.2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="datamoverdomain">
        <failoverdomainnode name="gump_cluster" priority="1"/>
        <failoverdomainnode name="buba_cluster" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <script name="simple" file="/etc/init.d/simple"/>
    </resources>
    <resourcegroup name="datamover" domain="datamoverdomain">
      <script ref="simple"/>
    </resourcegroup>
  </rm>
</cluster>
If anyone has an idea, that would be great! I would also be very
thankful if anybody could suggest some debugging techniques
to see what happens there. I tried gdb on clurgmgrd with a breakpoint on
group_op, but I lost all hope when I saw that the process detaches
and launches threads (I don't know whether,
and how, one can debug multithreaded programs with gdb).
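On the gdb question: gdb can attach to a process that is already running and inspect each of its threads, which avoids having to follow the daemon while it detaches. The sketch below is one possible way to capture a backtrace of every thread. The helper names gdb_thread_commands and dump_threads are hypothetical; the gdb commands (info threads, thread apply all bt, detach) and options (-batch, -p, -x) are standard, but whether they give useful output here depends on clurgmgrd having been built with debugging symbols.

```shell
#!/bin/sh
# Sketch only: gdb_thread_commands and dump_threads are hypothetical helper
# names; the gdb commands themselves are standard.

# Print the gdb commands: list threads, backtrace each one, then detach.
gdb_thread_commands() {
    cat <<'EOF'
info threads
thread apply all bt
detach
EOF
}

# Attach gdb to the given PID in batch mode and run the commands above.
dump_threads() {
    pid=$1
    cmds=$(mktemp) || return 1
    gdb_thread_commands > "$cmds"
    gdb -batch -p "$pid" -x "$cmds"
    rc=$?
    rm -f "$cmds"
    return "$rc"
}

# Example (as root, on the failing node): dump_threads <pid of clurgmgrd>
```

For interactive work, attaching with gdb -p to the running daemon and then issuing "break group_op" followed by "continue" should stop whichever thread reaches that function. If more than one clurgmgrd process is running, it is a guess which one does the work, so each PID may need to be tried.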
--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster