Hi Jakov,

I managed to get fencing working - at least enough for my experiments.

Sure enough, I hit the same problem:

clusternode30 is running service "SENTINEL" - and then is powered down at ~ 18:19

Oct 27 18:19:55 clusternode27 clurgmgrd[2785]: <debug> Membership Change Event
Oct 27 18:19:55 clusternode27 clurgmgrd[2785]: <info> State change: clusternode30 DOWN
Oct 27 18:19:55 clusternode27 clurgmgrd[2785]: <debug> Membership Change Event
Oct 27 18:19:55 clusternode27 clurgmgrd[2785]: <debug> Membership Change Event
Oct 27 18:19:55 clusternode27 fenced[2760]: clusternode30 not a cluster member after 0 sec post_fail_delay
Oct 27 18:19:55 clusternode27 fenced[2760]: fencing node "clusternode30"
Oct 27 18:19:55 clusternode27 fenced[2760]: can't get node number for node p<CA>@#001
Oct 27 18:19:55 clusternode27 fenced[2760]: fence "clusternode30" success
Oct 27 18:19:55 clusternode27 clurgmgrd[2785]: <debug> 22 rules loaded

(The "can't get node number" looks suspicious, but fenced claims to succeed).

Next morning - it still hasn't relocated the service:

Cluster Status for testcluster @ Wed Oct 28 11:13:55 2009
Member Status: Quorate

 Member Name                    ID   Status
 ------ ----                    ---- ------
 clusternode27                    27 Online, Local, rgmanager
 clusternode28                    28 Online, rgmanager
 clusternode30                    30 Offline

 Service Name                   Owner (Last)       State
 ------- ----                   ----- ------       -----
 service:SCRIPT                 clusternode28      started
 service:SENTINEL               clusternode30      started
 service:VIP                    clusternode27      started
 service:mysql_authdb_service   clusternode27      started

I am going to strip my config down later on so that SENTINEL is the only running service.

My fencing mechanism is pretty pathetic - I have added a new fence agent that does nothing but always succeeds (which I hope is enough for this stage in my education) - but my understanding is that the sequence of events should be something like this:

1. <node fails>
2. cman notices (and groupd)
3. fencing is applied to the node
4. the service is relocated - or marked as failed

regards,
Martin

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Jakov Sosic
Sent: 28 October 2009 10:04
To: linux-cluster@xxxxxxxxxx
Subject: Re: service state unchanged when host crashes

On Tue, 27 Oct 2009 09:57:50 -0000
"Martin Waite" <Martin.Waite@xxxxxxxxxxxx> wrote:

> I am running Debian Lenny 64-bit. Is that going to be a problem for
> me ?

Well maybe. Last time I tried RedHat Cluster Suite on Debian Lenny was two months ago, and then I had stumbled upon the following bug:

http://www.mail-archive.com/linux-cluster@xxxxxxxxxx/msg06018.html

I don't know if they have fixed that bug... but it resembles totally to your problem... Node goes down, node gets fenced, service is seen as down by rgmanager, but there is no action to relocate it to a live cluster member.

That was a start of a project for me, so after that I migrated to CentOS 5 (which is a free RHEL fork).

> I think you have given me enough of a pointer - ie. I haven't
> configured fencing properly - to get me going again. Thanks.

I can see that from the logs now :) If you get to the point where bug that I explained earlier pops up, please share that information here so that we know the state of RHCS on Debian.

--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |
=================================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/ |

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster