On Fri, 2009-04-24 at 13:41 -0400, Lon Hohberger wrote:
> On Wed, 2009-04-22 at 01:40 -0400, Maykel Moya wrote:
> > I still can't get my service automatically relocated after
> > _successfully_ fencing its owner node.
> >
> > I have a 4-node cluster n{1,2,3,4} and 4 services s{1,2,3,4}. My fence
> > device uses 'off' as its action, so a successful fence means the node
> > is off.
> >
> > Say s4 is running on n4 and I do an 'ip link set eth0 down' on n4. n4
> > gets successfully fenced, but s4 is never relocated to one of the
> > other available nodes, which means s4 is not available.
> >
> > Find attached the cluster.conf.
>
> Conf looks okay, what do the logs say? Any other errors? It looks like
> things should be working correctly.

The relevant part:

----
Apr 26 02:08:29 e1b01 kernel: [345031.041719] dlm: closing connection to node 4
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <debug> Membership Change Event
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <info> State change: e1b04 DOWN
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <debug> Membership Change Event
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <debug> Membership Change Event
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <debug> Membership Change Event
Apr 26 02:08:29 e1b01 fenced[3850]: e1b04 not a cluster member after 0 sec post_fail_delay
Apr 26 02:08:29 e1b01 fenced[3850]: fencing node "e1b04"
Apr 26 02:08:40 e1b01 fenced[3850]: can't get node number for node �ҋ#010Pҋ#010#020
Apr 26 02:08:40 e1b01 fenced[3850]: fence "e1b04" success
----

> 'cman_tool services' and 'cman_tool nodes' output would be helpful,
> too.

It's a bit odd: clustat reports node e1b04 as offline, yet it still shows
vmail4_svc as owned by e1b04 and started.

----
e1b01:/var/log# clustat
Cluster Status for cinfomed @ Sun Apr 26 02:09:07 2009
Member Status: Quorate

 Member Name              ID   Status
 ------ ----              ---- ------
 e1b01                       1 Online, Local, rgmanager
 e1b02                       2 Online, rgmanager
 e1b03                       3 Online, rgmanager
 e1b04                       4 Offline

 Service Name             Owner (Last)             State
 ------- ----             ----- ------             -----
 service:vmail1_svc       e1b01                    started
 service:vmail2_svc       e1b04                    started
 service:vmail3_svc       e1b03                    started
 service:vmail4_svc       e1b04                    started

e1b01:/var/log# cman_tool services
type             level name       id       state
fence            0     default    00010001 none
[1 2 3]
dlm              1     rgmanager  00010004 none
[1 2 3]

e1b01:/var/log# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M   1404   2009-04-22 02:17:09  e1b01
   2   M   1432   2009-04-22 02:51:31  e1b02
   3   M   1412   2009-04-22 02:17:11  e1b03
   4   X   1408                        e1b04
----

Forgot to mention:

----
e1b01:/var/log# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 5.0.1 (lenny)
Release:        5.0.1
Codename:       lenny

e1b01:/var/log# cman_tool -V
cman_tool 2.03.09 (built Nov 3 2008 18:22:25)
Copyright (C) Red Hat, Inc. 2004-2008 All rights reserved.

e1b01:/var/log# uname -r
2.6.26-2-686
----

This is the only thing keeping me from deploying. I have tried fencing
with 'reboot' and with 'off', and I have set the service recovery policy
to 'relocate', but nothing solves it: if a node goes down, its service is
never relocated after the node is fenced.
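In case it helps, the failover-relevant parts of cluster.conf look roughly
like this; the fence agent (fence_ilo), addresses and credentials below are
placeholders, not the real values from my attached config:

----
<?xml version="1.0"?>
<cluster name="cinfomed" config_version="1">
  <!-- matches the "0 sec post_fail_delay" seen in the fenced log above -->
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <!-- e1b02, e1b03 and e1b04 are declared the same way -->
    <clusternode name="e1b01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <!-- action="off": a successful fence leaves the node powered off -->
          <device name="fencedev1" action="off"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- agent, address and credentials are placeholders -->
    <fencedevice name="fencedev1" agent="fence_ilo" ipaddr="10.0.0.1"
                 login="admin" passwd="secret"/>
  </fencedevices>
  <rm>
    <!-- one service per node; vmail1_svc..vmail3_svc look the same -->
    <service name="vmail4_svc" autostart="1" recovery="relocate">
      <ip address="10.0.0.44" monitor_link="1"/>
    </service>
  </rm>
</cluster>
----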
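For anyone who wants to poke at the same state, this is roughly what I
would run by hand to do the relocation that I expect rgmanager to do on
its own (the target member e1b01 is an arbitrary choice):

----
# fence the dead node by hand through the configured agent
fence_node e1b04

# relocate the service manually to another member
clusvcadm -r vmail4_svc -m e1b01

# check that the service owner actually changed
clustat
----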
Regards,
maykel

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster