On Thu, 2008-02-28 at 12:32 +0100, Agnieszka Kukałowicz wrote:
> >
> > And I don't have situation that cman_tool nodes says:
> > " Last fenced 2008-02-27 15:24:16 by override"
> >
>
> I did more tests to find the cause of the problem.
> I found that clustat has problem with "restricted" failover domain.
> I tested 2 examples of my configuration:
>
> 1. failover domain is "restricted"
>
> <rm>
>   <failoverdomains>
>     <failoverdomain name="VM_w1_failover" ordered="0" restricted="1">
>       <failoverdomainnode name="w1.local" priority="1"/>
>     </failoverdomain>
>     <failoverdomain name="VM_w2_failover" ordered="0" restricted="1">
>       <failoverdomainnode name="w2.local" priority="1"/>
>     </failoverdomain>
>   </failoverdomains>
>   <resources/>
>   <vm autostart="1" domain="VM_w1_failover" exclusive="0"
>       name="VM_Work11_RHEL51" path="/virts/w11" recovery="restart"/>
>   <vm autostart="1" domain="VM_w1_failover" exclusive="0"
>       name="VM_Work12_RHEL51" path="/virts/w12" recovery="restart"/>
>   <vm autostart="0" domain="VM_w1_failover" exclusive="0"
>       name="VM_Work13_RHEL51" path="/virts/w13" recovery="disable"/>
>   <vm autostart="1" domain="VM_w2_failover" exclusive="0"
>       name="VM_Work21_RHEL51" path="/virts/w21" recovery="restart"/>
>   <vm autostart="0" domain="VM_w2_failover" exclusive="0"
>       name="VM_Work22_RHEL51" path="/virts/w22" recovery="disable"/>
>   <vm autostart="0" domain="VM_w2_failover" exclusive="0"
>       name="VM_Work23_RHEL51" path="/virts/w23" recovery="disable"/>
> </rm>
>
> Member Status: Quorate
>
>   Member Name            ID   Status
>   ------ ----            ---- ------
>   w2.local                  1 Online, Local, rgmanager
>   w1.local                  2 Online, rgmanager
>
>   Service Name           Owner (Last)       State
>   ------- ----           ----- ------       -----
>   vm:VM_Work11_RHEL51    w1.local           started
>   vm:VM_Work12_RHEL51    w1.local           started
>   vm:VM_Work13_RHEL51    (none)             disabled
>   vm:VM_Work21_RHEL51    w2.local           started
>   vm:VM_Work22_RHEL51    (none)             disabled
>   vm:VM_Work23_RHEL51    (none)             disabled
>
> After power off node w2.local and fencing "w2.local" by "w1.local"
> clustat still shows the service vm:VM_Work21_RHEL51 is started on
> w2.local
>

Oh, so you had a restricted failover domain, and no nodes were online.

Looking at the code, it appears to be "correct weirdness" or perhaps
"known dysfunction":

http://sources.redhat.com/git/?p=cluster.git;a=blob;f=rgmanager/src/daemons/groups.c;h=a8325eec3425bbe124696bfe3dcd6f7ea1eebfea;hb=8e504af1adbadd2cb8fe7cab191d79a8d835540c#l742

        /*
         * TODO
         * Mark a service as 'stopped' if no members in its
         * restricted fail-over domain are running.
         */

Why it occurs:

The service states are typically only altered by a node which is taking
action on a service.  In this case, no nodes are online which are
allowed to act on the service - therefore, they do nothing.

What needs to happen in this case is:

 * Check service failover domain config
 * Check nodes online
 * Mark service 'stopped' if no nodes are online capable of running
   the service

The reason this hasn't been fixed is because it doesn't actually cause
any service-availability problems - it just causes wrong reporting.

Could you file a bugzilla? :)

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
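
For illustration only, the three steps listed in the reply (check the
service's failover domain config, check which nodes are online, mark the
service 'stopped' if no eligible node is online) could be sketched
roughly as below. This is not rgmanager code: the class names, the
function mark_orphaned_services_stopped, and the in-memory data layout
are all hypothetical, and the sample data simply mirrors the
configuration and clustat output quoted in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class FailoverDomain:
    # Hypothetical stand-in for a <failoverdomain> entry in cluster.conf.
    name: str
    restricted: bool
    nodes: Set[str]


@dataclass
class Service:
    # Hypothetical stand-in for one row of clustat's service table.
    name: str
    domain: Optional[str]
    state: str
    owner: Optional[str] = None


def mark_orphaned_services_stopped(
    services: List[Service],
    domains: Dict[str, FailoverDomain],
    online_nodes: Set[str],
) -> List[str]:
    """Mark started services 'stopped' when their restricted failover
    domain has no member online. Returns the names that were changed."""
    changed = []
    for svc in services:
        if svc.state != "started":
            continue  # only services reported as running can be stale
        dom = domains.get(svc.domain) if svc.domain else None
        if dom is None or not dom.restricted:
            continue  # unrestricted services may run on any node
        if dom.nodes & online_nodes:
            continue  # at least one eligible node is still online
        svc.state = "stopped"  # owner is kept as the last known owner
        changed.append(svc.name)
    return changed


# Sample data taken from the configuration quoted in this thread.
domains = {
    "VM_w1_failover": FailoverDomain("VM_w1_failover", True, {"w1.local"}),
    "VM_w2_failover": FailoverDomain("VM_w2_failover", True, {"w2.local"}),
}
services = [
    Service("vm:VM_Work11_RHEL51", "VM_w1_failover", "started", "w1.local"),
    Service("vm:VM_Work12_RHEL51", "VM_w1_failover", "started", "w1.local"),
    Service("vm:VM_Work13_RHEL51", "VM_w1_failover", "disabled"),
    Service("vm:VM_Work21_RHEL51", "VM_w2_failover", "started", "w2.local"),
    Service("vm:VM_Work22_RHEL51", "VM_w2_failover", "disabled"),
    Service("vm:VM_Work23_RHEL51", "VM_w2_failover", "disabled"),
]

# w2.local has been powered off and fenced, so only w1.local is online.
changed = mark_orphaned_services_stopped(services, domains, {"w1.local"})
print(changed)
```

With only w1.local online, the sketch changes just vm:VM_Work21_RHEL51,
the one service that clustat kept reporting as started on the fenced
node. Disabled services and services in the w1 domain are left alone.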