On Wed, Apr 20, 2011 at 11:55:03AM -0400, Digimer wrote:
> Hi all,
>
> I've got a RHCS2 cluster on el5.6 using rgmanager to manage Xen domUs.
> I've got the <vm ... /> service set to restart on failure with a
> maximum restart of 2 and a restart recovery time of 600 seconds.
>
> <vm name="vm0001_lz1" domain="cc48_primary"
> path="/xen_shared/definitions/" autostart="0" exclusive="0"
> recovery="restart" max_restarts="2" restart_expire_time="600"/>
>
> I test by killing the VM using 'echo c > /proc/sysrq-trigger' multiple
> times well within 10 minutes, and the cluster does recover the VM every
> time, but always on the node it was previously running on. The failover
> domain is:
>
> <failoverdomain name="cc48_primary" nofailback="0" ordered="1"
> restricted="1">
> <failoverdomainnode name="cc0048.iplink.net" priority="1"/>
> <failoverdomainnode name="cc0049.iplink.net" priority="2"/>
> </failoverdomain>
>
> Any idea what I am doing wrong?
>
> Thanks!
>
> --
> Digimer
> E-Mail: digimer@xxxxxxxxxxx
> AN!Whitepapers: http://alteeve.com
> Node Assassin: http://nodeassassin.org
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

Try changing from recovery="restart" to recovery="relocate". Here's how
we have it set up for one of our clusters:

<service autostart="1" domain="mysqld" name="MYSQLD" recovery="relocate">

I believe that, by design, restart brings the service back up on the node
it was running on before the failure.

Joe
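
Applied to the VM from the original question, the suggested change might look like the fragment below. This is only a sketch: it reuses the attribute values quoted above and changes nothing but the recovery policy. It assumes that max_restarts and restart_expire_time matter only under the restart policy, so they are left out here, and it has not been tested on that cluster.

```xml
<!-- Sketch only: same <vm> resource as in the question, with the
     recovery policy switched from "restart" to "relocate".
     Assumption: max_restarts and restart_expire_time are used only
     by the restart policy, so they are omitted. -->
<vm name="vm0001_lz1" domain="cc48_primary"
    path="/xen_shared/definitions/" autostart="0" exclusive="0"
    recovery="relocate"/>
```

With the ordered, restricted failover domain quoted above, a relocate would be expected to move the VM to the other member of cc48_primary. Because nofailback="0", it may move back once the higher-priority node returns.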