Hi!

I have been trying to get a KVM guest running as a clustered service (a two-node cluster with GFS2 shared images), so that the guest is restarted on the surviving cluster node if the other node crashes. The problem is that I can't get the VM service managed by the cluster daemons (manually starting, stopping, and live-migrating my VM guest works fine).

This is what my "cluster.conf" file looks like:

<?xml version="1.0"?>
<cluster config_version="16" name="KVMCluster">
  <fence_daemon post_fail_delay="0" post_join_delay="10"/>
  <clusternodes>
    <clusternode name="nodeAint" nodeid="1" votes="1">
      <multicast addr="239.0.01" interface="eth2"/>
      <fence>
        <method name="single">
          <device name="nodeA_ilo"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nodeBint" nodeid="2" votes="1">
      <multicast addr="239.0.0.1" interface="eth2"/>
      <fence>
        <method name="single">
          <device name="nodeB_ilo"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <quorumd interval="1" label="QuoDisk" tko="10" votes="1"/>
  <cman expected_votes="3" two_node="0">
    <multicast addr="239.0.0.1"/>
  </cman>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="nodeAcn" login="hp" name="nodeA_ilo" passwd="hpinvent"/>
    <fencedevice agent="fence_ilo" hostname="nodeBcn" login="hp" name="nodeB_ilo" passwd="hpinvent"/>
  </fencedevices>
  <rm log_level="7">
    <failoverdomains>
      <failoverdomain name="FD1" ordered="0" restricted="0">
        <failoverdomainnode name="nodeAint" priority="1"/>
        <failoverdomainnode name="nodeBint" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <service autostart="1" exclusive="0" domain="FD1" name="guest00_service" recovery="relocate">
      <vm domain="FD1" autostart="1" migrate="live" use_virsh="1" hypervisor="qemu" name="guest00" hypervisor_uri="qemu+ssh:///system" path="/etc/libvirt/qemu/guest00.xml"/>
    </service>
    <resources/>
  </rm>
  <dlm plock_ownership="1" plock_rate_limit="0"/>
  <gfs_controld plock_rate_limit="0"/>
</cluster>

And these are the errors I'm getting in syslog:

Apr 23 11:28:44 nodeB clurgmgrd[5490]: <notice> Resource Group Manager Starting
Apr 23 11:28:44 nodeB clurgmgrd[5490]: <info> Loading Service Data
Apr 23 11:28:45 nodeB clurgmgrd[5490]: <info> Initializing Services
Apr 23 11:28:45 nodeB clurgmgrd: [5490]: <crit> xend/libvirtd is dead; cannot stop guest00
Apr 23 11:28:45 nodeB clurgmgrd[5490]: <notice> stop on vm "guest00" returned 1 (generic error)
Apr 23 11:28:45 nodeB clurgmgrd[5490]: <info> Services Initialized
Apr 23 11:28:45 nodeB clurgmgrd[5490]: <info> State change: Local UP
Apr 23 11:28:51 nodeB clurgmgrd[5490]: <notice> Starting stopped service service:guest00_service
Apr 23 11:28:51 nodeB clurgmgrd[5490]: <notice> start on vm "guest00" returned 127 (unspecified)
Apr 23 11:28:51 nodeB clurgmgrd[5490]: <warning> #68: Failed to start service:guest00_service; return value: 1
Apr 23 11:28:51 nodeB clurgmgrd[5490]: <notice> Stopping service service:guest00_service
Apr 23 11:28:51 nodeB clurgmgrd: [5490]: <crit> xend/libvirtd is dead; cannot stop guest00
Apr 23 11:28:51 nodeB clurgmgrd[5490]: <notice> stop on vm "guest00" returned 1 (generic error)
Apr 23 11:28:51 nodeB clurgmgrd[5490]: <crit> #12: RG service:guest00_service failed to stop; intervention required
Apr 23 11:28:51 nodeB clurgmgrd[5490]: <notice> Service service:guest00_service is failed
Apr 23 11:28:51 nodeB clurgmgrd[5490]: <crit> #13: Service service:guest00_service failed to stop cleanly

I have checked the status of the libvirtd daemon, and it is running fine:

[root@nodeB ~]# service libvirtd status
libvirtd (pid 5352) is running...

All VM guest management via "virsh" also works fine.

I'm using: cman-2.0.115-1.el5_4.9, rgmanager-2.0.52-1.el5.centos.2, libvirt-0.6.3-20.1.el5_4.

Am I missing something in "cluster.conf"? Or in the libvirtd daemon?

Thanks for your help!
Alex.
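One thing I'm not sure about: most rgmanager examples I've come across declare the <vm> resource directly under <rm>, rather than nested inside a <service> element. Below is a sketch of that variant for my <rm> section (same names and attribute values as above; I haven't verified this is correct), in case the service wrapping is what trips up the vm.sh agent:

```xml
<rm log_level="7">
  <failoverdomains>
    <failoverdomain name="FD1" ordered="0" restricted="0">
      <failoverdomainnode name="nodeAint" priority="1"/>
      <failoverdomainnode name="nodeBint" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <!-- vm declared as a top-level resource instead of wrapped in a <service> -->
  <vm name="guest00" domain="FD1" autostart="1" migrate="live"
      use_virsh="1" hypervisor="qemu"
      path="/etc/libvirt/qemu/guest00.xml"
      recovery="relocate"/>
  <resources/>
</rm>
```

Would that make any difference here, or is the <service>-wrapped form equivalent?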
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster