All, I'm not at all sure what is going on here. I have a large number of KVM guests managed by a 5-node RHEL 5.6 cluster, and recently, whenever I modify the cluster config or reload/restart libvirtd (to add or remove guests), rgmanager goes berserk. When this happens, rgmanager lists the guests as "failed" services, and this is what shows up in the log:

Dec 29 10:44:17 plieadies1 clurgmgrd[6770]: <debug> 5 events processed
Dec 29 10:49:56 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:49:59 plieadies1 last message repeated 3 times
Dec 29 10:49:59 plieadies1 clurgmgrd[6770]: <notice> status on vm "Demeter" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> status on vm "IoA" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> status on vm "IoF" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> status on vm "Pluto" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> status on vm "Venus" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <debug> No other nodes have seen vm:Demeter
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Stopping service vm:Demeter
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <debug> No other nodes have seen vm:IoA
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Stopping service vm:IoA
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> stop on vm "Demeter" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <debug> No other nodes have seen vm:IoF
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> stop on vm "IoA" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <crit> #12: RG vm:Demeter failed to stop; intervention required
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Service vm:Demeter is failed
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Stopping service vm:IoF
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <debug> No other nodes have seen vm:Pluto
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <crit> #12: RG vm:IoA failed to stop; intervention required
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Service vm:IoA is failed
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <debug> No other nodes have seen vm:Venus
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Stopping service vm:Pluto
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> stop on vm "IoF" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Stopping service vm:Venus
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <crit> #12: RG vm:IoF failed to stop; intervention required
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Service vm:IoF is failed
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> stop on vm "Venus" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <crit> #12: RG vm:Venus failed to stop; intervention required
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Service vm:Venus is failed
Dec 29 10:50:00 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> stop on vm "Pluto" returned 2 (invalid argument(s))
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <crit> #12: RG vm:Pluto failed to stop; intervention required
Dec 29 10:50:00 plieadies1 clurgmgrd[6770]: <notice> Service vm:Pluto is failed
Dec 29 10:50:02 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:12 plieadies1 last message repeated 4 times
Dec 29 10:50:19 plieadies1 clurgmgrd[6770]: <debug> 13 events processed
Dec 29 10:50:20 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:20 plieadies1 clurgmgrd[6770]: <notice> status on vm "saturn" returned 2 (invalid argument(s))
Dec 29 10:50:20 plieadies1 clurgmgrd[6770]: <debug> No other nodes have seen vm:saturn
Dec 29 10:50:20 plieadies1 clurgmgrd[6770]: <notice> Stopping service vm:saturn
Dec 29 10:50:20 plieadies1 clurgmgrd: [6770]: <err> Could not determine Hypervisor
Dec 29 10:50:20 plieadies1 clurgmgrd[6770]: <notice> stop on vm "saturn" returned 2 (invalid argument(s))
Dec 29 10:50:20 plieadies1 clurgmgrd[6770]: <crit> #12: RG vm:saturn failed to stop; intervention required
Dec 29 10:50:20 plieadies1 clurgmgrd[6770]: <notice> Service vm:saturn is failed
Dec 29 10:50:31 plieadies1 clurgmgrd[6770]: <debug> 1 events processed
Dec 29 10:59:30 plieadies1 clurgmgrd[6770]: <debug> 1 events processed

The "Could not determine Hypervisor" message comes from the following block of code in vm.sh:

    #
    # If someone selects a hypervisor, honor it.
    # Otherwise, ask virsh what the hypervisor is.
    #
    if [ -z "$OCF_RESKEY_hypervisor" ] ||
       [ "$OCF_RESKEY_hypervisor" = "auto" ]; then
            export OCF_RESKEY_hypervisor="`virsh version | grep \"Running hypervisor:\" | awk '{print $3}' | tr A-Z a-z`"
            if [ -z "$OCF_RESKEY_hypervisor" ]; then
                    ocf_log err "Could not determine Hypervisor"
                    return $OCF_ERR_ARGS
            fi
            echo Hypervisor: $OCF_RESKEY_hypervisor
    fi

What's really twisting my shorts is that the command being run to determine the hypervisor works fine at the command prompt:

    [root@plieadies1 ~]# virsh version | grep "Running hypervisor:" | awk '{print $3}' | tr A-Z a-z
    qemu

As a workaround, I can migrate the still-running guest to another node, use clusvcadm to disable it in rgmanager, and then use a wrapper on virsh that returns 0 when attempting to start an already-running guest, which lets me return the still-running VM to cluster control.
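For reference, the wrapper is nothing fancy; it is roughly the sketch below. The install path and the "already active" error-string match are approximations from memory and may need adjusting for a given libvirt version:

    #!/bin/sh
    # Hypothetical virsh wrapper, installed ahead of the real binary in
    # PATH (e.g. /usr/local/sbin/virsh). It passes everything through to
    # the real virsh, but treats "start" of an already-running domain as
    # success, so vm.sh's start operation succeeds for a guest that never
    # actually stopped.
    out=`/usr/bin/virsh "$@" 2>&1`
    rc=$?
    if [ $rc -ne 0 ] && [ "$1" = "start" ]; then
            # exact wording varies by libvirt version; adjust as needed
            if echo "$out" | grep -qi "already active"; then
                    rc=0
            fi
    fi
    # note: this folds virsh's stderr into its stdout
    echo "$out"
    exit $rc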
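In the meantime, since the code above honors an explicitly selected hypervisor, I'm tempted to just pin it in cluster.conf so the virsh autodetection is never consulted. Something like this (untested; I'm assuming the vm resource exposes the attribute under the same name as the OCF_RESKEY_hypervisor variable, with the guest's other attributes elided here):

    <!-- untested: pin the hypervisor so vm.sh skips virsh autodetection -->
    <vm name="Demeter" hypervisor="qemu" ... />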
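One diagnostic I still plan to try: since the pipeline behaves at an interactive prompt but not under clurgmgrd, re-running it with a scrubbed environment might show whether the daemon's environment is the difference. A rough approximation:

    # run the detection pipeline with an empty environment for virsh,
    # loosely approximating what vm.sh sees under the clurgmgrd daemon
    env -i /usr/bin/virsh version | grep "Running hypervisor:" | awk '{print $3}' | tr A-Z a-z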
However, I'm hugely concerned that I'm going to end up with a host failure and a heap of trouble at some point. Has anyone seen something similar, or have thoughts on this? Any guesses as to why rgmanager / vm.sh are failing to detect the running hypervisor?

--AB

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster