Hello,

I have two KVM virtual machine nodes in a high-availability cluster using Pacemaker + Heartbeat on Ubuntu 10.04 Server amd64. This cluster hosts a single Ubuntu 10.04 VM, which uses a qcow2 image file, myvm.qcow2, with a backing file, backingfile.qcow2. This morning, the VM suddenly powered off. I attempted to start it again with virsh start domain, but it would only start briefly and then power off again. I checked the qcow2 disk image and found countless corruption errors:

root@vmhost:/mnt/storage/vmstore/disks# qemu-img info myvm.qcow2
image: myvm.qcow2
file format: qcow2
virtual size: 9.8G (10485760000 bytes)
disk size: 13G
cluster_size: 65536
backing file: backingfile.qcow2 (actual path: backingfile.qcow2)
Snapshot list:
ID TAG VM SIZE DATE VM CLOCK
1.5G 2056-05-05 21:01:212795663:45:42.642
/archive/1006/20100627000/2il_root/save/archive/1002/20100204005/1 743M 1995-08-16 12:47:352289751:06:20.183

root@vmhost:/mnt/storage/vmstore/disks# qemu-img check myvm.qcow2 2>&1 | head
ERROR OFLAG_COPIED: offset=80000002047d0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000212e50000 refcount=0
ERROR OFLAG_COPIED: offset=80000001ffde0000 refcount=0
ERROR OFLAG_COPIED: offset=80000001ff710000 refcount=0
ERROR OFLAG_COPIED: offset=8000000216ec0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000206db0000 refcount=0
ERROR OFLAG_COPIED: offset=80000001ff720000 refcount=0
ERROR OFLAG_COPIED: offset=80000001ffdf0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000212e60000 refcount=0
ERROR OFLAG_COPIED: offset=8000000212e70000 refcount=0

root@vmhost:/mnt/storage/vmstore/disks# qemu-img info backingfile.qcow2
image: backingfile.qcow2
file format: qcow2
virtual size: 9.8G (10485760000 bytes)
disk size: 4.8G
cluster_size: 65536

root@vmhost:/mnt/storage/vmstore/disks# qemu-img check backingfile.qcow2
No errors were found on the image.

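To make the OFLAG_COPIED lines from the check output easier to compare, here is a minimal Python sketch that parses them and splits each offset into a flag bit and a cluster position. It assumes (this is not confirmed anywhere in this message) that the leading 8 in each printed offset is bit 63 of the table entry, which the qcow2 format uses as the "copied" flag, and that the remaining bits are the host offset of the cluster. The helper name summarize_check_output is hypothetical, and the default cluster size is taken from the cluster_size value reported by qemu-img info.

```python
import re
from collections import Counter

# Assumption: bit 63 of a qcow2 L1/L2 table entry is the COPIED flag.
QCOW_OFLAG_COPIED = 1 << 63

# Matches lines such as: ERROR OFLAG_COPIED: offset=80000002047d0000 refcount=0
ERROR_RE = re.compile(r"^ERROR OFLAG_COPIED: offset=([0-9a-fA-F]+) refcount=(\d+)")


def summarize_check_output(text, cluster_size=65536):
    """Hypothetical helper: parse OFLAG_COPIED errors from `qemu-img check` output.

    Returns (entries, refcounts): a list of dicts, one per matching line,
    and a Counter of how often each reported refcount value occurs.
    """
    entries = []
    for line in text.splitlines():
        match = ERROR_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not an OFLAG_COPIED error line
        raw = int(match.group(1), 16)
        host_offset = raw & ~QCOW_OFLAG_COPIED  # clear only the flag bit
        entries.append({
            "raw": raw,
            "copied_flag": bool(raw & QCOW_OFLAG_COPIED),
            "host_offset": host_offset,
            "cluster_index": host_offset // cluster_size,
            "refcount": int(match.group(2)),
        })
    refcounts = Counter(entry["refcount"] for entry in entries)
    return entries, refcounts


# Two lines copied from the check output in this message.
SAMPLE = """\
ERROR OFLAG_COPIED: offset=80000002047d0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000212e50000 refcount=0
"""

if __name__ == "__main__":
    parsed, counts = summarize_check_output(SAMPLE)
    for entry in parsed:
        print(hex(entry["host_offset"]), entry["cluster_index"], entry["refcount"])
    print(dict(counts))
```

Sorting the parsed entries by cluster_index should show whether the affected clusters are contiguous or scattered across the image, which may be useful to include in a bug report.
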
If I use qemu-img to convert the image, the resulting image is "clean":

# qemu-img convert myvm.qcow2 -O qcow2 /tmp/test.qcow2
# qemu-img check /tmp/test.qcow2
No errors were found on the image.

I had this corruption happen a month ago to a different VM on the same machine but on a different physical drive, so I do not believe it to be a physical disk failure. I can find nothing in /var/log that gives any more information related to this corruption. What other debug information can I provide to diagnose why these images are getting corrupted and taking these running VMs offline?

Thanks,

Andrew Martin