Re: qcow2 corruption observed, fixed by reverting old change

Jamie Lokier <jamie <at> shareable.org> writes:
> 
> As you see from the subject, I'm getting qcow2 corruption.
> 
> I have a Windows 2000 guest which boots and runs fine in kvm-72, fails
> with a blue-screen indicating file corruption errors in kvm-73 through
> to kvm-83 (the latest), and succeeds if I replace block-qcow2.c with
> the version from kvm-72.
> 
> The blue screen appears towards the end of the boot sequence, and
> shows only briefly before rebooting.  It says:
> 
>     STOP: c0000218 (Registry File Failure)
>     The registry cannot load the hive (file):
>     \SystemRoot\System32\Config\SOFTWARE
>     or its log or alternate.
>     It is corrupt, absent, or not writable.
> 
>     Beginning dump of physical memory
>     Physical memory dump complete. Contact your system administrator or
>     technical support [...?]

I run a massive KVM installation with hundreds of guests running dozens of
different OSes, and I have also noticed multiple qcow2 corruption bugs. All my
guests use the qcow2 format, and my hosts run vanilla Linux 2.6.28
x86_64 kernels and use NPT (Opteron 'Barcelona' 23xx processors).

My Windows 2000 guests BSOD just like yours with kvm-73 or newer. I have to run
kvm-75 (I need the NPT fixes it contains) with block-qcow2.c reverted to the
version from kvm-72 to fix the BSOD.

kvm-73+ also causes some of my Windows 2003 guests to exhibit this exact
registry corruption error:
http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
This bug is also fixed by reverting block-qcow2.c to the version from kvm-72.

I tested kvm-81 and kvm-83 as well (I can't test kvm-80 or older because of the
qcow2 performance regression caused by the default writethrough caching policy),
but they randomly trigger an even worse bug: the moment I shut down a guest by
typing "quit" in the monitor, kvm sometimes overwrites the first 4kB of the disk
image with mostly NUL bytes (!), which completely destroys it. I am familiar with
the qcow2 format, and this 4kB block looks like an L2 table with most
entries set to zero. I have had to restore at least 6 or 7 disk images from
backup after occurrences of that bug. My intuition tells me this may be the qcow2
code trying to allocate a cluster to write a new L2 table, not noticing that the
allocation failed (represented by a 0 offset), and writing the L2 table at that
0 offset, overwriting the qcow2 header.
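
To make that guess concrete, here is a toy model of the failure mode in
Python. It is not the block-qcow2.c code: alloc_cluster(), write_l2_table(),
the "limit" parameter and the "check" flag are all hypothetical names invented
for this sketch. The only details taken from the real format are the magic
bytes "QFI\xfb" at the start of a qcow2 image and the 8-byte big-endian L2
entries. The point is only that an allocator which reports failure as offset 0
turns a missed check into a write over the header.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"  # first four bytes of a qcow2 image
CLUSTER_SIZE = 4096       # 4kB, the size of the block that gets clobbered


def has_qcow2_magic(image: bytes) -> bool:
    """Return True if the image still starts with the qcow2 magic."""
    return bytes(image[:4]) == QCOW2_MAGIC


def alloc_cluster(image: bytearray, limit: int) -> int:
    """Hypothetical allocator: new cluster offset, or 0 on failure."""
    if len(image) + CLUSTER_SIZE > limit:
        return 0  # failure is signalled in-band as offset 0
    offset = len(image)
    image.extend(bytes(CLUSTER_SIZE))
    return offset


def write_l2_table(image: bytearray, entries, limit: int, check: bool = True) -> int:
    """Allocate a cluster and write an L2 table (8-byte big-endian entries)."""
    offset = alloc_cluster(image, limit)
    if check and offset == 0:
        raise OSError("cluster allocation failed; refusing to write at offset 0")
    table = struct.pack(">%dQ" % len(entries), *entries)
    table = table.ljust(CLUSTER_SIZE, b"\0")
    # With check=False and a failed allocation, this lands on the header.
    image[offset:offset + CLUSTER_SIZE] = table
    return offset
```

With an image that is exactly one cluster long and limit=CLUSTER_SIZE, calling
write_l2_table(image, [0] * 512, limit=CLUSTER_SIZE, check=False) replaces the
first 4kB with zeros and has_qcow2_magic() turns False, which matches what I
see on the destroyed images. With check=True the same call raises OSError and
the header survives.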

Fortunately this bug is also fixed by running kvm-75 with block-qcow2.c reverted
to its kvm-72 version.

Basically qcow2 in kvm-73 or newer is completely unreliable.

-marc
