Re: I/O errors after migration - why?

On Sat, 2009-03-28 at 11:21 +0100, Tomasz Chmielewski wrote:
> Nolan wrote:
> > Tomasz Chmielewski <mangoo <at> wpkg.org> writes:
> >> I'm trying to perform live migration by following the instructions on 
> >> http://www.linux-kvm.org/page/Migration.
> >> Unfortunately, it doesn't work very well - the guest is migrated, but loses 
> >> access to its disk.
> > 
> > The LSI Logic SCSI device model doesn't implement device state save/restore. 
> > Any suspend/resume, snapshot or migration will fail.
> 
> Oh, that sucks - not everything supports virtio (which doesn't work 
> for me either, for some reason). Windows doesn't (which should be 
> addressed soon with virtio block drivers), and neither do older 
> installations running older kernels.

It is indeed a shame.  I wish I had the time to investigate and resolve
the problems with my patch that I linked to previously.

LSI in particular is important for interoperability, as that is what
VMware uses.

> Does IDE support migration?

It appears to, but I am not 100% sure that it will always survive
migration under heavy IO load.  I've gotten mixed messages on whether
the qemu core waits for all in-flight IOs to complete or whether the
device models need to checkpoint pending IOs themselves.  Experimental
evidence suggests that the core does not wait.  Also, from ide.c's
checkpoint save code:
    /* XXX: if a transfer is pending, we do not save it yet */

I think the ideal here would be to stop the CPUs, but let the device
models continue to run.  Once all pending IOs have completed (and DMAed
data and/or descriptors into guest memory, or raised interrupts, or
whatever) then checkpoint all device state.  When the guest resumes, it
will see an unusual flurry of IO completions and/or interrupts, but it
should be able to handle that OK.  Shouldn't look much different from
SMM taking over for a while during high IO load.

This would save a lot of (unwritten, complex, hard to test)
checkpointing code in the device models.  Might cause a missed timer
interrupt or two if there is a lot of slow IO, but that can be
compensated for if needed.
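
To make the ordering concrete, here is a rough sketch of the
stop-CPUs, drain, then checkpoint sequence described above.  It is a
toy model only: struct toy_device, struct toy_vm, toy_device_poll()
and toy_checkpoint() are names made up for illustration and are not
qemu APIs, and a real device model would complete IO asynchronously
rather than one request per poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these types exist in qemu. */
struct toy_device {
    int pending_ios;   /* requests submitted but not yet completed */
    int completed_ios; /* completions already written to guest memory */
    int raised_irqs;   /* interrupts latched for the guest to see later */
};

struct toy_vm {
    bool cpus_running;
    struct toy_device *devs;
    size_t ndevs;
};

/* Only quiescent state is saved, so no "pending transfer" field is needed. */
struct toy_snapshot {
    int completed_ios;
    int raised_irqs;
};

/* Complete at most one in-flight request; return true if progress was made. */
static bool toy_device_poll(struct toy_device *d)
{
    if (d->pending_ios == 0)
        return false;
    d->pending_ios--;
    d->completed_ios++;
    d->raised_irqs++;
    return true;
}

/*
 * Stop the CPUs, let every device drain its in-flight IO, then save state.
 * Returns the number of devices saved, or -1 if 'out' is too small.
 */
static int toy_checkpoint(struct toy_vm *vm, struct toy_snapshot *out,
                          size_t nout)
{
    if (nout < vm->ndevs)
        return -1;

    /* Step 1: stop the CPUs so the guest cannot submit new IO. */
    vm->cpus_running = false;

    /* Step 2: keep running the device models until nothing is pending. */
    bool progress;
    do {
        progress = false;
        for (size_t i = 0; i < vm->ndevs; i++) {
            if (toy_device_poll(&vm->devs[i]))
                progress = true;
        }
    } while (progress);

    /* Step 3: every device is idle, so its state can be saved as-is. */
    for (size_t i = 0; i < vm->ndevs; i++) {
        assert(vm->devs[i].pending_ios == 0);
        out[i].completed_ios = vm->devs[i].completed_ios;
        out[i].raised_irqs = vm->devs[i].raised_irqs;
    }
    return (int)vm->ndevs;
}
```

The point of the ordering is that the save path never has to describe
a half-finished transfer.  The cost is that the drain loop can take
arbitrarily long on slow storage, which is where the missed timer
interrupts mentioned above would come from.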

> > I sent a patch that partially addresses this (but is buggy in the presence of
> > in-flight IO):
> > http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00744.html

