On Thu, Sep 25, 2014 at 02:12:24PM +0200, Jiri Denemark wrote:
> On Thu, Sep 25, 2014 at 12:00:41 +0200, Cristian KLEIN wrote:
> > On 2014-09-24 15:06, Jiri Denemark wrote:
> > > This mostly looks good in isolation but I think this is not going to
> > > work. When post-copy is started, QEMU on the destination host will be
> > > resumed (I'm not sure if that happens automatically or we have to do
> > > it), which basically means we need to jump out of the Perform state and
> > > call Finish and once it returns, we should keep waiting for the
> > > post-copy migration to finish in Confirm state and kill the domain at
> > > the end. It's certainly possible the steps we need to do are a bit
> > > different since I'm not familiar with all the details about post-copy
> > > migration, but I believe we need to do something. And just running a
> > > single QEMU command is not enough to start post-copy in libvirt.
> >
> > I'm not sure I follow. I tested the patch and it worked well: a VM that
> > was "unmigratable" with pre-copy was successfully migrated through
> > post-copy. Through the migration protocol, once we start post-copy on
> > the source qemu, the following will happen:
> >
> > - source qemu suspends the VM and transfers CPU state;
> > - destination qemu resumes the VM.
>
> Hmm, that's a bit unfortunate. I think we will need a way to tell QEMU
> not to resume the CPU automatically. The process should flow as follows:
>
> - libvirt sends migrate-start-postcopy command to QEMU
> - QEMU suspends the VM and transfers CPU state
> - QEMU tells us we can resume the destination
> - libvirt tells the destination QEMU to resume the VM
> - libvirt waits until migration is done
> - libvirt kills the source QEMU
>
> Perhaps, we could tell the destination QEMU to resume the VM while the
> source is transferring CPU state if that's allowed by QEMU to minimize
> downtime.
>
> > Could you tell me why you think it's necessary to jump out of Perform
> > state? What is libvirt doing when calling Finish that the destination VM
> > requires to function properly?
>
> The problem is Finish does more than just resuming the VM on the
> destination. Before resuming the VM, libvirt needs to transfer locks on
> resources from the source to the destination, it needs to enable
> networking for the destination QEMU, etc. Without all this, the VM won't
> be able to really work on the destination. Not to mention that if
> something fails while the VM is already resumed on the destination, the
> code in Perform phase would just abort the migration and resume the VM
> on the source, which is wrong. We need to kill both ends since none of
> them has the complete state to be able to continue running the VM.
>
> BTW, it's going to work in simple cases, when there's no lock daemon in
> use, only basic Linux bridge support is used, etc., which is why it
> works just fine for you. But we need to account for all the non-simple
> cases too.

Yes, having this work correctly with virtlockd and sanlock is really
mandatory for including the code.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
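
For reference, the ordering discussed in this thread can be sketched roughly as below. This is only an illustration in Python, not libvirt code: FakeQemu, postcopy_migrate and the fail_after_resume flag are hypothetical names made up for the sketch, and the lock and networking steps are reduced to log entries. The only name taken from the thread is the migrate-start-postcopy command. The point it tries to show is that the lock transfer and networking setup happen before the destination is resumed, and that a failure after that point kills both ends instead of resuming the source.

```python
class FakeQemu:
    """Hypothetical stand-in for a QEMU process; it only records state."""

    def __init__(self, name, running):
        self.name = name
        self.running = running   # are the guest CPUs running here?
        self.alive = True        # does the process still exist?
        self.commands = []       # monitor commands sent to this process

    def send(self, command):
        self.commands.append(command)

    def kill(self):
        self.alive = False
        self.running = False


def postcopy_migrate(src, dst, fail_after_resume=False):
    """Walk through the phases in the order proposed in the thread."""
    steps = []

    # Perform: ask the source to switch to post-copy; it suspends the
    # guest and transfers the CPU state.
    src.send("migrate-start-postcopy")
    src.running = False
    steps.append("perform: source suspended, CPU state transferred")

    # Finish: everything the destination needs before it may run.
    steps.append("finish: locks transferred to destination")
    steps.append("finish: destination networking enabled")
    dst.running = True
    steps.append("finish: destination resumed")

    # Confirm: wait for the remaining memory pages to arrive.
    if fail_after_resume:
        # Neither side holds the complete state any more, so falling
        # back to the source is not an option: kill both ends.
        src.kill()
        dst.kill()
        steps.append("confirm: failure, both ends killed")
        return steps

    steps.append("confirm: migration completed")
    src.kill()
    steps.append("confirm: source killed")
    return steps


if __name__ == "__main__":
    source = FakeQemu("source", running=True)
    destination = FakeQemu("destination", running=False)
    for step in postcopy_migrate(source, destination):
        print(step)
```

Calling postcopy_migrate with fail_after_resume=True follows the failure path, where both processes end up killed.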