Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal

"Daniel P. Berrange" <berrange@xxxxxxxxxx> · Tue, 19 May 2015 17:13:26 +0100

On Tue, May 19, 2015 at 06:08:10PM +0200, Michael S. Tsirkin wrote:
> On Tue, May 19, 2015 at 04:45:03PM +0100, Daniel P. Berrange wrote:
> > On Tue, May 19, 2015 at 05:39:05PM +0200, Michael S. Tsirkin wrote:
> > > On Tue, May 19, 2015 at 04:35:08PM +0100, Daniel P. Berrange wrote:
> > > > On Tue, May 19, 2015 at 04:03:04PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Daniel P. Berrange (berrange@xxxxxxxxxx) wrote:
> > > > > > On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > > > > > > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > > > > > > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > > > > > >>> background:
> > > > > > > >>> Live migration is one of the most important features of virtualization technology.
> > > > > > > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > > > > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > > > > >>> performance gap with native network I/O. Pass-through network devices have near
> > > > > > > >>> native performance, however, they have thus far prevented live migration. No existing
> > > > > > > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > > > > > > >>>
> > > > > > > > >>> One idea for solving the problem is described at:
> > > > > > > > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > > > > > >>> Please refer to the above document for detailed information.
> > > > > > > > >>>
> > > > > > > > >>> So I think this problem could be solved by combining existing
> > > > > > > > >>> technologies. We are considering implementing the following steps:
> > > > > > > >>>
> > > > > > > > >>> -  Before booting the VM, we specify two NICs in the XML for creating a bonding
> > > > > > > > >>>    device (one plugged NIC and one virtual NIC). The NICs' MAC addresses can be given
> > > > > > > > >>>    in the XML, which helps qemu-guest-agent find the network interfaces in the guest.
> > > > > > > > >>>
> > > > > > > > >>> -  When qemu-guest-agent starts up in the guest, it sends a notification to libvirt,
> > > > > > > > >>>    and libvirt then calls the previously registered initialization callbacks. Through
> > > > > > > > >>>    these callbacks we can create the bonding device according to the XML
> > > > > > > > >>>    configuration, using the netcf tool, which makes creating a bonding device
> > > > > > > > >>>    easy.
> > > > > > > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > > > > > >> I think configuration of networking is really something that must be left to
> > > > > > > >> the guest OS admin to control. I don't think the guest agent should be trying
> > > > > > > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > > > > > > >> with configuration attempted by things in the guest like NetworkManager or
> > > > > > > >> systemd-networkd.
> > > > > > > > There should not be a conflict.
> > > > > > > > > The guest agent should just give NM the information, and have NM do
> > > > > > > > > the right thing.
> > > > > > > 
> > > > > > > That assumes the guest will have NM running. Unless you want to severely
> > > > > > > limit the scope of usefulness, you also need to handle systems that have
> > > > > > > NM disabled, and among those the different styles of system network
> > > > > > > config. It gets messy very fast.
> > > > > > 
> > > > > > Also OpenStack already has a way to pass guest information about the
> > > > > > required network setup, via cloud-init, so it would not be interested
> > > > > > > in anything that used the QEMU guest agent to configure network
> > > > > > manager. Which is really just another example of why this does not
> > > > > > belong anywhere in libvirt or lower.  The decision to use NM is a
> > > > > > > policy decision that will always be wrong for a non-negligible set
> > > > > > of use cases and as such does not belong in libvirt or QEMU. It is
> > > > > > the job of higher level apps to make that kind of policy decision.
> > > > > 
> > > > > > This is exactly my worry though; why should every higher level management
> > > > > > system have its own way of communicating network config for hotpluggable
> > > > > > devices?  You shouldn't need to reconfigure a VM to move it between them.
> > > > > 
> > > > > This just makes it hard to move it between management layers; there needs
> > > > > to be some standardisation (or abstraction) of this;  if libvirt isn't the place
> > > > > to do it, then what is?
> > > > 
> > > > NB, openstack isn't really defining a custom thing for networking here. It
> > > > is actually integrating with the standard cloud-init guest tools for this
> > > > task. Also note that OpenStack has defined a mechanism that works for
> > > > guest images regardless of what hypervisor they are running on - ie does
> > > > not rely on any QEMU or libvirt specific functionality here.
> > > 
> > > I'm not sure what the implication is.  No new functionality should be
> > > implemented unless we also add it to vmware?  People that don't want kvm
> > > specific functionality, won't use it.
> > 
> > I'm saying that standardization of virtualization policy in libvirt is the
> > wrong solution, because different applications will have different viewpoints
> > as to what "standardization" is useful / appropriate. Creating a standardized
> > > policy in libvirt for KVM does not help OpenStack. It may help people who only
> > > care about KVM, but that is not the entire ecosystem. OpenStack has a
> > > standardized solution for guest configuration information that works across
> > > all the hypervisors it targets.  This is just yet another example of exactly
> > > why libvirt aims to design its APIs such that it exposes direct mechanisms
> > > and leaves usage policy decisions up to the management applications. Libvirt
> > is not best placed to decide which policy all these mgmt apps must use for
> > this task.
> > 
> > Regards,
> > Daniel
> 
> 
> I don't think we are pushing policy in libvirt here.
> 
> What we want is a mechanism that lets users specify in the XML:
> interface X is fallback for pass-through device Y
> Then when requesting migration, specify that it should use
> device Z on destination as replacement for Y.
> 
> We are asking libvirt to automatically
> 1.- when migration is requested, request unplug of Y
> 2.- wait until Y is deleted
> 3.- start migration
> 4.- wait until migration is completed
> 5.- plug device Z on destination
> 
> 
> I don't see any policy above: libvirt is in control of migration and
> seems best placed to implement this.

Even this implies policy in libvirt about handling of failure conditions.
How long to wait for unplug? What to do when unplug fails? What to do if
plug fails on the target? It is hard to report these errors to the application,
and when multiple devices are to be plugged/unplugged, the application will
also have trouble determining whether some or all of the devices are still
present after a failure. Even beyond that, this is pointless, as all 5 steps
you describe here are already possible to perform with existing functionality
in libvirt, with the application having direct control over what to do in the
failure scenarios.
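
To illustrate, here is a minimal sketch of the 5 steps driven from the
application side, so that the timeout and the reaction to each failure stay
with the caller. Everything named here is hypothetical: the four callables
(request_unplug, device_present, migrate, plug_on_dest), the StepFailed
exception and the default timeout are made up for this example and are not
libvirt API. In a libvirt-python application they would presumably wrap
calls such as detachDeviceFlags(), migrate3() and attachDeviceFlags(), and
the polling could be replaced by the device-removed domain event, but that
mapping is an assumption and is not exercised below.

```python
import time


class StepFailed(Exception):
    """Reports which step failed, so the caller can apply its own policy."""

    def __init__(self, step, cause):
        super().__init__(f"{step}: {cause}")
        self.step = step
        self.cause = cause


def _run(step, fn):
    # Run one step and tag any error with the step's name.
    try:
        return fn()
    except Exception as exc:
        raise StepFailed(step, exc) from exc


def migrate_with_passthrough(request_unplug, device_present, migrate,
                             plug_on_dest, unplug_timeout=30.0,
                             poll_interval=0.5):
    """Hypothetical application-side driver for the 5 steps quoted above."""
    _run("request_unplug", request_unplug)          # step 1

    # Step 2: wait until the device is gone; the caller picks the timeout.
    deadline = time.monotonic() + unplug_timeout
    while _run("wait_unplug", device_present):
        if time.monotonic() >= deadline:
            raise StepFailed("wait_unplug",
                             TimeoutError("device still present"))
        time.sleep(poll_interval)

    _run("migrate", migrate)                        # steps 3 and 4 (blocking)
    _run("plug_on_dest", plug_on_dest)              # step 5
```

A caller that catches StepFailed can inspect .step and decide for itself
whether to retry the unplug, abort, or leave the guest running on the
fallback interface when the plug on the target fails.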

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
