On 04/17/2015 04:53 AM, Chen Fan wrote:
> background:
> Live migration is one of the most important features of virtualization technology.
> With regard to recent virtualization techniques, performance of network I/O is critical.
> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> performance gap with native network I/O. Pass-through network devices have near
> native performance, however, they have thus far prevented live migration. No existing
> methods solve the problem of live migration with pass-through devices perfectly.
>
> There was an idea to solve the problem in website:
> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> Please refer to above document for detailed information.

This functionality has been on my mind/bug list for a long time, but I
haven't been able to pursue it much. See this BZ, along with the
original patches submitted by Shradha Shah from SolarFlare:

https://bugzilla.redhat.com/show_bug.cgi?id=896716

(I was a bit optimistic in my initial review of the patches - there are
actually a lot of issues that weren't handled by those patches.)

>
> So I think this problem maybe could be solved by using the combination of existing
> technology. and the following steps are we considering to implement:
>
> - before boot VM, we anticipate to specify two NICs for creating bonding device
> (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.

An interesting idea, but I think that is a 2nd level enhancement, not
necessary initially (and maybe not ever, due to the high possibility of
it being extremely difficult to get right in 100% of the cases).

>
> - when qemu-guest-agent startup in guest it would send a notification to libvirt,
> then libvirt will call the previous registered initialize callbacks.
> so through the callback functions, we can create the bonding device according to the XML
> configuration. and here we use netcf tool which can facilitate to create bonding device
> easily.

This isn't quite making sense - the bond will be on the guest, which may
not have netcf installed. Anyway, I think it should be up to the guest's
own system network config to have the bond already set up. If you try to
impose it from outside that infrastructure, you run too much risk of
running afoul of something on the guest (e.g. NetworkManager).

>
> - during migration, unplug the passthroughed NIC. then do native migration.

Correct. This is the most important part. But not just unplugging it,
you also need to wait until the unplug operation completes (it is
asynchronous). (After this point, the emulated NIC that is part of the
bond would get all of the traffic).

>
> - on destination side, check whether need to hotplug new NIC according to specified XML.
> usually, we use migrate "--xml" command option to specify the destination host NIC mac
> address to hotplug a new NIC, because source side passthrough NIC mac address is different,
> then hotplug the device according to the destination XML configuration.

Why does the MAC address need to be different? Are you suggesting doing
this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
its MAC address from the libvirt config, so it's very simple to use the
same MAC address across the migration. Any network card that would be
able to do this on any sort of useful scale will be SRIOV-capable (or
should be replaced with one that is - some of them are not that
expensive).

>
> TODO:
> 1. when hot add a new NIC in destination side after migration finished, the NIC device
> need to re-enslave on bonding device in guest. otherwise, it is offline. maybe
> we should consider bonding driver to support add interfaces dynamically.

I never looked at the details of how SolarFlare's code handled the guest
side (they have/had their own patchset they maintained for some older
version of libvirt which integrated with some sort of enhanced bonding
driver on the guests). I assumed the bond driver could handle this
already, but have to say I never investigated.

>
> This is an example on how this might work, so I want to hear some voices about this scenario.
>
> Thanks,
> Chen
>
> Chen Fan (7):
>   qemu-agent: add agent init callback when detecting guest setup
>   qemu: add guest init event callback to do the initialize work for
>     guest
>   hostdev: add a 'bond' type element in <hostdev> element

Putting this into <hostdev> is the wrong approach, for two reasons:

1) it doesn't account for the device to be used being in a different
   address on the source and destination hosts,
2) the <interface> element already has much of the config you need, and
   an interface type supporting hostdev passthrough.

It has been possible to do passthrough of an SRIOV VF via
<interface type='hostdev'> for a long time now and, even better, via an
<interface type='network'> where the network pointed to contains a pool
of VFs - As long as the source and destination hosts both have networks
with the same name, libvirt will be able to find a currently available
device on the destination as it migrates from one host to another
instead of relying on both hosts having the exact same device at the
exact same address on the host and destination (and also magically
unused by any other guest). This page explains the use of a "hostdev
network" which has a pool of devices:

http://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition

This was designed specifically with the idea in mind that one day it
would be possible to migrate a domain with a hostdev device (as long as
the guest could handle the hostdev device being temporarily unplugged
during the migration).
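
To make the "same network name on both hosts" point concrete, here is a
rough Python model of picking a VF from a named pool. It is only a toy
under stated assumptions, not libvirt's actual allocator: the function
name allocate_vf, the dict-of-lists pool layout, the network name
"passthrough" and the PCI addresses are all made up for illustration.

```python
def allocate_vf(host_networks, in_use, network_name):
    """Return the first free VF from the named pool on this host.

    host_networks: dict mapping a network name to a list of VF PCI
                   addresses (hypothetical layout, for illustration only).
    in_use:        set of addresses already assigned to some guest;
                   the chosen address is added to it.
    """
    for vf in host_networks[network_name]:
        if vf not in in_use:
            in_use.add(vf)
            return vf
    raise RuntimeError(f"no free VF in network {network_name!r}")


# The guest config only names the network, never a PCI address, so the
# two hosts may expose completely different devices under that name.
source_host = {"passthrough": ["0000:03:10.0", "0000:03:10.2"]}
dest_host = {"passthrough": ["0000:81:10.0", "0000:81:10.1"]}

source_in_use = set()
dest_in_use = {"0000:81:10.0"}  # already taken by another guest

vf_on_source = allocate_vf(source_host, source_in_use, "passthrough")
vf_on_dest = allocate_vf(dest_host, dest_in_use, "passthrough")
```

The guest ends up on a different address on each host, and the VF that
another guest already holds on the destination is skipped, without the
guest config having to change.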
>   qemu-agent: add qemuAgentCreateBond interface
>   hostdev: add parse ip and route for bond configure

Again, I think that this level of detail about the guest network config
belongs on the guest, not in libvirt.

>   migrate: hot remove hostdev at perform phase for bond device

^^ this is the useful part but I don't think the right method is to make
this action dependent on the device being a "bond".

I think that in this respect Shradha's patches had a better idea - any
hostdev (or, by implication <interface type='hostdev'> or, much more
usefully <interface type='network'> pointing to a pool of VFs) could
have an attribute "ephemeral". If ephemeral was "yes", then the device
would always be unplugged prior to migration and re-plugged when
migration was completed (the same thing should be done when
saving/restoring a domain which also can't currently be done with a
domain that has a passthrough device).

For that matter, this could be a general-purpose thing (although
probably most useful for hostdevs) - just make it possible for *any*
hotpluggable device to be "ephemeral"; the meaning of this would be that
every device marked as ephemeral should be unplugged prior to migration
or save (and libvirt should wait for qemu to notify that the unplug is
completed), and re-plugged right after the guest is restarted.

(possibly it should be implemented as an <ephemeral> *element* rather
than attribute, so that options could be specified).
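
A minimal Python sketch of the ordering I have in mind, assuming the
hypothetical "ephemeral" marker exists (neither the attribute nor the
element is in libvirt's schema today). The unplug, wait_unplugged,
migrate and replug callables are placeholders for whatever the real
driver would do; none of these names are libvirt APIs.

```python
import xml.etree.ElementTree as ET


def ephemeral_devices(domain_xml):
    """Return the children of <devices> that are marked ephemeral.

    Accepts either form discussed above: a (hypothetical)
    ephemeral='yes' attribute or a (hypothetical) <ephemeral> child.
    """
    devices = ET.fromstring(domain_xml).find("devices")
    if devices is None:
        return []
    return [dev for dev in devices
            if dev.get("ephemeral") == "yes"
            or dev.find("ephemeral") is not None]


def migrate_with_ephemeral(domain_xml, unplug, wait_unplugged, migrate, replug):
    """Unplug ephemeral devices, wait for completion, migrate, re-plug."""
    devs = ephemeral_devices(domain_xml)
    for dev in devs:
        unplug(dev)            # only *requests* the unplug
    for dev in devs:
        wait_unplugged(dev)    # unplug is asynchronous: block until done
    migrate()                  # the guest now has no passthrough devices
    for dev in devs:
        replug(dev)            # done once the guest runs again
    return devs


example_xml = """
<domain>
  <devices>
    <interface type='network'>
      <source network='passthrough'/>
      <ephemeral/>
    </interface>
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
    <hostdev mode='subsystem' type='pci' ephemeral='yes'/>
  </devices>
</domain>
"""
```

The same sequence would apply to save/restore, with the save taking the
place of the migrate step. In a real implementation the re-plug happens
on the destination host, where an <interface type='network'> would get
whatever VF is free there.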

After that is implemented and works properly, then it might be the time
to think about auto-creating the bond (although again, my opinion is
that this is getting a bit too intrusive into the guest (and making it
more likely to fail - I know from long experience with netcf that it is
all too easy for some other service on the system (ahem) to mess up all
your hard work); I think it would be better to just let the guest deal
with setting up a bond in its system network config, and if the bond
driver can't handle having a device in the bond unplugging and plugging,
then the bond driver should be enhanced).

>   migrate: add hostdev migrate status to support hostdev migration
>
>  docs/schemas/basictypes.rng   |   6 ++
>  docs/schemas/domaincommon.rng |  37 ++++++++
>  src/conf/domain_conf.c        | 195 ++++++++++++++++++++++++++++++++++++++---
>  src/conf/domain_conf.h        |  40 +++++++--
>  src/conf/networkcommon_conf.c |  17 ----
>  src/conf/networkcommon_conf.h |  17 ++++
>  src/libvirt_private.syms      |   1 +
>  src/qemu/qemu_agent.c         | 196 +++++++++++++++++++++++++++++++++++++++++-
>  src/qemu/qemu_agent.h         |  12 +++
>  src/qemu/qemu_command.c       |   3 +
>  src/qemu/qemu_domain.c        |  70 +++++++++++++++
>  src/qemu/qemu_domain.h        |  14 +++
>  src/qemu/qemu_driver.c        |  38 ++++++++
>  src/qemu/qemu_hotplug.c       |   8 +-
>  src/qemu/qemu_migration.c     |  91 ++++++++++++++++++++
>  src/qemu/qemu_migration.h     |   4 +
>  src/qemu/qemu_process.c       |  32 +++++++
>  src/util/virhostdev.c         |   3 +
>  18 files changed, 745 insertions(+), 39 deletions(-)
>

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list