On Thu, 2015-10-22 at 00:52 +0800, Lan Tianyu wrote:
> This patchset is Qemu part for live migration support for SRIOV NIC.
> kernel part patch information is in the following link.
> http://marc.info/?l=kvm&m=144544635330193&w=2
>
> Lan Tianyu (3):
>   Qemu: Add pci-assign.h to share functions and struct definition with
>     new file
>   Qemu: Add post_load_state() to run after restoring CPU state
>   Qemu: Introduce pci-sriov device type to support VF live migration
>
>  hw/i386/kvm/Makefile.objs   |   2 +-
>  hw/i386/kvm/pci-assign.c    | 113 +---------------------
>  hw/i386/kvm/pci-assign.h    | 109 +++++++++++++++++++++++
>  hw/i386/kvm/sriov.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++
>  include/migration/vmstate.h |   2 +
>  migration/savevm.c          |  15 ++++
>  6 files changed, 344 insertions(+), 110 deletions(-)
>  create mode 100644 hw/i386/kvm/pci-assign.h
>  create mode 100644 hw/i386/kvm/sriov.c

Hi Lan,

Seems like there are a couple of immediate problems with this approach.
The first is that you're modifying legacy KVM device assignment, which
is deprecated upstream and not even enabled by some distros.  VFIO is
the supported mechanism for doing PCI device assignment now, and any
features like this need to be added there first.  It's not only more
secure than legacy KVM device assignment, it also doesn't limit this to
an x86-only solution.  Surely you want to support 82599 VF migration on
other platforms as well.

Using sysfs to interact with the PF is also problematic, since that
means libvirt needs to grant qemu access to those files, adding one
more layer to the stack.  If we were to use VFIO, we could potentially
enable this through a save-state region on the device file descriptor
and, if necessary, virtual interrupt channels for the device as well.
This of course implies that the kernel-internal channels are made as
general as possible in order to support any PF driver.

That said, there are some nice features here.
Using unused PCI config bytes to communicate with the guest driver and
enable guest-based page dirtying is a nice hack.  However, if we want
to add this capability to other devices, we're not always going to be
able to use fixed addresses 0xf0 and 0xf1.  I would suggest that we
probably want to create a virtual capability in the config space of the
VF, perhaps a Vendor Specific capability.  Obviously some devices won't
have room for a full capability in the standard config space, so we may
need to optionally expose it in extended config space instead.  Those
devices would be limited to supporting migration only in PCIe
configurations in the guest.  Also, plenty of devices make use of
undefined PCI config space, so we may not be able to simply add a
capability to a region we think is unused; maybe it needs to happen
through reserved space in another capability, or perhaps by defining a
virtual BAR that unenlightened guest drivers would ignore.  The point
is that we need to standardize the location somehow, rather than
implicitly knowing that it's at 0xf0/0xf1 on 82599 VFs.

Also, I haven't looked at the kernel-side patches yet, but the saved
state received from and loaded into the PF driver needs to be
versioned, and maybe we need some way to know whether versions are
compatible.  Migration version information is difficult enough for
QEMU; it's a completely foreign concept in the kernel.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html