Re: [PATCH V7 mlx5-next 00/15] Add mlx5 live migration driver and v2 migration protocol

On 2/7/2022 10:52 PM, Yishai Hadas wrote:


This series adds mlx5 live migration driver for VFs that are migration
capable and includes the v2 migration protocol definition and mlx5
implementation.

The mlx5 driver uses the vfio_pci_core split to create a specific VFIO
PCI driver that matches the mlx5 virtual functions. The driver provides
the same experience as normal vfio-pci with the addition of migration
support.

In HW the migration is controlled by the PF function, using its
mlx5_core driver, and the VFIO PCI VF driver co-ordinates with the PF to
execute the migration actions.

The bulk of the v2 migration protocol is semantically the same as v1;
however, it has been recast into an FSM for the device_state and the
actual syscall interface uses normal ioctl(), read() and write() instead
of building a syscall interface using the region.
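
To illustrate the FSM idea, here is a minimal user-space sketch of a
next-hop table: given the current state and a target state, it yields the
next state to enter, so that multi-step transitions are walked one arc at a
time. The state names, the table contents and mig_walk() are hypothetical
and chosen for illustration; this is not the table or the helper from this
series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state names for illustration; not the uAPI definitions. */
enum mig_state {
	MIG_ERROR,	/* must stay 0: unset table entries mean "no arc" */
	MIG_STOP,
	MIG_RUNNING,
	MIG_STOP_COPY,
	MIG_RESUMING,
	MIG_NUM_STATES,
};

/*
 * next_hop[cur][target] is the next state to enter when moving from cur
 * toward target. In this toy table only arcs to or from STOP are direct;
 * every other transition is routed through STOP.
 */
static const enum mig_state next_hop[MIG_NUM_STATES][MIG_NUM_STATES] = {
	[MIG_STOP] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_RUNNING,
		[MIG_STOP_COPY] = MIG_STOP_COPY,
		[MIG_RESUMING] = MIG_RESUMING,
	},
	[MIG_RUNNING] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_RUNNING,
		[MIG_STOP_COPY] = MIG_STOP,
		[MIG_RESUMING] = MIG_STOP,
	},
	[MIG_STOP_COPY] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_STOP,
		[MIG_STOP_COPY] = MIG_STOP_COPY,
		[MIG_RESUMING] = MIG_STOP,
	},
	[MIG_RESUMING] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_STOP,
		[MIG_STOP_COPY] = MIG_STOP,
		[MIG_RESUMING] = MIG_RESUMING,
	},
};

/*
 * Walk from cur to target one arc at a time, storing each state entered
 * in path[]. Returns the number of steps, or -1 if there is no route or
 * path[] is too small.
 */
static int mig_walk(enum mig_state cur, enum mig_state target,
		    enum mig_state *path, size_t max)
{
	size_t n = 0;

	assert(path != NULL);
	if ((unsigned int)cur >= MIG_NUM_STATES ||
	    (unsigned int)target >= MIG_NUM_STATES)
		return -1;

	while (cur != target) {
		cur = next_hop[cur][target];
		if (cur == MIG_ERROR || n == max)
			return -1;
		path[n++] = cur;
	}
	return (int)n;
}
```

With this table, mig_walk(MIG_RUNNING, MIG_STOP_COPY, ...) visits MIG_STOP
and then MIG_STOP_COPY, while any walk starting from MIG_ERROR fails.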

Several bits of infrastructure work are included here:
  - pci_iov_vf_id() to help drivers like mlx5 figure out the VF index from
    a BDF
  - pci_iov_get_pf_drvdata() to clarify the tricky locking protocol when a
    VF reaches into its PF's driver
  - mlx5_core uses the normal SRIOV lifecycle and disables SRIOV before
    driver remove, to be compatible with pci_iov_get_pf_drvdata()
  - Lifting VFIO_DEVICE_FEATURE into core VFIO code
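
As background for the VF index lookup, the SR-IOV capability places VF
routing IDs at PF routing ID + First VF Offset + index * VF Stride, so the
index can be recovered by inverting that formula. The sketch below does this
with plain integers in user space; rid(), vf_index() and their parameters
are hypothetical names, and this is not the pci_iov_vf_id() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit routing ID: bus number in the high byte, devfn in the low byte. */
static uint16_t rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/*
 * Return the zero-based VF index, or -1 if vf_rid is not one of the PF's
 * VFs. first_vf_offset, vf_stride and total_vfs are assumed to come from
 * the PF's SR-IOV capability.
 */
static int vf_index(uint16_t pf_rid, uint16_t vf_rid,
		    uint16_t first_vf_offset, uint16_t vf_stride,
		    uint16_t total_vfs)
{
	int delta = (int)vf_rid - (int)pf_rid - (int)first_vf_offset;

	if (vf_stride == 0 || delta < 0 || delta % vf_stride != 0)
		return -1;
	if (delta / vf_stride >= total_vfs)
		return -1;
	return delta / vf_stride;
}

/* Usage: PF at 03:00.0, First VF Offset 2, VF Stride 1, 8 VFs. */
static void vf_index_example(void)
{
	assert(vf_index(rid(3, 0), rid(3, 2), 2, 1, 8) == 0);
	assert(vf_index(rid(3, 0), rid(3, 5), 2, 1, 8) == 3);
}
```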

This series comes after a lot of discussion. Some major points:
- v1 ABI compatible migration defined using the same FSM approach:
    https://lore.kernel.org/all/0-v1-a4f7cab64938+3f-vfio_mig_states_jgg@xxxxxxxxxx/
- Attempts to clarify how the v1 API works:
    Alex's:
      https://lore.kernel.org/kvm/163909282574.728533.7460416142511440919.stgit@omen/
    Jason's:
      https://lore.kernel.org/all/0-v3-184b374ad0a8+24c-vfio_mig_doc_jgg@xxxxxxxxxx/
- Etherpad exploring the scope and questions of general VFIO migration:
      https://lore.kernel.org/kvm/87mtm2loml.fsf@xxxxxxxxxx/

NOTE: As this series touched mlx5_core parts we need to send this in a
pull request format to VFIO to avoid conflicts.

Matching qemu changes can be previewed here:
  https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2

Changes from V6: https://lore.kernel.org/netdev/20220130160826.32449-1-yishaih@xxxxxxxxxx/
vfio:
- Move to use the FEATURE ioctl for setting/getting the device state.
- Use state_flags_table as part of vfio_mig_get_next_state() and use
   WARN_ON as Alex suggested.
- Leave the V1 definitions in the uAPI header and drop only its
   documentation until V2 is part of Linus's tree.
- Fix errno usage in a few places.
- Improve and adapt the uAPI documentation to match the latest code.
- Put the VFIO_DEVICE_FEATURE_PCI_VF_TOKEN functionality into a separate
   function.
- Fix some rebase notes.
vfio/mlx5:
- Adapt to use the vfio core changes.
- Fix some bad flow upon load state.

Changes from V5: https://lore.kernel.org/kvm/20211027095658.144468-1-yishaih@xxxxxxxxxx/
vfio:
- Migration protocol v2:
   + enum for device state, not bitmap
   + ioctl to manipulate device_state, not a region
   + Only STOP_COPY is mandatory, P2P and PRE_COPY are optional, discovered
     via VFIO_DEVICE_FEATURE
   + Migration data transfer is done via dedicated FD
- VFIO core code to implement the migration related ioctls and help
   drivers implement it correctly
- VFIO_DEVICE_FEATURE refactor
- Delete migration protocol, drop patches fixing it
- Drop "vfio/pci_core: Make the region->release() function optional"
vfio/mlx5:
- Switch to use migration v2 protocol, with core helpers
- Eliminate the region implementation
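
Since migration data now moves over a dedicated FD with ordinary read() and
write(), the saving side can be consumed with a plain read loop. The sketch
below drains any file descriptor into a buffer; drain_fd() is a hypothetical
helper, and treating read() returning 0 as end of data is an assumption made
here for illustration, not something this cover letter specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read from fd until read() returns 0 or buf is full.
 * Returns the number of bytes stored, or -1 on a read error.
 */
static ssize_t drain_fd(int fd, unsigned char *buf, size_t cap)
{
	size_t total = 0;

	assert(buf != NULL);
	while (total < cap) {
		ssize_t n = read(fd, buf + total, cap - total);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0)
			break;			/* assumed end of data */
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

A pipe works as a stand-in for the data FD when trying this out in user
space: write a few bytes into the write end, close it, and pass the read end
to drain_fd().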

Changes from V4: https://lore.kernel.org/kvm/20211026090605.91646-1-yishaih@xxxxxxxxxx/
vfio:
- Add some Reviewed-by.
- Rename to vfio_pci_core_aer_err_detected() as Alex asked.
vfio/mlx5:
- Improve to enter the error state only if unquiesce also fails.
- Fix some typos.
- Use the multi-line comment style as in drivers/vfio.

Changes from V3: https://lore.kernel.org/kvm/20211024083019.232813-1-yishaih@xxxxxxxxxx/
vfio/mlx5:
- Align with mlx5 latest specification to create the MKEY with full read
   write permissions.
- Fix unlock ordering in mlx5vf_state_mutex_unlock() to prevent some
   race.

Changes from V2: https://lore.kernel.org/kvm/20211019105838.227569-1-yishaih@xxxxxxxxxx/
vfio:
- Put and use the new macro VFIO_DEVICE_STATE_SET_ERROR as Alex asked.
vfio/mlx5:
- Improve/fix state checking as was asked by Alex & Jason.
- Let things be done in a deterministic way upon 'reset_done' following
   the suggested algorithm by Jason.
- Align with mlx5 latest specification when calling the SAVE command.
- Fix some typos.
vdpa/mlx5:
- Drop the patch from the series based on the discussion in the mailing
   list.

Changes from V1: https://lore.kernel.org/kvm/20211013094707.163054-1-yishaih@xxxxxxxxxx/
PCI/IOV:
- Add actual interface in the subject as was asked by Bjorn and add
   his Acked-by.
- Move to check explicitly for !dev->is_virtfn as was asked by Alex.
vfio:
- Add a separate patch for fixing the non-compiled
   VFIO_DEVICE_STATE_SET_ERROR macro.
- Expose vfio_pci_aer_err_detected() to be set by drivers on their own
   PCI error handlers.
- Add a macro for VFIO_DEVICE_STATE_ERROR in the uapi header file as was
   suggested by Alex.
vfio/mlx5:
- Improve to use xor as part of checking the 'state' change command as
   was suggested by Alex.
- Set state to VFIO_DEVICE_STATE_ERROR when an error occurred instead of
   VFIO_DEVICE_STATE_INVALID.
- Improve state checking as was suggested by Jason.
- Use its own PCI reset_done error handler as was suggested by Jason and
   fix the locking scheme around the state mutex to work properly.
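
The xor check mentioned above applies to a bitmap device state: xor-ing the
old and new values yields exactly the bits that changed, and each changed
bit can then be classified as set or cleared. A minimal sketch, with
hypothetical bit names and helpers (not the driver's code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state bits for illustration. */
#define ST_RUNNING	(1u << 0)
#define ST_SAVING	(1u << 1)
#define ST_RESUMING	(1u << 2)

/* Bits that differ between the old and the new state. */
static uint32_t changed_bits(uint32_t old_state, uint32_t new_state)
{
	return old_state ^ new_state;
}

/* Non-zero if 'bit' went from clear to set. */
static int turned_on(uint32_t old_state, uint32_t new_state, uint32_t bit)
{
	return (changed_bits(old_state, new_state) & bit) && (new_state & bit);
}

/* Non-zero if 'bit' went from set to clear. */
static int turned_off(uint32_t old_state, uint32_t new_state, uint32_t bit)
{
	return (changed_bits(old_state, new_state) & bit) && !(new_state & bit);
}

/* Usage: RUNNING -> RUNNING|SAVING only turns SAVING on. */
static void state_diff_example(void)
{
	uint32_t old_state = ST_RUNNING;
	uint32_t new_state = ST_RUNNING | ST_SAVING;

	assert(changed_bits(old_state, new_state) == ST_SAVING);
	assert(turned_on(old_state, new_state, ST_SAVING));
	assert(!turned_off(old_state, new_state, ST_RUNNING));
}
```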

Changes from V0: https://lore.kernel.org/kvm/cover.1632305919.git.leonro@xxxxxxxxxx/
PCI/IOV:
- Add an API (i.e. pci_iov_get_pf_drvdata()) that allows SRIOV VF drivers
   to reach the drvdata of a PF.
mlx5_core:
- Add an extra patch to disable SRIOV before PF removal.
- Adapt to use the above PCI/IOV API as part of mlx5_vf_get_core_dev().
- Reuse the exported PCI/IOV virtfn index function call (i.e. pci_iov_vf_id()).
vfio:
- Add support in the pci_core to let a driver be notified upon
  'reset_done' so that it can set its internal state accordingly.
- Add some helper stuff for 'invalid' state handling.
mlx5_vfio_pci:
- Move to use the 'command mode' instead of the 'state machine'
  scheme as was discussed in the mailing list.
- Handle the RESET scenario when called by vfio_pci_core to set
  its internal state accordingly.
- Set initial state as RUNNING.
- Put the driver files as sub-folder under drivers/vfio/pci named mlx5
   and update the MAINTAINERS file as was asked.
vdpa_mlx5:
- Add a new patch to use mlx5_vf_get_core_dev() to get the PF device.

Jason Gunthorpe (7):
   PCI/IOV: Add pci_iov_vf_id() to get VF index
   PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata
     of a PF
   vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl
   vfio: Define device migration protocol v2
   vfio: Extend the device migration protocol with RUNNING_P2P
   vfio: Remove migration protocol v1 documentation
   vfio: Extend the device migration protocol with PRE_COPY

Leon Romanovsky (1):
   net/mlx5: Reuse exported virtfn index function call

Yishai Hadas (7):
   net/mlx5: Disable SRIOV before PF removal
   net/mlx5: Expose APIs to get/put the mlx5 core device
   net/mlx5: Introduce migration bits and structures
   vfio/mlx5: Expose migration commands over mlx5 device
   vfio/mlx5: Implement vfio_pci driver for mlx5 devices
   vfio/pci: Expose vfio_pci_core_aer_err_detected()
   vfio/mlx5: Use its own PCI reset_done error handler

  MAINTAINERS                                   |   6 +
  .../net/ethernet/mellanox/mlx5/core/main.c    |  45 ++
  .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   1 +
  .../net/ethernet/mellanox/mlx5/core/sriov.c   |  17 +-
  drivers/pci/iov.c                             |  43 ++
  drivers/vfio/pci/Kconfig                      |   3 +
  drivers/vfio/pci/Makefile                     |   2 +
  drivers/vfio/pci/mlx5/Kconfig                 |  10 +
  drivers/vfio/pci/mlx5/Makefile                |   4 +
  drivers/vfio/pci/mlx5/cmd.c                   | 259 +++++++
  drivers/vfio/pci/mlx5/cmd.h                   |  36 +
  drivers/vfio/pci/mlx5/main.c                  | 676 ++++++++++++++++++
  drivers/vfio/pci/vfio_pci.c                   |   1 +
  drivers/vfio/pci/vfio_pci_core.c              | 101 ++-
  drivers/vfio/vfio.c                           | 358 +++++++++-
  include/linux/mlx5/driver.h                   |   3 +
  include/linux/mlx5/mlx5_ifc.h                 | 147 +++-
  include/linux/pci.h                           |  15 +-
  include/linux/vfio.h                          |  50 ++
  include/linux/vfio_pci_core.h                 |   4 +
  include/uapi/linux/vfio.h                     | 504 +++++++------
  21 files changed, 1994 insertions(+), 291 deletions(-)
  create mode 100644 drivers/vfio/pci/mlx5/Kconfig
  create mode 100644 drivers/vfio/pci/mlx5/Makefile
  create mode 100644 drivers/vfio/pci/mlx5/cmd.c
  create mode 100644 drivers/vfio/pci/mlx5/cmd.h
  create mode 100644 drivers/vfio/pci/mlx5/main.c

--
2.18.1


We've tested NVIDIA vGPU live migration with the current v7 proposal, and functionally it works fine. We are considering further performance optimizations for migrating large amounts of data and will propose them later, once the details are worked out.

Thanks,
Tarun


