This series adds an mlx5 live migration driver for VFs that are migration
capable and includes the v2 migration protocol definition and mlx5
implementation.

The mlx5 driver uses the vfio_pci_core split to create a specific VFIO
PCI driver that matches the mlx5 virtual functions. The driver provides
the same experience as normal vfio-pci with the addition of migration
support.

In HW the migration is controlled by the PF function, using its
mlx5_core driver, and the VFIO PCI VF driver co-ordinates with the PF to
execute the migration actions.

The bulk of the v2 migration protocol is semantically the same as v1;
however, it has been recast into an FSM for the device_state, and the
actual syscall interface uses normal ioctl(), read() and write() instead
of building a syscall interface using the region.

Several bits of infrastructure work are included here:
- pci_iov_vf_id() to help drivers like mlx5 figure out the VF index from
  a BDF
- pci_iov_get_pf_drvdata() to clarify the tricky locking protocol when a
  VF reaches into its PF's driver
- mlx5_core uses the normal SRIOV lifecycle and disables SRIOV before
  driver remove, to be compatible with pci_iov_get_pf_drvdata()
- Lifting VFIO_DEVICE_FEATURE into core VFIO code

This series comes after a lot of discussion. Some major points:
- v1 ABI compatible migration defined using the same FSM approach:
  https://lore.kernel.org/all/0-v1-a4f7cab64938+3f-vfio_mig_states_jgg@xxxxxxxxxx/
- Attempts to clarify how the v1 API works:
  Alex's:
  https://lore.kernel.org/kvm/163909282574.728533.7460416142511440919.stgit@omen/
  Jason's:
  https://lore.kernel.org/all/0-v3-184b374ad0a8+24c-vfio_mig_doc_jgg@xxxxxxxxxx/
- Etherpad exploring the scope and questions of general VFIO migration:
  https://lore.kernel.org/kvm/87mtm2loml.fsf@xxxxxxxxxx/

NOTE: As this series touches mlx5_core parts, we need to send it in a
pull request format to VFIO to avoid conflicts.
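
For readers unfamiliar with the FSM recast of device_state mentioned
above, the following is a minimal userspace sketch of the idea, not the
code in this series. The enum, the table contents and the helper names
(mig_state, next_hop, mig_next_state, mig_walk) are all hypothetical and
simplified: the optional P2P and PRE_COPY states are left out. The point
is only that a table maps (current, target) to the single next state to
enter, so a multi-step transition is walked one arc at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device states; NOT the uAPI enum. */
enum mig_state {
	MIG_ERROR,	/* 0, so unlisted table entries fall back to ERROR */
	MIG_STOP,
	MIG_RUNNING,
	MIG_STOP_COPY,
	MIG_RESUMING,
	MIG_NUM_STATES,
};

/*
 * next_hop[cur][target] is the one state to enter next when moving
 * from cur toward target. In this sketch every multi-step path goes
 * through MIG_STOP (an assumption of the example).
 */
static const enum mig_state next_hop[MIG_NUM_STATES][MIG_NUM_STATES] = {
	[MIG_STOP] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_RUNNING,
		[MIG_STOP_COPY] = MIG_STOP_COPY,
		[MIG_RESUMING] = MIG_RESUMING,
	},
	[MIG_RUNNING] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_RUNNING,
		[MIG_STOP_COPY] = MIG_STOP,
		[MIG_RESUMING] = MIG_STOP,
	},
	[MIG_STOP_COPY] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_STOP,
		[MIG_STOP_COPY] = MIG_STOP_COPY,
		[MIG_RESUMING] = MIG_STOP,
	},
	[MIG_RESUMING] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_STOP,
		[MIG_STOP_COPY] = MIG_STOP,
		[MIG_RESUMING] = MIG_RESUMING,
	},
	/* The MIG_ERROR row is all zeroes: nothing leaves ERROR here. */
};

/* Return the next single state on the way from cur to target. */
static enum mig_state mig_next_state(enum mig_state cur,
				     enum mig_state target)
{
	assert(cur < MIG_NUM_STATES && target < MIG_NUM_STATES);
	return next_hop[cur][target];
}

/*
 * Walk from cur to target one arc at a time, recording each state
 * entered in path[]. Returns the number of steps taken, or -1 if the
 * walk hits MIG_ERROR unexpectedly or path[] is too small.
 */
static int mig_walk(enum mig_state cur, enum mig_state target,
		    enum mig_state *path, size_t max)
{
	size_t n = 0;

	while (cur != target) {
		cur = mig_next_state(cur, target);
		if (cur == MIG_ERROR && target != MIG_ERROR)
			return -1;
		if (n == max)
			return -1;
		path[n++] = cur;
	}
	return (int)n;
}
```

In this sketch a request to go from MIG_RUNNING to MIG_STOP_COPY is
walked as two arcs, first MIG_STOP and then MIG_STOP_COPY, so a driver
would only need to implement the individual arcs rather than every
pairwise combination.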

Matching qemu changes can be previewed here:
https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2

Changes from V8: https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@xxxxxxxxxx/
vfio:
- Fix some documentation notes given by Alex and Cornelia for v2.
- Add Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
vfio/mlx5, net/mlx5:
- Use more inclusive terminology for slave/master as was asked by Alex.

Changes from V7: https://lore.kernel.org/kvm/20220207172216.206415-1-yishaih@xxxxxxxxxx/T/
vfio:
- Fix and improve some documentation notes.
- Improve vfio_ioctl_device_feature_migration() to check for the
  existence of both set and get device ops.
- Improve some commit logs.
- Drop the PRE_COPY patch as was asked by Alex since we have no proposed
  in-kernel users.
- Add Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@xxxxxxxxxx>.
vfio/mlx5:
- Better packing of struct mlx5vf_pci_core_device.
net/mlx5:
- Update mlx5 command list for error/debug cases.

Changes from V6: https://lore.kernel.org/netdev/20220130160826.32449-1-yishaih@xxxxxxxxxx/
vfio:
- Move to use the FEATURE ioctl for setting/getting the device state.
- Use state_flags_table as part of vfio_mig_get_next_state() and use
  WARN_ON as Alex suggested.
- Leave the V1 definitions in the uAPI header and drop only its
  documentation until V2 is part of Linus's tree.
- Fix errno usage in a few places.
- Improve and adapt the uAPI documentation to match the latest code.
- Put the VFIO_DEVICE_FEATURE_PCI_VF_TOKEN functionality into a separate
  function.
- Fix some rebase note.
vfio/mlx5:
- Adapt to use the vfio core changes.
- Fix some bad flow upon load state.

Changes from V5: https://lore.kernel.org/kvm/20211027095658.144468-1-yishaih@xxxxxxxxxx/
vfio:
- Migration protocol v2:
  + enum for device state, not bitmap
  + ioctl to manipulate device_state, not a region
  + Only STOP_COPY is mandatory, P2P and PRE_COPY are optional,
    discovered via VFIO_DEVICE_FEATURE
  + Migration data transfer is done via dedicated FD
- VFIO core code to implement the migration related ioctls and help
  drivers implement it correctly
- VFIO_DEVICE_FEATURE refactor
- Delete migration protocol, drop patches fixing it
- Drop "vfio/pci_core: Make the region->release() function optional"
vfio/mlx5:
- Switch to use migration v2 protocol, with core helpers
- Eliminate the region implementation

Changes from V4: https://lore.kernel.org/kvm/20211026090605.91646-1-yishaih@xxxxxxxxxx/
vfio:
- Add some Reviewed-by.
- Rename to vfio_pci_core_aer_err_detected() as Alex asked.
vfio/mlx5:
- Improve to enter the error state only if unquiesce also fails.
- Fix some typos.
- Use the multi-line comment style as in drivers/vfio.

Changes from V3: https://lore.kernel.org/kvm/20211024083019.232813-1-yishaih@xxxxxxxxxx/
vfio/mlx5:
- Align with the latest mlx5 specification to create the MKEY with full
  read write permissions.
- Fix unlock ordering in mlx5vf_state_mutex_unlock() to prevent a race.

Changes from V2: https://lore.kernel.org/kvm/20211019105838.227569-1-yishaih@xxxxxxxxxx/
vfio:
- Put and use the new macro VFIO_DEVICE_STATE_SET_ERROR as Alex asked.
vfio/mlx5:
- Improve/fix state checking as was asked by Alex & Jason.
- Let things be done in a deterministic way upon 'reset_done' following
  the algorithm suggested by Jason.
- Align with the latest mlx5 specification when calling the SAVE
  command.
- Fix some typos.
vdpa/mlx5:
- Drop the patch from the series based on the discussion in the mailing
  list.

Changes from V1: https://lore.kernel.org/kvm/20211013094707.163054-1-yishaih@xxxxxxxxxx/
PCI/IOV:
- Add actual interface in the subject as was asked by Bjorn and add his
  Acked-by.
- Move to check explicitly for !dev->is_virtfn as was asked by Alex.
vfio:
- Add a separate patch for fixing the non-compiled
  VFIO_DEVICE_STATE_SET_ERROR macro.
- Expose vfio_pci_aer_err_detected() to be set by drivers on their own
  pci error handlers.
- Add a macro for VFIO_DEVICE_STATE_ERROR in the uapi header file as was
  suggested by Alex.
vfio/mlx5:
- Improve to use xor as part of checking the 'state' change command as
  was suggested by Alex.
- Set state to VFIO_DEVICE_STATE_ERROR when an error occurred instead of
  VFIO_DEVICE_STATE_INVALID.
- Improve state checking as was suggested by Jason.
- Use its own PCI reset_done error handler as was suggested by Jason and
  fix the locking scheme around the state mutex to work properly.

Changes from V0: https://lore.kernel.org/kvm/cover.1632305919.git.leonro@xxxxxxxxxx/
PCI/IOV:
- Add an API (i.e. pci_iov_get_pf_drvdata()) that allows SRIOV VF
  drivers to reach the drvdata of a PF.
mlx5_core:
- Add an extra patch to disable SRIOV before PF removal.
- Adapt to use the above PCI/IOV API as part of mlx5_vf_get_core_dev().
- Reuse the exported PCI/IOV virtfn index function call (i.e.
  pci_iov_vf_id()).
vfio:
- Add support in the pci_core to notify a driver upon 'reset_done' so
  that it can set its internal state accordingly.
- Add some helper stuff for 'invalid' state handling.
mlx5_vfio_pci:
- Move to use the 'command mode' instead of the 'state machine' scheme
  as was discussed in the mailing list.
- Handle the RESET scenario when called by vfio_pci_core to set its
  internal state accordingly.
- Set initial state as RUNNING.
- Put the driver files in a sub-folder under drivers/vfio/pci named mlx5
  and update the MAINTAINERS file as was asked.
vdpa_mlx5:
- Add a new patch to use mlx5_vf_get_core_dev() to get the PF device.

Jason Gunthorpe (6):
  PCI/IOV: Add pci_iov_vf_id() to get VF index
  PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata
    of a PF
  vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl
  vfio: Define device migration protocol v2
  vfio: Extend the device migration protocol with RUNNING_P2P
  vfio: Remove migration protocol v1 documentation

Leon Romanovsky (1):
  net/mlx5: Reuse exported virtfn index function call

Yishai Hadas (8):
  net/mlx5: Disable SRIOV before PF removal
  net/mlx5: Expose APIs to get/put the mlx5 core device
  net/mlx5: Introduce migration bits and structures
  net/mlx5: Add migration commands definitions
  vfio/mlx5: Expose migration commands over mlx5 device
  vfio/mlx5: Implement vfio_pci driver for mlx5 devices
  vfio/pci: Expose vfio_pci_core_aer_err_detected()
  vfio/mlx5: Use its own PCI reset_done error handler

 MAINTAINERS                                   |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  10 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |  45 ++
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   1 +
 .../net/ethernet/mellanox/mlx5/core/sriov.c   |  17 +-
 drivers/pci/iov.c                             |  43 ++
 drivers/vfio/pci/Kconfig                      |   3 +
 drivers/vfio/pci/Makefile                     |   2 +
 drivers/vfio/pci/mlx5/Kconfig                 |  10 +
 drivers/vfio/pci/mlx5/Makefile                |   4 +
 drivers/vfio/pci/mlx5/cmd.c                   | 259 +++++++
 drivers/vfio/pci/mlx5/cmd.h                   |  36 +
 drivers/vfio/pci/mlx5/main.c                  | 676 ++++++++++++++++++
 drivers/vfio/pci/vfio_pci.c                   |   1 +
 drivers/vfio/pci/vfio_pci_core.c              | 101 ++-
 drivers/vfio/vfio.c                           | 295 +++++++-
 include/linux/mlx5/driver.h                   |   3 +
 include/linux/mlx5/mlx5_ifc.h                 | 147 +++-
 include/linux/pci.h                           |  15 +-
 include/linux/vfio.h                          |  53 ++
 include/linux/vfio_pci_core.h                 |   4 +
 include/uapi/linux/vfio.h                     | 406 +++++------
 22 files changed, 1846 insertions(+), 291 deletions(-)
 create mode 100644 drivers/vfio/pci/mlx5/Kconfig
 create mode 100644 drivers/vfio/pci/mlx5/Makefile
 create mode 100644 drivers/vfio/pci/mlx5/cmd.c
 create mode 100644 drivers/vfio/pci/mlx5/cmd.h
 create mode 100644 drivers/vfio/pci/mlx5/main.c

--
2.18.1