On Tue, 6 Dec 2022 10:34:24 +0200
Yishai Hadas <yishaih@xxxxxxxxxx> wrote:

> This series adds migration PRE_COPY uAPIs and their implementation as part of
> mlx5 driver.
>
> The uAPIs follow some discussion that was done in the mailing list [1] in this
> area.
>
> By the time the patches were sent, there was no driver implementation for the
> uAPIs, now we have it for mlx5 driver.
>
> The optional PRE_COPY state opens the saving data transfer FD before reaching
> STOP_COPY and allows the device to dirty track the internal state changes with
> the general idea to reduce the volume of data transferred in the STOP_COPY
> stage.
>
> While in PRE_COPY the device remains RUNNING, but the saving FD is open.
>
> A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to query
> the progress of the precopy operation in the driver with the idea it will judge
> to move to STOP_COPY once the initial data set is transferred, and possibly
> after the dirty size has shrunk appropriately.
>
> User space can detect whether PRE_COPY is supported for a given device by
> checking the VFIO_MIGRATION_PRE_COPY flag once using the
> VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>
> Extra details exist as part of the specific uAPI patch from the series.
>
> Finally, we come with mlx5 implementation based on its device specification for
> PRE_COPY.
>
> To support PRE_COPY, mlx5 driver is transferring multiple states (images) of
> the device. e.g.: the source VF can save and transfer multiple states, and the
> target VF will load them by that order.
>
> The device is saving three kinds of states:
> 1) Initial state - when the device moves to PRE_COPY state.
> 2) Middle state - during PRE_COPY phase via VFIO_MIG_GET_PRECOPY_INFO,
>                   can be multiple such states.
> 3) Final state - when the device moves to STOP_COPY state.
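
The quoted text says userspace polls VFIO_MIG_GET_PRECOPY_INFO and moves to
STOP_COPY once the initial data set is transferred and the dirty size has
shrunk enough. A minimal sketch of that decision, as a pure simulation with no
ioctl calls, might look like this. PrecopyInfo, ready_for_stop_copy,
first_ready_index and the threshold are hypothetical names for illustration;
initial_bytes comes from the changelog, while dirty_bytes is assumed here as the
name for the "dirty size".

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PrecopyInfo:
    """Hypothetical stand-in for one VFIO_MIG_GET_PRECOPY_INFO result."""
    initial_bytes: int  # initial data set not yet read from the saving FD
    dirty_bytes: int    # assumed name: state dirtied since, not yet read


def ready_for_stop_copy(info: PrecopyInfo, dirty_threshold: int) -> bool:
    """True once the initial set is drained and the dirty size is small."""
    return info.initial_bytes == 0 and info.dirty_bytes <= dirty_threshold


def first_ready_index(samples: Iterable[PrecopyInfo],
                      dirty_threshold: int) -> Optional[int]:
    """Index of the first poll that allows moving to STOP_COPY, else None."""
    for index, info in enumerate(samples):
        if ready_for_stop_copy(info, dirty_threshold):
            return index
    return None
```

The threshold is a policy choice left to userspace; the cover letter only says
the dirty size should have "shrunk appropriately".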
>
> After moving to PRE_COPY state, the user is holding the saving FD and should
> use it for transferring the data from the source to the target while the VM is
> still running. From user point of view, it's a stream of data, however, from
> mlx5 driver point of view it includes multiple images/states. For that, it sets
> some headers with metadata on the source to be parsed on the target.
>
> At some point, user may switch the device state from PRE_COPY to STOP_COPY,
> this will invoke saving of the final state.
>
> As discussed earlier in the mailing list, the data that is returned as part of
> PRE_COPY is not required to have any bearing relative to the data size
> available during the STOP_COPY phase.
>
> For this, we have the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE option.
>
> In mlx5 driver we could gain with this series about 20-30 percent improvement
> in the downtime compared to the previous code when PRE_COPY wasn't supported.
>
> The series includes some pre-patches to be ready for managing multiple images
> then it comes with the PRE_COPY implementation itself.
>
> The matching qemu changes can be previewed here [2].
>
> They come on top of the v2 migration protocol patches that were sent already to
> the mailing list.
>
> [1] https://lore.kernel.org/kvm/20220302172903.1995-8-shameerali.kolothum.thodi@xxxxxxxxxx/
> [2] https://github.com/avihai1122/qemu/commits/mig_v2_precopy
>
> Changes from V3: https://www.spinics.net/lists/kvm/msg297449.html
> Patch #1:
> - Add Acked-by: Leon Romanovsky.
> Patch #10:
> - Fix mlx5vf_precopy_ioctl() signature to return long instead of ssize_t
>   as Alex pointed out.
>
> Changes from V2: https://www.spinics.net/lists/kvm/msg297112.html
>
> Patch #2:
> - Add a note that the VFIO_MIG_GET_PRECOPY_INFO ioctl is mandatory when
>   a driver claims to support VFIO_MIGRATION_PRE_COPY as was raised by
>   Shameer Kolothum.
> - Add Reviewed-by: Shameer Kolothum and Kevin Tian.
> Patch #3:
> - Add a comment in the code as suggested by Jason.
> All:
> - Add Reviewed-by: Jason Gunthorpe for the series.
>
> Note:
> As pointed out by Leon in the mailing list, no need for a PR for the
> first patch of net/mlx5.
>
> Changes from V1: https://www.spinics.net/lists/kvm/msg296475.html
>
> Patch #2: Rephrase the 'initial_bytes' meaning as was suggested by Jason.
> Patch #9: Fix to send header based on PRE_COPY support.
> Patch #13: Fix some unwind flow to call complete().
>
> Changes from V0: https://www.spinics.net/lists/kvm/msg294247.html
>
> Drop the first 2 patches that Alex merged already.
> Refactor mlx5 implementation based on Jason's comments on V0, it includes
> the below:
> * Refactor the PD usage to be aligned with the migration file life cycle.
> * Refactor the MKEY usage to be aligned with the migration file life cycle.
> * Refactor the migration file state.
> * Use queue based data chunks to simplify the driver code.
> * Use the FSM model on the target to simplify the driver code.
> * Extend the driver pre_copy header for future use.
>
> Yishai
>
> Jason Gunthorpe (1):
>   vfio: Extend the device migration protocol with PRE_COPY
>
> Shay Drory (3):
>   net/mlx5: Introduce ifc bits for pre_copy
>   vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
>   vfio/mlx5: Enable MIGRATION_PRE_COPY flag
>
> Yishai Hadas (10):
>   vfio/mlx5: Enforce a single SAVE command at a time
>   vfio/mlx5: Refactor PD usage
>   vfio/mlx5: Refactor MKEY usage
>   vfio/mlx5: Refactor migration file state
>   vfio/mlx5: Refactor to use queue based data chunks
>   vfio/mlx5: Introduce device transitions of PRE_COPY
>   vfio/mlx5: Introduce SW headers for migration states
>   vfio/mlx5: Introduce vfio precopy ioctl implementation
>   vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
>   vfio/mlx5: Introduce multiple loads
>
>  drivers/vfio/pci/mlx5/cmd.c   | 409 ++++++++++++++----
>  drivers/vfio/pci/mlx5/cmd.h   |  96 ++++-
>  drivers/vfio/pci/mlx5/main.c  | 752 ++++++++++++++++++++++++++++------
>  drivers/vfio/vfio_main.c      |  74 +++-
>  include/linux/mlx5/mlx5_ifc.h |  14 +-
>  include/uapi/linux/vfio.h     | 123 +++++-
>  6 files changed, 1248 insertions(+), 220 deletions(-)

Applied to vfio next branch for v6.2.  Thanks,

Alex
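
For readers following the "stream of data" versus "multiple images" point in
the cover letter: one common way to carry several images over a single FD is to
put a small header with the image size in front of each one. The sketch below
illustrates that general technique only. The header layout (u64 size, u32 tag),
the tag values and the function names are all hypothetical and are not the
mlx5 driver's actual header format.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

# Hypothetical header: little-endian u64 image size followed by a u32 tag.
HEADER = struct.Struct("<QI")


def pack_image(tag: int, payload: bytes) -> bytes:
    """Source side: prefix one image with its header."""
    return HEADER.pack(len(payload), tag) + payload


def parse_stream(stream: BinaryIO) -> List[Tuple[int, bytes]]:
    """Target side: split the byte stream back into (tag, image) pairs."""
    images = []
    while True:
        raw = stream.read(HEADER.size)
        if not raw:
            break  # clean end of stream between images
        if len(raw) < HEADER.size:
            raise ValueError("truncated header")
        size, tag = HEADER.unpack(raw)
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated image")
        images.append((tag, payload))
    return images
```

Images come back in the order they were written, which matches the statement
that the target loads the states in the order the source saved them.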