On Thu, Feb 24 2022, Yishai Hadas <yishaih@xxxxxxxxxx> wrote: > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > Replace the existing region based migration protocol with an ioctl based > protocol. The two protocols have the same general semantic behaviors, but > the way the data is transported is changed. > > This is the STOP_COPY portion of the new protocol, it defines the 5 states > for basic stop and copy migration and the protocol to move the migration > data in/out of the kernel. > > Compared to the clarification of the v1 protocol Alex proposed: > > https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen > > This has a few deliberate functional differences: > > - ERROR arcs allow the device function to remain unchanged. > > - The protocol is not required to return to the original state on > transition failure. Instead userspace can execute an unwind back to > the original state, reset, or do something else without needing kernel > support. This simplifies the kernel design and should userspace choose > a policy like always reset, avoids doing useless work in the kernel > on error handling paths. > > - PRE_COPY is made optional, userspace must discover it before using it. > This reflects the fact that the majority of drivers we are aware of > right now will not implement PRE_COPY. > > - segmentation is not part of the data stream protocol, the receiver > does not have to reproduce the framing boundaries. > > The hybrid FSM for the device_state is described as a Mealy machine by > documenting each of the arcs the driver is required to implement. Defining > the remaining set of old/new device_state transitions as 'combination > transitions' which are naturally defined as taking multiple FSM arcs along > the shortest path within the FSM's digraph allows a complete matrix of > transitions. > > A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is > defined to replace writing to the device_state field in the region. This > allows returning a brand new FD whenever the requested transition opens > a data transfer session. > > The VFIO core code implements the new feature and provides a helper > function to the driver. Using the helper the driver only has to > implement 6 of the FSM arcs and the other combination transitions are > elaborated consistently from those arcs. > > A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to > report the capability for migration and indicate which set of states and > arcs are supported by the device. The FSM provides a lot of flexibility to > make backwards compatible extensions but the VFIO_DEVICE_FEATURE also > allows for future breaking extensions for scenarios that cannot support > even the basic STOP_COPY requirements. > > The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e. > VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state > of the VFIO device. > > Data transfer sessions are now carried over a file descriptor, instead of > the region. The FD functions for the lifetime of the data transfer > session. read() and write() transfer the data with normal Linux stream FD > semantics. This design allows future expansion to support poll(), > io_uring, and other performance optimizations. > > The complicated mmap mode for data transfer is discarded as current qemu > doesn't take meaningful advantage of it, and the new qemu implementation > avoids substantially all the performance penalty of using a read() on the > region. > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@xxxxxxxxxx> > Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx> > Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx> > --- > drivers/vfio/vfio.c | 199 ++++++++++++++++++++++++++++++++++++++ > include/linux/vfio.h | 20 ++++ > include/uapi/linux/vfio.h | 174 ++++++++++++++++++++++++++++++--- > 3 files changed, 380 insertions(+), 13 deletions(-) Reviewed-by: Cornelia Huck <cohuck@xxxxxxxxxx>