> From: Yishai Hadas <yishaih@xxxxxxxxxx>
> Sent: Sunday, February 20, 2022 5:57 PM
>
> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
>
> The RUNNING_P2P state is designed to support multiple devices in the same
> VM that are doing P2P transactions between themselves. When in RUNNING_P2P
> the device must be able to accept incoming P2P transactions but should not
> generate outgoing P2P transactions.
>
> As an optional extension to the mandatory states it is defined as
> inbetween STOP and RUNNING:
>    STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP
>
> For drivers that are unable to support RUNNING_P2P the core code
> silently merges RUNNING_P2P and RUNNING together. Unless driver support
> is present, the new state cannot be used in SET_STATE.
> Drivers that support this will be required to implement 4 FSM arcs
> beyond the basic FSM. 2 of the basic FSM arcs become combination
> transitions.
>
> Compared to the v1 clarification, NDMA is redefined into FSM states and is
> described in terms of the desired P2P quiescent behavior, noting that
> halting all DMA is an acceptable implementation.
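
The rule that the new state cannot be used in SET_STATE without driver support comes down to a flag-subset test. Below is a minimal user-space sketch of that test, not the kernel code itself. The EX_* names and the two helper functions are made up for illustration; the numeric values mirror the flag and state definitions quoted in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative copies of the migration flags quoted in this patch. */
#define EX_MIGRATION_STOP_COPY (1u << 0)
#define EX_MIGRATION_P2P       (1u << 1)

/* Illustrative copies of the device states, same numbering as the patch. */
enum ex_state {
	EX_ERROR = 0,
	EX_STOP = 1,
	EX_RUNNING = 2,
	EX_STOP_COPY = 3,
	EX_RESUMING = 4,
	EX_RUNNING_P2P = 5,
	EX_NUM_STATES
};

/* Flags a device must advertise before state s may be named. */
unsigned int ex_required_flags(unsigned int s)
{
	static const unsigned int table[EX_NUM_STATES] = {
		[EX_STOP] = EX_MIGRATION_STOP_COPY,
		[EX_RUNNING] = EX_MIGRATION_STOP_COPY,
		[EX_STOP_COPY] = EX_MIGRATION_STOP_COPY,
		[EX_RESUMING] = EX_MIGRATION_STOP_COPY,
		[EX_RUNNING_P2P] = EX_MIGRATION_STOP_COPY | EX_MIGRATION_P2P,
		[EX_ERROR] = ~0u, /* ERROR can never be requested */
	};

	assert(s < EX_NUM_STATES); /* callers range-check first */
	return table[s];
}

/* True when every flag the state needs is present in dev_flags. */
bool ex_state_usable(unsigned int s, unsigned int dev_flags)
{
	unsigned int need;

	if (s >= EX_NUM_STATES)
		return false;
	need = ex_required_flags(s);
	return (need & dev_flags) == need;
}
```

With only EX_MIGRATION_STOP_COPY set, EX_RUNNING_P2P is rejected while the four mandatory states are accepted. Setting both flags makes EX_RUNNING_P2P usable as well.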
>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@xxxxxxxxxx>
> Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx>

Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>

> ---
>  drivers/vfio/vfio.c       | 84 +++++++++++++++++++++++++++++++--------
>  include/linux/vfio.h      |  1 +
>  include/uapi/linux/vfio.h | 36 ++++++++++++++++-
>  3 files changed, 102 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index b37ab27b511f..bdb5205bb358 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1577,39 +1577,55 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			    enum vfio_device_mig_state new_fsm,
>  			    enum vfio_device_mig_state *next_fsm)
>  {
> -	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RESUMING + 1 };
> +	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
>  	/*
> -	 * The coding in this table requires the driver to implement 6
> +	 * The coding in this table requires the driver to implement
>  	 * FSM arcs:
>  	 *         RESUMING -> STOP
> -	 *         RUNNING -> STOP
>  	 *         STOP -> RESUMING
> -	 *         STOP -> RUNNING
>  	 *         STOP -> STOP_COPY
>  	 *         STOP_COPY -> STOP
>  	 *
> -	 * The coding will step through multiple states for these combination
> -	 * transitions:
> -	 *         RESUMING -> STOP -> RUNNING
> +	 * If P2P is supported then the driver must also implement these FSM
> +	 * arcs:
> +	 *         RUNNING -> RUNNING_P2P
> +	 *         RUNNING_P2P -> RUNNING
> +	 *         RUNNING_P2P -> STOP
> +	 *         STOP -> RUNNING_P2P
> +	 * Without P2P the driver must implement:
> +	 *         RUNNING -> STOP
> +	 *         STOP -> RUNNING
> +	 *
> +	 * If all optional features are supported then the coding will step
> +	 * through multiple states for these combination transitions:
> +	 *         RESUMING -> STOP -> RUNNING_P2P
> +	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING
>  	 *         RESUMING -> STOP -> STOP_COPY
> -	 *         RUNNING -> STOP -> RESUMING
> -	 *         RUNNING -> STOP -> STOP_COPY
> +	 *         RUNNING -> RUNNING_P2P -> STOP
> +	 *         RUNNING -> RUNNING_P2P -> STOP -> RESUMING
> +	 *         RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
> +	 *         RUNNING_P2P -> STOP -> RESUMING
> +	 *         RUNNING_P2P -> STOP -> STOP_COPY
> +	 *         STOP -> RUNNING_P2P -> RUNNING
>  	 *         STOP_COPY -> STOP -> RESUMING
> -	 *         STOP_COPY -> STOP -> RUNNING
> +	 *         STOP_COPY -> STOP -> RUNNING_P2P
> +	 *         STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
>  	 */
>  	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
>  		[VFIO_DEVICE_STATE_STOP] = {
>  			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> -			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
>  			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  		[VFIO_DEVICE_STATE_RUNNING] = {
> -			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
> -			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
> -			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  		[VFIO_DEVICE_STATE_STOP_COPY] = {
> @@ -1617,6 +1633,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
>  			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
>  			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  		[VFIO_DEVICE_STATE_RESUMING] = {
> @@ -1624,6 +1641,15 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
>  			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
>  			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
> +		},
> +		[VFIO_DEVICE_STATE_RUNNING_P2P] = {
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  		[VFIO_DEVICE_STATE_ERROR] = {
> @@ -1631,17 +1657,41 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
>  			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
>  			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  	};
>
> -	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table)))
> +	static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
> +		[VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
> +		[VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
> +		[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
> +		[VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
> +		[VFIO_DEVICE_STATE_RUNNING_P2P] =
> +			VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P,
> +		[VFIO_DEVICE_STATE_ERROR] = ~0U,
> +	};
> +
> +	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
> +		    (state_flags_table[cur_fsm] & device->migration_flags) !=
> +			state_flags_table[cur_fsm]))
>  		return -EINVAL;
>
> -	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table))
> +	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
> +	   (state_flags_table[new_fsm] & device->migration_flags) !=
> +			state_flags_table[new_fsm])
>  		return -EINVAL;
>
> +	/*
> +	 * Arcs touching optional and unsupported states are skipped over. The
> +	 * driver will instead see an arc from the original state to the next
> +	 * logical state, as per the above comment.
> +	 */
>  	*next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
> +	while ((state_flags_table[*next_fsm] & device->migration_flags) !=
> +			state_flags_table[*next_fsm])
> +		*next_fsm = vfio_from_fsm_table[*next_fsm][new_fsm];
> +
>  	return (*next_fsm != VFIO_DEVICE_STATE_ERROR) ? 0 : -EINVAL;
>  }
>  EXPORT_SYMBOL_GPL(vfio_mig_get_next_state);
> @@ -1731,7 +1781,7 @@ static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
>  					       size_t argsz)
>  {
>  	struct vfio_device_feature_migration mig = {
> -		.flags = VFIO_MIGRATION_STOP_COPY,
> +		.flags = device->migration_flags,
>  	};
>  	int ret;
>
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 3bbadcdbc9c8..3176cb5d4464 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -33,6 +33,7 @@ struct vfio_device {
>  	struct vfio_group *group;
>  	struct vfio_device_set *dev_set;
>  	struct list_head dev_set_list;
> +	unsigned int migration_flags;
>
>  	/* Members below here are private, not for driver use */
>  	refcount_t refcount;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 02b836ea8f46..46b06946f0a8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1010,10 +1010,16 @@ struct vfio_device_feature {
>   *
>   * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
>   * RESUMING are supported.
> + *
> + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
> + * is supported in addition to the STOP_COPY states.
> + *
> + * Other combinations of flags have behavior to be defined in the future.
>   */
>  struct vfio_device_feature_migration {
>  	__aligned_u64 flags;
>  #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
> +#define VFIO_MIGRATION_P2P		(1 << 1)
>  };
>  #define VFIO_DEVICE_FEATURE_MIGRATION 1
>
> @@ -1064,10 +1070,13 @@ struct vfio_device_feature_mig_state {
>   *  RESUMING - The device is stopped and is loading a new internal state
>   *  ERROR - The device has failed and must be reset
>   *
> + * And 1 optional state to support VFIO_MIGRATION_P2P:
> + *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
> + *
>   * The FSM takes actions on the arcs between FSM states. The driver implements
>   * the following behavior for the FSM arcs:
>   *
> - * RUNNING -> STOP
> + * RUNNING_P2P -> STOP
>   * STOP_COPY -> STOP
>   *   While in STOP the device must stop the operation of the device. The device
>   *   must not generate interrupts, DMA, or any other change to external state.
> @@ -1094,11 +1103,16 @@ struct vfio_device_feature_mig_state {
>   *
>   *   To abort a RESUMING session the device must be reset.
>   *
> - * STOP -> RUNNING
> + * RUNNING_P2P -> RUNNING
>   *   While in RUNNING the device is fully operational, the device may generate
>   *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
>   *   and the device may advance its internal state.
>   *
> + * RUNNING -> RUNNING_P2P
> + * STOP -> RUNNING_P2P
> + *   While in RUNNING_P2P the device is partially running in the P2P quiescent
> + *   state defined below.
> + *
>   * STOP -> STOP_COPY
>   *   This arc begin the process of saving the device state and will return a
>   *   new data_fd.
> @@ -1128,6 +1142,18 @@ struct vfio_device_feature_mig_state {
>   *   To recover from ERROR VFIO_DEVICE_RESET must be used to return the
>   *   device_state back to RUNNING.
>   *
> + * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
> + * state for the device for the purposes of managing multiple devices within a
> + * user context where peer-to-peer DMA between devices may be active. The
> + * RUNNING_P2P states must prevent the device from initiating
> + * any new P2P DMA transactions. If the device can identify P2P transactions
> + * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
> + * driver must complete any such outstanding operations prior to completing the
> + * FSM arc into a P2P state. For the purpose of specification the states
> + * behave as though the device was fully running if not supported. Like while in
> + * STOP or STOP_COPY the user must not touch the device, otherwise the state
> + * can be exited.
> + *
>   * The remaining possible transitions are interpreted as combinations of the
>   * above FSM arcs. As there are multiple paths through the FSM arcs the path
>   * should be selected based on the following rules:
> @@ -1140,6 +1166,11 @@ struct vfio_device_feature_mig_state {
>   * fails. When handling these types of errors users should anticipate future
>   * revisions of this protocol using new states and those states becoming
>   * visible in this case.
> + *
> + * The optional states cannot be used with SET_STATE if the device does not
> + * support them. The user can discover if these states are supported by using
> + * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
> + * avoid knowing about these optional states if the kernel driver supports them.
>   */
>  enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_ERROR = 0,
> @@ -1147,6 +1178,7 @@ enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_RUNNING = 2,
>  	VFIO_DEVICE_STATE_STOP_COPY = 3,
>  	VFIO_DEVICE_STATE_RESUMING = 4,
> +	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>  };
>
>  /* -------- API for Type1 VFIO IOMMU -------- */
> --
> 2.18.1
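
To make the skip-over behavior concrete, here is a small user-space sketch of the table walk in vfio_mig_get_next_state(). It is a model for reasoning about the arcs, not the kernel function. All EX_* names, ex_next_state() and ex_walk() are hypothetical. The table entries that fall between the quoted hunks (the STOP column of the STOP_COPY and RESUMING rows) are assumed to be STOP, based on the arc list in the comment.

```c
#include <assert.h>
#include <stddef.h>

#define EX_F_STOP_COPY (1u << 0)
#define EX_F_P2P       (1u << 1)

enum { EX_ERROR, EX_STOP, EX_RUNNING, EX_STOP_COPY, EX_RESUMING,
       EX_RUNNING_P2P, EX_NUM };

/* Row = current state, column = requested state, value = next hop. */
static const unsigned char ex_next[EX_NUM][EX_NUM] = {
	[EX_STOP] = {
		[EX_STOP] = EX_STOP, [EX_RUNNING] = EX_RUNNING_P2P,
		[EX_STOP_COPY] = EX_STOP_COPY, [EX_RESUMING] = EX_RESUMING,
		[EX_RUNNING_P2P] = EX_RUNNING_P2P,
	},
	[EX_RUNNING] = {
		[EX_STOP] = EX_RUNNING_P2P, [EX_RUNNING] = EX_RUNNING,
		[EX_STOP_COPY] = EX_RUNNING_P2P, [EX_RESUMING] = EX_RUNNING_P2P,
		[EX_RUNNING_P2P] = EX_RUNNING_P2P,
	},
	[EX_STOP_COPY] = {
		[EX_STOP] = EX_STOP, [EX_RUNNING] = EX_STOP,
		[EX_STOP_COPY] = EX_STOP_COPY, [EX_RESUMING] = EX_STOP,
		[EX_RUNNING_P2P] = EX_STOP,
	},
	[EX_RESUMING] = {
		[EX_STOP] = EX_STOP, [EX_RUNNING] = EX_STOP,
		[EX_STOP_COPY] = EX_STOP, [EX_RESUMING] = EX_RESUMING,
		[EX_RUNNING_P2P] = EX_STOP,
	},
	[EX_RUNNING_P2P] = {
		[EX_STOP] = EX_STOP, [EX_RUNNING] = EX_RUNNING,
		[EX_STOP_COPY] = EX_STOP, [EX_RESUMING] = EX_STOP,
		[EX_RUNNING_P2P] = EX_RUNNING_P2P,
	},
	/* Unset entries, including the whole EX_ERROR row, are 0 == EX_ERROR. */
};

/* A state is usable only if the device advertises all flags it needs. */
int ex_supported(unsigned int s, unsigned int flags)
{
	static const unsigned int need[EX_NUM] = {
		[EX_ERROR] = ~0u,
		[EX_STOP] = EX_F_STOP_COPY, [EX_RUNNING] = EX_F_STOP_COPY,
		[EX_STOP_COPY] = EX_F_STOP_COPY, [EX_RESUMING] = EX_F_STOP_COPY,
		[EX_RUNNING_P2P] = EX_F_STOP_COPY | EX_F_P2P,
	};

	return s < EX_NUM && (need[s] & flags) == need[s];
}

/* One hop toward want, skipping unsupported states. 0 on success, -1 on error. */
int ex_next_state(unsigned int flags, unsigned int cur, unsigned int want,
		  unsigned int *next)
{
	if (!ex_supported(cur, flags) || !ex_supported(want, flags))
		return -1;
	*next = ex_next[cur][want];
	while (!ex_supported(*next, flags))
		*next = ex_next[*next][want];
	return *next != EX_ERROR ? 0 : -1;
}

/* Record every hop from cur to want in path[]; returns hop count or -1. */
int ex_walk(unsigned int flags, unsigned int cur, unsigned int want,
	    unsigned int *path, size_t cap)
{
	size_t n = 0;

	if (!ex_supported(cur, flags) || !ex_supported(want, flags))
		return -1;
	while (cur != want) {
		unsigned int next;

		if (n == cap || ex_next_state(flags, cur, want, &next))
			return -1;
		assert(next < EX_NUM);
		path[n++] = cur = next;
	}
	return (int)n;
}
```

With both flags set, ex_walk() from EX_RESUMING to EX_RUNNING records STOP, RUNNING_P2P, RUNNING, matching the combination transition listed in the comment. With only EX_F_STOP_COPY set, the same request records STOP, RUNNING, which is the arc sequence a non-P2P driver would be asked to implement.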