> From: Yishai Hadas <yishaih@xxxxxxxxxx> > Sent: Tuesday, February 8, 2022 1:22 AM > > +static int vfio_ioctl_device_feature_migration(struct vfio_device *device, > + u32 flags, void __user *arg, > + size_t argsz) > +{ > + struct vfio_device_feature_migration mig = { > + .flags = VFIO_MIGRATION_STOP_COPY, > + }; > + int ret; > + > + if (!device->ops->migration_set_state) > + return -ENOTTY; Miss a check on migration_get_state, as done in last function. > + * @migration_set_state: Optional callback to change the migration state for > + * devices that support migration. The returned FD is used for data > + * transfer according to the FSM definition. The driver is responsible > + * to ensure that FD is isolated whenever the migration FSM leaves a > + * data transfer state or before close_device() returns. didn't understand the meaning of 'isolated' here. > +#define VFIO_DEVICE_STATE_V1_STOP (0) > +#define VFIO_DEVICE_STATE_V1_RUNNING (1 << 0) > +#define VFIO_DEVICE_STATE_V1_SAVING (1 << 1) > +#define VFIO_DEVICE_STATE_V1_RESUMING (1 << 2) > +#define VFIO_DEVICE_STATE_MASK (VFIO_DEVICE_STATE_V1_RUNNING > | \ > + VFIO_DEVICE_STATE_V1_SAVING | \ > + VFIO_DEVICE_STATE_V1_RESUMING) Does it make sense to also add 'V1' to MASK and also following macros given their names are general? #define VFIO_DEVICE_STATE_VALID(state) \ #define VFIO_DEVICE_STATE_IS_ERROR(state) \ #define VFIO_DEVICE_STATE_SET_ERROR(state) \ It certainly implies more changes to v1 code but readability can be slightly improved. > +/* > + * Indicates the device can support the migration API. See enum call it V2? Not necessary to add V2 in code but worthy of a clarification in comment. > + * vfio_device_mig_state for details. If present flags must be non-zero and > + * VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is supported. > + * > + * VFIO_MIGRATION_STOP_COPY means that RUNNING, STOP, STOP_COPY > and > + * RESUMING are supported. > + */ Not aligned with other places where 5 states are mentioned. Better add ERROR here. > + * > + * RUNNING -> STOP > + * STOP_COPY -> STOP > + * While in STOP the device must stop the operation of the device. The > + * device must not generate interrupts, DMA, or advance its internal > + * state. When stopped the device and kernel migration driver must accept > + * and respond to interaction to support external subsystems in the STOP > + * state, for example PCI MSI-X and PCI config pace. Failure by the user to > + * restrict device access while in STOP must not result in error conditions > + * outside the user context (ex. host system faults). Right above the STOP state is defined as: * STOP - The device does not change the internal or external state 'external state' I assume means P2P activities. For consistency it is clearer to also say something about external state in above paragraph. > + * > + * The STOP_COPY arc will terminate a data transfer session. remove 'will' > + * > + * STOP -> STOP_COPY > + * This arc begin the process of saving the device state and will return a > + * new data_fd. > + * > + * While in the STOP_COPY state the device has the same behavior as STOP > + * with the addition that the data transfers session continues to stream the > + * migration state. End of stream on the FD indicates the entire device > + * state has been transferred. > + * > + * The user should take steps to restrict access to vfio device regions while > + * the device is in STOP_COPY or risk corruption of the device migration > data > + * stream. Restricting access has been explained in the to-STOP arcs and it is stated that while in STOP_COPY the device has the same behavior as STOP. So I think the last paragraph is possibly not required. > + * > + * STOP -> RESUMING > + * Entering the RESUMING state starts a process of restoring the device > + * state and will return a new data_fd. The data stream fed into the > data_fd > + * should be taken from the data transfer output of the saving group states No definition of 'group state' (maybe introduced in a later patch?) > + * from a compatible device. The migration driver may alter/reset the > + * internal device state for this arc if required to prepare the device to > + * receive the migration data. > + * Thanks Kevin