Re: [PATCH RFC] vfio: Revise and update the migration uAPI description

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 18, 2022 at 05:00:48PM -0400, Jason Gunthorpe wrote:
> > Core code "transitioning" the device to ERROR seems a little suspect
> > here.  I guess you're imagining that driver code calling this with an
> > pointer to internal state that it also tests on a non-zero return.
> 
> Unfortunately, ideally the migration state would be stored in struct
> vfio_device, and ideally the way to transition states would be a core
> ioctl, not a region thingy.
> 
> If it was an ioctl then I'd return a 'needs reset' and exact current
> device state.
> 
> > Should we just step-device-state to ERROR to directly inform the driver?
> 
> That is certainly a valid choice, it may eliminate the very ugly
> pointer argument too. I will try it out.

It looks more poor unfortunately.

The pointer has the second purpose of allowing the core code to know
when the driver goes to ERROR if it returned -errno. If we get rid of
this then the core code's idea of device state becomes desync'd with
the driver's version and it will start doing nonsense things like
invoking cur_state = ERROR. Currently the core code protects the
driver from seeing those kinds of arcs.

Second, the pointer is consolidating the code to update new state only
upon success of the driver's function - without this we have to open
code this in every driver, it increases the LOC of the mlx5
migration_step_device_state() by about 30% - though it is not
complicated new code.

(realy the whole pointer is some hacky approach to avoid putting the
device_state enum in struct vfio_device, and maybe we should just do
that in the first place)

Since the scheme has the core code update the current state's storage,
and I don't really want to undo that, adding a call to ERROR is just
dead core code at this point as mlx5 doesn't do anything with it.

This is still a reasonable idea, but lets do it when a driver comes
along to implement something for the ERROR arc. It can include this:

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c02e057b87cd3c..913bf02946b832 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1800,6 +1800,7 @@ int vfio_mig_set_device_state(struct vfio_device *device, u32 target_state,
 	     !vfio_mig_in_saving_group(*cur_state)) ||
 	    (starting_state == VFIO_DEVICE_STATE_RESUMING &&
 	     *cur_state != VFIO_DEVICE_STATE_RESUMING)) {
+		device->ops->migration_step_device_state(device, VFIO_DEVICE_STATE_ERROR);
 		*cur_state = VFIO_DEVICE_STATE_ERROR;
 		return ret;
 	}
@@ -1813,7 +1814,11 @@ int vfio_mig_set_device_state(struct vfio_device *device, u32 target_state,
 		if (next_state == VFIO_DEVICE_STATE_ERROR ||
 		    device->ops->migration_step_device_state(device,
 							     next_state)) {
-			*cur_state = VFIO_DEVICE_STATE_ERROR;
+			if (*cur_state != VFIO_DEVICE_STATE_ERROR) {
+				device->ops->migration_step_device_state(
+					device, VFIO_DEVICE_STATE_ERROR);
+				*cur_state = VFIO_DEVICE_STATE_ERROR;
+			}
 			break;
 		}
 		*cur_state = next_state;

I have a feeling this might make sense for mdev based drivers that
implement a SW reset.

Jason



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux