On Tue, Apr 04, 2023 at 12:01:40PM -0700, Brett Creeley wrote:
> It's possible that the device firmware crashes and is able to recover
> due to some configuration and/or other issue. If a live migration
> is in progress while the firmware crashes, the live migration will
> fail. However, the VF PCI device should still be functional post
> crash recovery and subsequent migrations should go through as
> expected.
>
> When the pds_core device notices that firmware crashes it sends an
> event to all its client drivers. When the pds_vfio driver receives
> this event while migration is in progress it will request a deferred
> reset on the next migration state transition. This state transition
> will report failure as well as any subsequent state transition
> requests from the VMM/VFIO. Based on uapi/vfio.h the only way out of
> VFIO_DEVICE_STATE_ERROR is by issuing VFIO_DEVICE_RESET. Once this
> reset is done, the migration state will be reset to
> VFIO_DEVICE_STATE_RUNNING and migration can be performed.
>
> If the event is received while no migration is in progress (i.e.
> the VM is in normal operating mode), then no actions are taken
> and the migration state remains VFIO_DEVICE_STATE_RUNNING.
>
> Signed-off-by: Brett Creeley <brett.creeley@xxxxxxx>
> Signed-off-by: Shannon Nelson <shannon.nelson@xxxxxxx>
> ---
>  drivers/vfio/pci/pds/pci_drv.c  | 110 +++++++++++++++++++++++++++++++-
>  drivers/vfio/pci/pds/vfio_dev.c |  34 +++++++++-
>  drivers/vfio/pci/pds/vfio_dev.h |   6 +-
>  3 files changed, 146 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c
> index b0781d9f4246..b155ac9b98ae 100644
> --- a/drivers/vfio/pci/pds/pci_drv.c
> +++ b/drivers/vfio/pci/pds/pci_drv.c
> @@ -20,6 +20,104 @@
>  #define PDS_VFIO_DRV_DESCRIPTION	"AMD/Pensando VFIO Device Driver"
>  #define PCI_VENDOR_ID_PENSANDO		0x1dd8
>
> +static void
> +pds_vfio_recovery_work(struct work_struct *work)
> +{
> +	struct pds_vfio_pci_device *pds_vfio =
> +		container_of(work, struct pds_vfio_pci_device, work);
> +	bool deferred_reset_needed = false;
> +
> +	/* Documentation states that the kernel migration driver must not
> +	 * generate asynchronous device state transitions outside of
> +	 * manipulation by the user or the VFIO_DEVICE_RESET ioctl.
> +	 *
> +	 * Since recovery is an asynchronous event received from the device,
> +	 * initiate a deferred reset. Only issue the deferred reset if a
> +	 * migration is in progress, which will cause the next step of the
> +	 * migration to fail. Also, if the device is in a state that will
> +	 * be set to VFIO_DEVICE_STATE_RUNNING on the next action (i.e. VM is
> +	 * shutdown and device is in VFIO_DEVICE_STATE_STOP) as that will clear
> +	 * the VFIO_DEVICE_STATE_ERROR when the VM starts back up.
> +	 */
> +	mutex_lock(&pds_vfio->state_mutex);
> +	if ((pds_vfio->state != VFIO_DEVICE_STATE_RUNNING &&
> +	     pds_vfio->state != VFIO_DEVICE_STATE_ERROR) ||
> +	    (pds_vfio->state == VFIO_DEVICE_STATE_RUNNING &&
> +	     pds_vfio_dirty_is_enabled(pds_vfio)))
> +		deferred_reset_needed = true;
> +	mutex_unlock(&pds_vfio->state_mutex);
> +
> +	/* On the next user initiated state transition, the device will
> +	 * transition to the VFIO_DEVICE_STATE_ERROR. At this point it's the user's
> +	 * responsibility to reset the device.
> +	 *
> +	 * If a VFIO_DEVICE_RESET is requested post recovery and before the next
> +	 * state transition, then the deferred reset state will be set to
> +	 * VFIO_DEVICE_STATE_RUNNING.
> +	 */
> +	if (deferred_reset_needed)
> +		pds_vfio_deferred_reset(pds_vfio, VFIO_DEVICE_STATE_ERROR);
> +}

Why is this a work? It is threaded on a blocking_notifier_chain, so it
can take the mutex?

Why is the locking like this? Can't you just call
pds_vfio_deferred_reset() under the mutex?

Jason
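
For illustration only, here is a rough userspace sketch of the shape the
locking question points at: evaluate the state check and queue the
deferred reset inside one critical section, instead of recording a flag,
dropping the lock, and acting on the flag afterwards. This is not the
driver code. The struct, enum, and function names below are
hypothetical stand-ins, a pthread mutex stands in for the kernel mutex,
and it assumes the deferred-reset helper can safely run with the state
mutex held, which the quoted hunk does not show.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VFIO migration states named in the patch. */
enum model_state {
	MODEL_STATE_ERROR,
	MODEL_STATE_STOP,
	MODEL_STATE_RUNNING,
	MODEL_STATE_STOP_COPY,
};

/* Hypothetical model of the per-device data; not the real driver struct. */
struct model_dev {
	pthread_mutex_t state_mutex;
	enum model_state state;
	bool dirty_enabled;		/* models pds_vfio_dirty_is_enabled() */
	bool deferred_reset;		/* true once a reset has been queued */
	enum model_state deferred_reset_state;
};

/* Same predicate as the quoted hunk: any state other than RUNNING or ERROR,
 * or RUNNING with dirty tracking enabled, counts as "migration in progress".
 */
static bool model_recovery_needs_reset(const struct model_dev *dev)
{
	return (dev->state != MODEL_STATE_RUNNING &&
		dev->state != MODEL_STATE_ERROR) ||
	       (dev->state == MODEL_STATE_RUNNING && dev->dirty_enabled);
}

/* Models a deferred-reset helper. Assumption: caller holds state_mutex. */
static void model_deferred_reset_locked(struct model_dev *dev,
					enum model_state reset_state)
{
	/* The patch only ever defers to ERROR or RUNNING. */
	assert(reset_state == MODEL_STATE_ERROR ||
	       reset_state == MODEL_STATE_RUNNING);
	dev->deferred_reset = true;
	dev->deferred_reset_state = reset_state;
}

/* Recovery handler: check and act without dropping the lock in between,
 * so the state cannot change between the test and the queued reset.
 */
static void model_recovery_event(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->state_mutex);
	if (model_recovery_needs_reset(dev))
		model_deferred_reset_locked(dev, MODEL_STATE_ERROR);
	pthread_mutex_unlock(&dev->state_mutex);
}
```

Whether the real pds_vfio_deferred_reset() can be called this way
depends on which locks it takes internally; if it acquires state_mutex
itself, it would need a variant that expects the lock to be held.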