Thanks for your comments, Jason. On Tuesday, February 6, 2024 8:55 PM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > + > > + ops = mdev->ops; > > + if (!ops || !ops->init || !ops->cleanup || > > + !ops->open || !ops->close || > > + !ops->save_state || !ops->load_state || > > + !ops->suspend || !ops->resume) { > > + ret = -EIO; > > + dev_err(&parent->dev, "Incomplete device migration ops > structure!"); > > + goto err_destroy; > > + } > > Why are there ops pointers here? I would expect this should just be > direct function calls to the PF QAT driver. I indeed had a version where the direct function calls are Implemented in QAT driver, while when I look at the functions, most of them only translate the interface to the ops pointer. That's why I put ops pointers directly into vfio variant driver. > > > +static void qat_vf_pci_aer_reset_done(struct pci_dev *pdev) > > +{ > > + struct qat_vf_core_device *qat_vdev = qat_vf_drvdata(pdev); > > + > > + if (!qat_vdev->core_device.vdev.mig_ops) > > + return; > > + > > + /* > > + * As the higher VFIO layers are holding locks across reset and using > > + * those same locks with the mm_lock we need to prevent ABBA > deadlock > > + * with the state_mutex and mm_lock. > > + * In case the state_mutex was taken already we defer the cleanup work > > + * to the unlock flow of the other running context. > > + */ > > + spin_lock(&qat_vdev->reset_lock); > > + qat_vdev->deferred_reset = true; > > + if (!mutex_trylock(&qat_vdev->state_mutex)) { > > + spin_unlock(&qat_vdev->reset_lock); > > + return; > > + } > > + spin_unlock(&qat_vdev->reset_lock); > > + qat_vf_state_mutex_unlock(qat_vdev); > > +} > > Do you really need this? I thought this ugly thing was going to be a > uniquely mlx5 thing.. I think that's still required to make the migration state synchronized if the VF is reset by other VFIO emulation paths. Is it the case? BTW, this implementation is not only in mlx5 driver, but also in other Vfio pci variant drivers such as hisilicon acc driver and pds driver. Thanks, Xin