On Tuesday, April 23, 2024 10:45 AM, Tian, Kevin <kevin.tian@xxxxxxxxx> wrote: > To: Jason Gunthorpe <jgg@xxxxxxxxxx> > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>; Zeng, Xin > <xin.zeng@xxxxxxxxx>; herbert@xxxxxxxxxxxxxxxxxxx; yishaih@xxxxxxxxxx; > shameerali.kolothum.thodi@xxxxxxxxxx; linux-crypto@xxxxxxxxxxxxxxx; > kvm@xxxxxxxxxxxxxxx; qat-linux <qat-linux@xxxxxxxxx>; Cao, Yahui > <yahui.cao@xxxxxxxxx> > Subject: RE: [PATCH v6 1/1] vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV > VF devices > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > Sent: Friday, April 19, 2024 10:11 PM > > > > On Fri, Apr 19, 2024 at 05:23:30AM +0000, Tian, Kevin wrote: > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > > Sent: Friday, April 19, 2024 6:55 AM > > > > > > > > On Wed, 17 Apr 2024 22:31:41 +0800 > > > > Xin Zeng <xin.zeng@xxxxxxxxx> wrote: > > > > > > > > > + > > > > > + /* > > > > > + * As the device is not capable of just stopping P2P DMAs, > suspend > > > > the > > > > > + * device completely once any of the P2P states are reached. > > > > > + * On the opposite direction, resume the device after > transiting from > > > > > + * the P2P state. > > > > > + */ > > > > > + if ((cur == VFIO_DEVICE_STATE_RUNNING && new == > > > > VFIO_DEVICE_STATE_RUNNING_P2P) || > > > > > + (cur == VFIO_DEVICE_STATE_PRE_COPY && new == > > > > VFIO_DEVICE_STATE_PRE_COPY_P2P)) { > > > > > + ret = qat_vfmig_suspend(qat_vdev->mdev); > > > > > + if (ret) > > > > > + return ERR_PTR(ret); > > > > > + return NULL; > > > > > + } > > > > > > > > This doesn't appear to be a valid way to support P2P, the P2P states > > > > are defined as running states. The guest driver may legitimately > > > > access and modify the device state during P2P states. > > > > > > yes it's a conceptual violation of the definition of the P2P states. > > > > It depends what suspend actually does. > > > > Like if it halts all queues and keeps them halted, while still > > allowing queue head/tail pointer updats then it would be a fine > > implementation for P2P. > > Yes that really depends. e.g. a queue accepting direct stores (MOVDIR64B) > for work submission may have problem if that store is simply abandoned > when the queue is disabled. ENQCMD is possibly OK as unaccepted store > will get a retry indicator to software so nothing is lost. > > I'll let Xin confirm on the QAT implementation (for all device registers). > If it is like Jason's example then we should provide a clear comment > clarifying that doing suspend at RUNNING_P2P is safe for QAT as the > device MMIO interface still works according to the definition of RUNNING > and no request is lost (either from CPU or peer). There is nothing to stop > from RUNNING_P2P to STOP because the device doesn't execute any > request to further change the internal state other than MMIO. > When QAT VF is suspended, all its MMIO registers can still be operated correctly, jobs submitted through ring are queued while no jobs are processed by the device. The MMIO states can be safely migrated to the target VF during stop-copy stage and restored correctly in the target VF. All queued jobs can be resumed then. MOVDIR64B mentioned above is not supported by QAT solution, so it's fine to keep current implementation. If no objections, I'll append this paragraph of comment in the driver and post another version. Thanks, Xin > > > > > > Should this device be advertising support for P2P? > > > > > > Jason suggests all new migration drivers must support P2P state. > > > In an old discussion [1] > > > > I did? I don't think that is what the link says.. > > Not this link which I roughly remember was a follow-up to your earlier > comment. > > But I cannot find that statement so probably my memory was bad. The > closest one is: > > https://lore.kernel.org/intel-wired-lan/ZJMH3DF7nJ+OG9BJ@xxxxxxxx/#t > > but it just talks about lacking of P2P is a problem similar to you replied below: > > > > > We've been saying for a while that devices should try hard to > > implement P2P because if they don't then multi VFIO VMM's won't work > > and people will be unhappy.. > > yes but we have to admit that existing devices may not meet this > requirement. 😊