On Tue, Nov 09, 2021 at 12:58:26AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Monday, November 8, 2021 8:36 PM
> >
> > On Mon, Nov 08, 2021 at 08:53:20AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > Sent: Tuesday, October 26, 2021 11:19 PM
> > > >
> > > > On Tue, Oct 26, 2021 at 08:42:12AM -0600, Alex Williamson wrote:
> > > >
> > > > > > This is also why I don't like it being so transparent as it is
> > > > > > something userspace needs to care about - especially if the HW
> > > > > > cannot support such a thing, if we intend to allow that.
> > > > >
> > > > > Userspace does need to care, but userspace's concern over this should
> > > > > not be able to compromise the platform and therefore making VF
> > > > > assignment more susceptible to fatal error conditions to comply with a
> > > > > migration uAPI is troublesome for me.
> > > >
> > > > It is an interesting scenario.
> > > >
> > > > I think it points out that we are not implementing this fully properly.
> > > >
> > > > The !RUNNING state should be like your reset efforts.
> > > >
> > > > All access to the MMIO memories from userspace should be revoked
> > > > during !RUNNING.
> > >
> > > This assumes that vCPUs must be stopped before !RUNNING is entered
> > > in the virtualization case, and it is true today.
> > >
> > > But it may not hold when talking about guest SVA and I/O page fault [1].
> > > The problem is that the pending requests may trigger I/O page faults
> > > on guest page tables. W/o running vCPUs to handle those faults, the
> > > quiesce command cannot complete draining the pending requests
> > > if the device doesn't support preempt-on-fault (at least it's the case for
> > > some Intel and Huawei devices, possibly true for most initial SVA
> > > implementations).
> >
> > It cannot be ordered any other way.
> >
> > vCPUs must be stopped first, then the PCI devices must be stopped
> > after, otherwise the vCPU can touch a stopped device while handling
> > a fault, which is unreasonable.
> >
> > However, migrating a pending IOMMU fault does seem unreasonable as well.
> >
> > The NDMA state can potentially solve this:
> >
> >  RUNNING | VCPU RUNNING - Normal
> >  NDMA | RUNNING | VCPU RUNNING - Halt and flush DMA, and thus all faults
> >  NDMA | RUNNING - Halt all MMIO access
>
> should be two steps?
>
>   NDMA | RUNNING - vCPU stops access to the device
>   NDMA - halt all MMIO access by revoking mapping

No, NDMA without RUNNING is equivalent to 0, which is the next step:

> >  0 - Halted everything

Jason
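
To make the proposed ordering concrete, below is a rough userspace-side
sketch. VFIO_DEVICE_STATE_NDMA is only the bit proposed in this thread (its
value here is illustrative, not uAPI), and set_device_state(), stop_vcpus()
and revoke_mmio_mappings() are hypothetical stand-ins for what a real VMM
would actually do (write device_state in the migration region, park the
vCPU threads, munmap() the BAR mappings), so treat this as a sketch of the
sequence rather than real uAPI usage:

#include <stdio.h>

/* VFIO_DEVICE_STATE_RUNNING is the existing v1 migration bit; NDMA is
 * the bit proposed above and its value here is purely illustrative. */
#define VFIO_DEVICE_STATE_RUNNING	(1 << 0)
#define VFIO_DEVICE_STATE_NDMA		(1 << 3)	/* proposed, not in uAPI */

/* Hypothetical stand-ins for the real VMM operations. */
static void set_device_state(unsigned int state)
{
	printf("device_state <- 0x%x\n", state);
}

static void stop_vcpus(void)
{
	printf("vCPUs stopped\n");
}

static void revoke_mmio_mappings(void)
{
	printf("userspace MMIO mappings revoked\n");
}

int main(void)
{
	/* 1. NDMA | RUNNING | VCPU RUNNING: the device stops issuing new
	 *    DMA and drains what is pending; the vCPUs keep running so
	 *    outstanding I/O page faults can still be serviced. */
	set_device_state(VFIO_DEVICE_STATE_NDMA | VFIO_DEVICE_STATE_RUNNING);

	/* 2. NDMA | RUNNING: stop the vCPUs, then revoke userspace MMIO
	 *    access so nothing can touch the quiesced device. */
	stop_vcpus();
	revoke_mmio_mappings();

	/* 3. 0: everything halted; safe to move on to the !RUNNING
	 *    (SAVING) states. */
	set_device_state(0);
	return 0;
}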