> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Monday, November 8, 2021 8:36 PM
>
> On Mon, Nov 08, 2021 at 08:53:20AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Tuesday, October 26, 2021 11:19 PM
> > >
> > > On Tue, Oct 26, 2021 at 08:42:12AM -0600, Alex Williamson wrote:
> > >
> > > > > This is also why I don't like it being so transparent as it is
> > > > > something userspace needs to care about - especially if the HW cannot
> > > > > support such a thing, if we intend to allow that.
> > > >
> > > > Userspace does need to care, but userspace's concern over this should
> > > > not be able to compromise the platform and therefore making VF
> > > > assignment more susceptible to fatal error conditions to comply with a
> > > > migration uAPI is troublesome for me.
> > >
> > > It is an interesting scenario.
> > >
> > > I think it points that we are not implementing this fully properly.
> > >
> > > The !RUNNING state should be like your reset efforts.
> > >
> > > All access to the MMIO memories from userspace should be revoked
> > > during !RUNNING
> >
> > This assumes that vCPUs must be stopped before !RUNNING is entered
> > in virtualization case. and it is true today.
> >
> > But it may not hold when talking about guest SVA and I/O page fault [1].
> > The problem is that the pending requests may trigger I/O page faults
> > on guest page tables. W/o running vCPUs to handle those faults, the
> > quiesce command cannot complete draining the pending requests
> > if the device doesn't support preempt-on-fault (at least it's the case for
> > some Intel and Huawei devices, possibly true for most initial SVA
> > implementations).
>
> It cannot be ordered any other way.
>
> vCPUs must be stopped first, then the PCI devices must be stopped
> after, otherwise the vCPU can touch a stopped a device while handling
> a fault which is unreasonable.
>
> However, migrating a pending IOMMU fault does seem unreasonable as well.
>
> The NDA state can potentially solve this:
>
> RUNNING | VCPU RUNNING - Normal
> NDMA | RUNNING | VCPU RUNNING - Halt and flush DMA, and thus all faults
> NDMA | RUNNING - Halt all MMIO access

Should this be two steps?

NDMA | RUNNING - vCPU stops access to the device
NDMA - Halt all MMIO access by revoking the mappings

> 0 - Halted everything

Yes, adding a new state sounds better than reordering the vCPU/device
stop sequence.

>
> Though this may be more disruptive to the vCPUs as they could spin on
> DMA/interrupts that will not come.

That's inevitable regardless of how we define the migration states. The
actual impact depends on how long 'Halt and flush DMA' takes.

Thanks
Kevin
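The stop sequence discussed in this thread, with the extra step that splits "Halt all MMIO access" in two, can be modeled as a small state table. The following is a minimal C sketch under stated assumptions: the flag names `ST_RUNNING`, `ST_NDMA` and `ST_VCPU_RUNNING` and the helpers `state_valid()`/`seq_valid()` are hypothetical, chosen for illustration, and are not the actual VFIO migration uAPI. `ST_VCPU_RUNNING` stands for a VMM-side condition rather than a device state. The checker encodes the two ordering constraints raised above: vCPUs may only run while the device is RUNNING, and NDMA must be entered while vCPUs still run so that pending I/O page faults can be serviced during the DMA drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags for illustration only; NOT the real VFIO uAPI. */
enum {
    ST_RUNNING      = 1u << 0, /* device executes, MMIO still mapped */
    ST_NDMA         = 1u << 1, /* device issues no new DMA */
    ST_VCPU_RUNNING = 1u << 2, /* VMM-side: guest vCPUs are executing */
};

struct step {
    unsigned state;
    const char *desc;
};

/* The sequence as proposed in the thread, with the two-step split. */
static const struct step stop_seq[] = {
    { ST_RUNNING | ST_VCPU_RUNNING,           "Normal" },
    { ST_NDMA | ST_RUNNING | ST_VCPU_RUNNING, "Halt and flush DMA, and thus all faults" },
    { ST_NDMA | ST_RUNNING,                   "vCPU stops access to the device" },
    { ST_NDMA,                                "Halt all MMIO access by revoking the mappings" },
    { 0,                                      "Halted everything" },
};

/* A running vCPU must never see a device whose MMIO has been revoked. */
static bool state_valid(unsigned s)
{
    return !((s & ST_VCPU_RUNNING) && !(s & ST_RUNNING));
}

/*
 * Checks that every state is valid, that each transition flips exactly
 * one flag, and that NDMA is reached while vCPUs are still running
 * (so they can handle I/O page faults raised while DMA drains).
 */
static bool seq_valid(const struct step *seq, size_t n)
{
    bool ndma_with_vcpus = false;

    for (size_t i = 0; i < n; i++) {
        unsigned s = seq[i].state;

        if (!state_valid(s))
            return false;
        if ((s & ST_NDMA) && (s & ST_VCPU_RUNNING))
            ndma_with_vcpus = true;
        if (i > 0) {
            unsigned diff = s ^ seq[i - 1].state;

            /* diff must be non-zero and a single bit */
            if (diff == 0 || (diff & (diff - 1)) != 0)
                return false;
        }
    }
    return ndma_with_vcpus;
}
```

With these assumptions, `seq_valid(stop_seq, 5)` accepts the proposed sequence, while a sequence that stops the vCPUs before any NDMA step (RUNNING | VCPU RUNNING, then RUNNING, then 0) is rejected, which is the SVA/I/O page fault problem described earlier in the thread.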