On Fri, Oct 07, 2022 at 10:37:11AM -0400, Matthew Rosato wrote:
> On 10/7/22 9:37 AM, Jason Gunthorpe wrote:
> > On Thu, Oct 06, 2022 at 07:28:53PM -0400, Matthew Rosato wrote:
> >
> >>> Oh, I'm surprised the s390 testing didn't hit this!!
> >>
> >> Huh, me too, at least eventually - I think it's because we aren't
> >> pinning everything upfront but rather on-demand so the missing the
> >> type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.
> >> I definitely did multiple VM (re)starts and hot (un)plugs. But
> >> while my test workloads did some I/O, the long-running one was
> >> focused on the plug/unplug scenarios to recreate the initial issue
> >> so the I/O (and thus pinning) done would have been minimal.
> >
> > That explains ccw/ap a bit but for PCI the iommu ownership wasn't
> > released so it becomes impossible to re-attach a container to the
> > group. eg a 2nd VM can never be started
> >
> > Ah well, thanks!
> >
> > Jason
>
> Well, this bugged me enough that I traced the v1 series without
> fixup and vfio-pci on s390 was OK because it was still calling
> detach_container on vm shutdown via this chain:
>
> vfio_pci_remove
>  vfio_pci_core_unregister_device
>   vfio_unregister_group_dev
>    vfio_device_remove_group
>     vfio_group_detach_container
>
> I'd guess non-s390 vfio-pci would do the same. Alex also had the
> mtty mdev, maybe that's relevant.

As long as you are unplugging a driver the v1 series would work. The
failure mode is when you don't unplug the driver and just run a VM
twice in a row.

Jason
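
For illustration only, the difference between the two cases can be sketched as a toy userspace model. This is not kernel code: Group, run_vm, second_vm_starts and the release_on_close flag are hypothetical names, and the assumption that a fixed series detaches the container when the VM closes the group is mine, not something taken from the patches.

```python
class Group:
    """Toy stand-in for a VFIO group; not the kernel data structure."""

    def __init__(self):
        self.container = None  # container currently attached, if any

    def attach_container(self, container):
        # If the previous user never released ownership, re-attach fails.
        if self.container is not None:
            raise RuntimeError("group still owned by a previous container")
        self.container = container

    def detach_container(self):
        self.container = None


def run_vm(group, name, release_on_close, unplug_driver):
    """Simulate one VM lifetime against `group`."""
    group.attach_container(name)
    # ... the VM runs here ...
    if release_on_close:
        # Assumed fixed behaviour: closing the group detaches the container.
        group.detach_container()
    if unplug_driver:
        # Models the driver remove path, which ends in a container detach.
        group.detach_container()


def second_vm_starts(release_on_close, unplug_driver):
    """Return True if a second VM can attach after the first one exits."""
    group = Group()
    run_vm(group, "vm1", release_on_close, unplug_driver)
    try:
        run_vm(group, "vm2", release_on_close, unplug_driver)
    except RuntimeError:
        return False
    return True
```

In this model, second_vm_starts(False, True) succeeds because the unplug path performs the detach, while second_vm_starts(False, False) fails at the second attach, which is the "run a VM twice in a row" case.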