On Mon, Oct 10, 2022 at 04:54:50PM -0400, Steven Sistare wrote:
> > Do we have a solution to this?
> >
> > If not I would like to make a patch removing VFIO_DMA_UNMAP_FLAG_VADDR
> >
> > Aside from the approach to use the FD, another idea is to just use
> > fork.
> >
> > qemu would do something like
> >
> >   .. stop all container ioctl activity ..
> >   fork()
> >      ioctl(CHANGE_MM) // switch all maps to this mm
> >      .. signal parent..
> >      .. wait parent..
> >      exit(0)
> >   .. wait child ..
> >   exec()
> >   ioctl(CHANGE_MM) // switch all maps to this mm
> >   ..signal child..
> >   waitpid(childpid)
> >
> > This way the kernel is never left without a page provider for the
> > maps, the dummy mm_struct belonging to the fork will serve that role
> > for the gap.
> >
> > And the above is only required if we have mdevs, so we could imagine
> > userspace optimizing it away for, eg vfio-pci only cases.
> >
> > It is not as efficient as using a FD backing, but this is super easy
> > to implement in the kernel.
>
> I propose to avoid deadlock for mediated devices as follows.  Currently, an
> mdev calling vfio_pin_pages blocks in vfio_wait while VFIO_DMA_UNMAP_FLAG_VADDR
> is asserted.
>
>   * In vfio_wait, I will maintain a list of waiters, each list element
>     consisting of (task, mdev, close_flag=false).
>
>   * When the vfio device descriptor is closed, vfio_device_fops_release
>     will notify the vfio_iommu driver, which will find the mdev on the waiters
>     list, set elem->close_flag=true, and call wake_up_process for the task.

This alone is not sufficient. The mdev driver can continue to
establish new mappings until its close_device function returns, so
killing only the existing mappings is racy.

I think you are focusing on the one issue I pointed at. As I said, I'm
sure there are more ways than just close to abuse this functionality
to deadlock the kernel.

I continue to prefer that we remove it completely and do something
more robust. I suggested two options.

Jason