On 10/11/2022 8:30 AM, Jason Gunthorpe wrote: > On Mon, Oct 10, 2022 at 04:54:50PM -0400, Steven Sistare wrote: >>> Do we have a solution to this? >>> >>> If not I would like to make a patch removing VFIO_DMA_UNMAP_FLAG_VADDR >>> >>> Aside from the approach to use the FD, another idea is to just use >>> fork. >>> >>> qemu would do something like >>> >>> .. stop all container ioctl activity .. >>> fork() >>> ioctl(CHANGE_MM) // switch all maps to this mm >>> .. signal parent.. >>> .. wait parent.. >>> exit(0) >>> .. wait child .. >>> exec() >>> ioctl(CHANGE_MM) // switch all maps to this mm >>> ..signal child.. >>> waitpid(childpid) >>> >>> This way the kernel is never left without a page provider for the >>> maps, the dummy mm_struct belonging to the fork will serve that role >>> for the gap. >>> >>> And the above is only required if we have mdevs, so we could imagine >>> userspace optimizing it away for, eg vfio-pci only cases. >>> >>> It is not as efficient as using a FD backing, but this is super easy >>> to implement in the kernel. >> >> I propose to avoid deadlock for mediated devices as follows. Currently, an >> mdev calling vfio_pin_pages blocks in vfio_wait while VFIO_DMA_UNMAP_FLAG_VADDR >> is asserted. >> >> * In vfio_wait, I will maintain a list of waiters, each list element >> consisting of (task, mdev, close_flag=false). >> >> * When the vfio device descriptor is closed, vfio_device_fops_release >> will notify the vfio_iommu driver, which will find the mdev on the waiters >> list, set elem->close_flag=true, and call wake_up_process for the task. > > This alone is not sufficient, the mdev driver can continue to > establish new mappings until it's close_device function > returns. Killing only existing mappings is racy. > > I think you are focusing on the one issue I pointed at, as I said, I'm > sure there are more ways than just close to abuse this functionality > to deadlock the kernel. > > I continue to prefer we remove it completely and do something more > robust. I suggested two options. It's not racy. New pin requests also land in vfio_wait if any vaddr's have been invalidated in any vfio_dma in the iommu. See vfio_iommu_type1_pin_pages() if (iommu->vaddr_invalid_count) vfio_find_dma_valid() vfio_wait() However, I will investigate saving a reference to the file object in the vfio_dma (for mappings backed by a file) and using that to translate IOVA's. I think that will be easier to use than fork/CHANGE_MM/exec, and may even be easier to use than VFIO_DMA_UNMAP_FLAG_VADDR. To be continued. - Steve