For cpr, old qemu directly execs new qemu, so the task does not change.
To support fork+exec, the ownership test needs to be deleted or modified.

Pinned page accounting is another issue, as the parent counts pins in
its mm->locked_vm. If the child unmaps, it cannot simply decrement its
own mm->locked_vm counter. As you and I have discussed, the count is
also wrong in the direct exec model, because exec clears mm->locked_vm.

I am thinking vfio could count pins in struct user locked_vm to handle
both models. The user struct and its count would persist across direct
exec, and be shared by parent and child for fork+exec. However, that
does change the RLIMIT_MEMLOCK value that applications must set, because
the limit must accommodate vfio plus the other sub-systems that count in
user->locked_vm, which include io_uring, skbuff, xdp, and perf. Plus,
the limit must accommodate all processes of that user, not just a single
process.

Folks like fork+exec because it allows recovery if the new qemu process
fails to initialize. One can fall back to the original process, if the
above issues are fixed.

- Steve

On 6/27/2022 6:06 PM, Alex Williamson wrote:
>
> Hey Steve, how did you get around this for cpr or is this a gap?
> Thanks,
>
> Alex
>
> On Mon, 27 Jun 2022 11:51:09 +0800
> lizhe.67@xxxxxxxxxxxxx wrote:
>
>> From: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>>
>> In function vfio_dma_do_unmap(), we currently prevent process to unmap
>> vfio dma region whose mm_struct is different from the vfio_dma->task.
>> In our virtual machine scenario which is using kvm and qemu, this
>> judgement stops us from liveupgrading our qemu, which uses fork() &&
>> exec() to load the new binary but the new process cannot do the
>> VFIO_IOMMU_UNMAP_DMA action during vm exit because of this judgement.
>>
>> This judgement is added in commit 8f0d5bb95f76 ("vfio iommu type1: Add
>> task structure to vfio_dma") for the security reason. But it seems that
>> no other task who has no family relationship with old and new process
>> can get the same vfio_dma struct here for the reason of resource
>> isolation. So this patch delete it.
>>
>> Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>> Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxx>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 6 ------
>>  1 file changed, 6 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index c13b9290e357..a8ff00dad834 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1377,12 +1377,6 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>
>>  		if (!iommu->v2 && iova > dma->iova)
>>  			break;
>> -		/*
>> -		 * Task with same address space who mapped this iova range is
>> -		 * allowed to unmap the iova range.
>> -		 */
>> -		if (dma->task->mm != current->mm)
>> -			break;
>>
>>  		if (invalidate_vaddr) {
>>  			if (dma->vaddr_invalid) {
>
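
To make the per-user accounting idea from the reply concrete, here is a
rough userspace model, not kernel code and not a proposed patch. Every
name in it (model_user, model_proc, model_pin, model_unpin) is
hypothetical and only stands in for the real user struct and pin paths.
It assumes the counter and the RLIMIT_MEMLOCK limit are both expressed in
pages and ignores locking entirely.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-user state; NOT the kernel struct. */
struct model_user {
	unsigned long locked_vm;     /* pages pinned by all tasks of this user */
	unsigned long memlock_limit; /* RLIMIT_MEMLOCK, assumed to be in pages */
};

/* Hypothetical stand-in for a process; parent and child share one user. */
struct model_proc {
	struct model_user *user;
};

/*
 * Charge npages to the shared per-user counter.
 * Returns 0 on success, -1 if the charge would exceed the limit.
 */
int model_pin(struct model_proc *p, unsigned long npages)
{
	struct model_user *u = p->user;

	/* Invariant that keeps the subtraction below from wrapping. */
	assert(u->locked_vm <= u->memlock_limit);

	if (npages > u->memlock_limit - u->locked_vm)
		return -1;

	u->locked_vm += npages;
	return 0;
}

/*
 * Uncharge npages from the shared per-user counter. Any process of the
 * same user may do this, regardless of which process did the pin.
 * Returns 0 on success, -1 if more pages are released than are charged.
 */
int model_unpin(struct model_proc *p, unsigned long npages)
{
	struct model_user *u = p->user;

	if (npages > u->locked_vm)
		return -1;

	u->locked_vm -= npages;
	return 0;
}
```

In this model a child created by fork+exec can uncharge pages that the
parent pinned, because both point at the same counter. The cost is the
one noted in the reply: any other charge against the same user, from
another process or another sub-system, eats into the same limit.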