On Tue, Jun 28, 2022 at 08:48:11AM -0400, Steven Sistare wrote:

> For cpr, old qemu directly exec's new qemu, so task does not change.
>
> To support fork+exec, the ownership test needs to be deleted or modified.
>
> Pinned page accounting is another issue, as the parent counts pins in its
> mm->locked_vm. If the child unmaps, it cannot simply decrement its own
> mm->locked_vm counter.

It is fine already:

	mm = async ? get_task_mm(dma->task) : dma->task->mm;
	if (!mm)
		return -ESRCH; /* process exited */

	ret = mmap_write_lock_killable(mm);
	if (!ret) {
		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
					  dma->lock_cap);

Each 'dma' already stores a pointer to the mm that sourced it and only
manipulates the counter in that mm. AFAICT 'current' is not used
during unmap.

> As you and I have discussed, the count is also wrong in the direct
> exec model, because exec clears mm->locked_vm.

Really? Yikes, I thought exec would generate a new mm?

> I am thinking vfio could count pins in struct user locked_vm to handle both
> models. The user struct and its count would persist across direct exec,
> and be shared by parent and child for fork+exec. However, that does change
> the RLIMIT_MEMLOCK value that applications must set, because the limit must
> accommodate vfio plus other sub-systems that count in user->locked_vm, which
> includes io_uring, skbuff, xdp, and perf. Plus, the limit must accommodate all
> processes of that user, not just a single process.

We discussed this, for iommufd we are currently planning to go this
way and will See How it Goes.

Jason
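
For illustration only, here is a minimal userspace sketch of the accounting pattern in the excerpt above. It is not kernel code: the toy_* types and functions are hypothetical stand-ins, locking and the RLIMIT_MEMLOCK check are left out, and the only point is that the counter being adjusted is reached through the mapping's recorded task rather than through the caller.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-ins for the kernel structures; every name here is hypothetical. */
struct toy_mm {
	long locked_vm;		/* pages currently accounted as pinned */
};

struct toy_task {
	struct toy_mm *mm;	/* address space owned by this task */
};

struct toy_dma {
	struct toy_task *task;	/* task that created the mapping */
};

/*
 * Adjust the pin count of the mm reached through the mapping's task.
 * 'caller' plays the role of 'current' and is deliberately ignored.
 */
static int toy_lock_acct(struct toy_task *caller, struct toy_dma *dma,
			 long npage)
{
	struct toy_mm *mm = dma->task->mm;

	(void)caller;
	if (!mm)
		return -ESRCH;	/* owning process has exited */

	mm->locked_vm += npage;	/* a negative npage undoes earlier pins */
	return 0;
}

/* Parent pins 16 pages; a child with its own mm later unmaps them. */
static long toy_parent_count_after_child_unmap(void)
{
	struct toy_mm parent_mm = { 0 }, child_mm = { 0 };
	struct toy_task parent = { &parent_mm };
	struct toy_task child = { &child_mm };
	struct toy_dma dma = { &parent };

	toy_lock_acct(&parent, &dma, 16);	/* map, issued by the parent */
	toy_lock_acct(&child, &dma, -16);	/* unmap, issued by the child */

	assert(child_mm.locked_vm == 0);	/* child's counter untouched */
	return parent_mm.locked_vm;
}
```

In this model toy_parent_count_after_child_unmap() returns 0: the unmap issued by the child debits the parent's counter, and the child's own counter is never touched.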