> From: Steve Sistare <steven.sistare@xxxxxxxxxx> > Sent: Saturday, December 17, 2022 2:51 AM > > When a vfio container is preserved across exec, the task does not change, > but it gets a new mm with locked_vm=0. If the user later unmaps a dma > mapping, locked_vm underflows to a large unsigned value, and a subsequent > dma map request fails with ENOMEM in __account_locked_vm. > > To avoid underflow, grab and save the mm at the time a dma is mapped. > Use that mm when adjusting locked_vm, rather than re-acquiring the saved > task's mm, which may have changed. If the saved mm is dead, do nothing. worth clarifying that locked_vm of the new mm is still not fixed. > @@ -1664,15 +1666,7 @@ static int vfio_dma_do_map(struct vfio_iommu > *iommu, > * against the locked memory limit and we need to be able to do both > * outside of this call path as pinning can be asynchronous via the > * external interfaces for mdev devices. RLIMIT_MEMLOCK requires > a > - * task_struct and VM locked pages requires an mm_struct, however > - * holding an indefinite mm reference is not recommended, > therefore we > - * only hold a reference to a task. We could hold a reference to > - * current, however QEMU uses this call path through vCPU threads, > - * which can be killed resulting in a NULL mm and failure in the > unmap > - * path when called via a different thread. Avoid this problem by > - * using the group_leader as threads within the same group require > - * both CLONE_THREAD and CLONE_VM and will therefore use the > same > - * mm_struct. > + * task_struct and VM locked pages requires an mm_struct. IMHO the rationale why choosing group_leader still applies... otherwise this looks good to me: Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>