On 12/15/2022 9:20 AM, Jason Gunthorpe wrote: > On Wed, Dec 14, 2022 at 01:24:54PM -0800, Steve Sistare wrote: >> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR >> interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a >> dma mapping, locked_vm underflows to a large unsigned value, and a >> subsequent dma map request fails with ENOMEM in __account_locked_vm. >> >> To avoid underflow, do not decrement locked_vm during unmap if the >> dma's mm has changed. To restore the correct locked_vm count, when >> VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add >> the mapping's pinned page count to the new mm->locked_vm, subject >> to the rlimit. Now that mediated devices are excluded when using >> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of >> the mapping. >> >> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") >> >> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> >> --- >> drivers/vfio/vfio_iommu_type1.c | 50 ++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 49 insertions(+), 1 deletion(-) > >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >> index bdfc13c..33dc8ec 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -100,6 +100,7 @@ struct vfio_dma { >> struct task_struct *task; >> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >> unsigned long *bitmap; >> + struct mm_struct *mm; >> }; > > I'm not sure this is quite, right, or at least the comment isn't quite > right.. > > The issue is that the vfio_dma does not store the mm that provided the > pages. By going through the task every time it allows the mm to change > under its feet which of course can corrupt the assumed accounting in > various ways. > > To fix this, the mm should be kept, as you did above. All the code > that is touching the task to get the mm should be dropped. The only > purpose of the task is to check the rlimit. Yes. While developing my "redo" series I tried it that way, but did not post that version. Functionally it should be equivalent to this series, but I can code it again to see if it looks cleaner. - Steve > That in of itself should be a single patch with a clear description > that is the change. > > Beyond that you want a second patch that make the vaddr stuff to > transfer the pin accounting from one mm to another in the process of > updating the dma to have a new task and mm. > > There is no "underflow" here, only that the current code is using the > wrong mm in some case because it gets it through the task. > > Jason