On Wed, Dec 14, 2022 at 01:24:54PM -0800, Steve Sistare wrote: > When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR > interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a > dma mapping, locked_vm underflows to a large unsigned value, and a > subsequent dma map request fails with ENOMEM in __account_locked_vm. > > To avoid underflow, do not decrement locked_vm during unmap if the > dma's mm has changed. To restore the correct locked_vm count, when > VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add > the mapping's pinned page count to the new mm->locked_vm, subject > to the rlimit. Now that mediated devices are excluded when using > VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of > the mapping. > > Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") > > Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> > --- > drivers/vfio/vfio_iommu_type1.c | 50 ++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 49 insertions(+), 1 deletion(-) > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index bdfc13c..33dc8ec 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -100,6 +100,7 @@ struct vfio_dma { > struct task_struct *task; > struct rb_root pfn_list; /* Ex-user pinned pfn list */ > unsigned long *bitmap; > + struct mm_struct *mm; > }; I'm not sure this is quite, right, or at least the comment isn't quite right.. The issue is that the vfio_dma does not store the mm that provided the pages. By going through the task every time it allows the mm to change under its feet which of course can corrupt the assumed accounting in various ways. To fix this, the mm should be kept, as you did above. All the code that is touching the task to get the mm should be dropped. The only purpose of the task is to check the rlimit. That in of itself should be a single patch with a clear description that is the change. Beyond that you want a second patch that make the vaddr stuff to transfer the pin accounting from one mm to another in the process of updating the dma to have a new task and mm. There is no "underflow" here, only that the current code is using the wrong mm in some case because it gets it through the task. Jason