Re: [PATCH V2 2/5] vfio/type1: prevent locked_vm underflow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 13 Dec 2022 16:01:21 -0500
Steven Sistare <steven.sistare@xxxxxxxxxx> wrote:

> On 12/13/2022 3:23 PM, Alex Williamson wrote:
> > On Tue, 13 Dec 2022 11:40:56 -0800
> > Steve Sistare <steven.sistare@xxxxxxxxxx> wrote:
> >   
> >> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
> >> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
> >> dma mapping, locked_vm underflows to a large unsigned value, and a
> >> subsequent dma map request fails with ENOMEM in __account_locked_vm.
> >>
> >> To avoid underflow, do not decrement locked_vm during unmap if the
> >> dma's mm has changed.  To restore the correct locked_vm count, when
> >> VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add
> >> the mapping's pinned page count to the new mm->locked_vm, subject
> >> to the rlimit.  Now that mediated devices are excluded when using
> >> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of
> >> the mapping.  
> > 
> > Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
> > 
> >   
> >> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 23 +++++++++++++++++++----
> >>  1 file changed, 19 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 80bdb4d..35a1a52 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -100,6 +100,7 @@ struct vfio_dma {
> >>  	struct task_struct	*task;
> >>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
> >>  	unsigned long		*bitmap;
> >> +	struct mm_struct	*mm;
> >>  };
> >>  
> >>  struct vfio_batch {
> >> @@ -1165,7 +1166,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >>  					    &iotlb_gather);
> >>  	}
> >>  
> >> -	if (do_accounting) {
> >> +	if (do_accounting && current->mm == dma->mm) {  
> > 
> > 
> > This seems incompatible with ffed0518d871 ("vfio: remove useless
> > judgement") where we no longer assume that the unmap mm is the same as
> > the mapping mm.  
> 
> They are compatible.  My fix allows another task to unmap, but only decreases
> locked_vm if the current mm matches the original mm that locked it.  And the
> "original" mm is updated by MAP_FLAG_VADDR.

It seems like there's either a bug fix or behavioral change to
ffed0518d871 then.  What mm were we previously accounting in their
fork/exec scenario that we're not with this change?

> > Does this need to get_task_mm(dma->task) and compare that mm to dma->mm
> > to determine whether an exec w/o vaddr remapping has occurred?  That's
> > the only use case I can figure out where grabbing the mm for dma->mm
> > actually makes any sense at all.  
> 
> The mm grab does detect an exec.  Before exec, at map time, we get task and grab
> its mm.  During exec, task gets a new mm.  The old mm becomes defunct, but we
> still hold it and can examine its pointer address.

This is describing exactly the test I'm asking about, if dma->task->mm
no longer matches dma->mm then an exec has occurred w/o a subsequent
vaddr remap.  So why are we bringing current->mm into the equation?
Thanks,

Alex 

> The new code does not require that current == dma->task.
> 
> >>  		vfio_lock_acct(dma, -unlocked, true);
> >>  		return 0;
> >>  	}
> >> @@ -1178,6 +1179,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> >>  	vfio_unmap_unpin(iommu, dma, true);
> >>  	vfio_unlink_dma(iommu, dma);
> >>  	put_task_struct(dma->task);
> >> +	mmdrop(dma->mm);
> >>  	vfio_dma_bitmap_free(dma);
> >>  	if (dma->vaddr_invalid) {
> >>  		iommu->vaddr_invalid_count--;
> >> @@ -1623,9 +1625,20 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  			   dma->size != size) {
> >>  			ret = -EINVAL;
> >>  		} else {
> >> -			dma->vaddr = vaddr;
> >> -			dma->vaddr_invalid = false;
> >> -			iommu->vaddr_invalid_count--;
> >> +			if (current->mm != dma->mm) {
> >> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
> >> +						     0);
> >> +				if (!ret) {
> >> +					mmdrop(dma->mm);
> >> +					dma->mm = current->mm;
> >> +					mmgrab(dma->mm);
> >> +				}
> >> +			}
> >> +			if (!ret) {
> >> +				dma->vaddr = vaddr;
> >> +				dma->vaddr_invalid = false;
> >> +				iommu->vaddr_invalid_count--;
> >> +			}  
> > 
> > Poor flow, shouldn't this be:
> > 
> > 			if (current->mm != dma->mm) {
> > 				ret = vfio_lock_acct(dma,
> > 						     size >> PAGE_SHIFT, 0);
> > 				if (ret)
> > 					goto out_unlock;
> > 
> > 				mmdrop(dma->mm);
> > 				dma->mm = current->mm;
> > 				mmgrab(dma->mm);
> > 			}
> > 			dma->vaddr = vaddr;
> > 			dma->vaddr_invalid = false;
> > 			iommu->vaddr_invalid_count--;  
> 
> Better, will do, thanks.
> 
> - Steve
> 
> >>  			wake_up_all(&iommu->vaddr_wait);
> >>  		}
> >>  		goto out_unlock;
> >> @@ -1683,6 +1696,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  	get_task_struct(current->group_leader);
> >>  	dma->task = current->group_leader;
> >>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> >> +	dma->mm = dma->task->mm;
> >> +	mmgrab(dma->mm);
> >>  
> >>  	dma->pfn_list = RB_ROOT;
> >>    
> >   
> 




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux