Re: [PATCH V2 2/5] vfio/type1: prevent locked_vm underflow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/13/2022 4:31 PM, Alex Williamson wrote:
> On Tue, 13 Dec 2022 16:01:21 -0500
> Steven Sistare <steven.sistare@xxxxxxxxxx> wrote:
>> On 12/13/2022 3:23 PM, Alex Williamson wrote:
>>> On Tue, 13 Dec 2022 11:40:56 -0800
>>> Steve Sistare <steven.sistare@xxxxxxxxxx> wrote:
>>>   
>>>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>>>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>>>> dma mapping, locked_vm underflows to a large unsigned value, and a
>>>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>>>
>>>> To avoid underflow, do not decrement locked_vm during unmap if the
>>>> dma's mm has changed.  To restore the correct locked_vm count, when
>>>> VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add
>>>> the mapping's pinned page count to the new mm->locked_vm, subject
>>>> to the rlimit.  Now that mediated devices are excluded when using
>>>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of
>>>> the mapping.  
>>>
>>> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
>>>
>>>   
>>>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 23 +++++++++++++++++++----
>>>>  1 file changed, 19 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 80bdb4d..35a1a52 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>>  	struct task_struct	*task;
>>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>>  	unsigned long		*bitmap;
>>>> +	struct mm_struct	*mm;
>>>>  };
>>>>  
>>>>  struct vfio_batch {
>>>> @@ -1165,7 +1166,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>>>>  					    &iotlb_gather);
>>>>  	}
>>>>  
>>>> -	if (do_accounting) {
>>>> +	if (do_accounting && current->mm == dma->mm) {  
>>>
>>>
>>> This seems incompatible with ffed0518d871 ("vfio: remove useless
>>> judgement") where we no longer assume that the unmap mm is the same as
>>> the mapping mm.  
>>
>> They are compatible.  My fix allows another task to unmap, but only decreases
>> locked_vm if the current mm matches the original mm that locked it.  And the
>> "original" mm is updated by MAP_FLAG_VADDR.
> 
> It seems like there's either a bug fix or behavioral change to
> ffed0518d871 then.  What mm were we previously accounting in their
> fork/exec scenario that we're not with this change?

locked_vm is broken in their fork/exec scenario.  They must have some other
patch or work around for it.  Or, they have not tested enough to hit underflow.
If after exec, you first dma map something new, then dma unmap something smaller,
locked_vm does not underflow.

>>> Does this need to get_task_mm(dma->task) and compare that mm to dma->mm
>>> to determine whether an exec w/o vaddr remapping has occurred?  That's
>>> the only use case I can figure out where grabbing the mm for dma->mm
>>> actually makes any sense at all.  
>>
>> The mm grab does detect an exec.  Before exec, at map time, we get task and grab
>> its mm.  During exec, task gets a new mm.  The old mm becomes defunct, but we
>> still hold it and can examine its pointer address.
> 
> This is describing exactly the test I'm asking about, if dma->task->mm
> no longer matches dma->mm then an exec has occurred w/o a subsequent
> vaddr remap.  So why are we bringing current->mm into the equation?

Sorry, memories of my "redo" series are leaking in.  You are correct,
vfio_lock_acct modifies dma->task->mm, and dma->task may differ from current,
so I do need to check get_task_mm(dma->task).

- Steve

>> The new code does not require that current == dma->task.
>>
>>>>  		vfio_lock_acct(dma, -unlocked, true);
>>>>  		return 0;
>>>>  	}
>>>> @@ -1178,6 +1179,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>>  	vfio_unmap_unpin(iommu, dma, true);
>>>>  	vfio_unlink_dma(iommu, dma);
>>>>  	put_task_struct(dma->task);
>>>> +	mmdrop(dma->mm);
>>>>  	vfio_dma_bitmap_free(dma);
>>>>  	if (dma->vaddr_invalid) {
>>>>  		iommu->vaddr_invalid_count--;
>>>> @@ -1623,9 +1625,20 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  			   dma->size != size) {
>>>>  			ret = -EINVAL;
>>>>  		} else {
>>>> -			dma->vaddr = vaddr;
>>>> -			dma->vaddr_invalid = false;
>>>> -			iommu->vaddr_invalid_count--;
>>>> +			if (current->mm != dma->mm) {
>>>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>>>> +						     0);
>>>> +				if (!ret) {
>>>> +					mmdrop(dma->mm);
>>>> +					dma->mm = current->mm;
>>>> +					mmgrab(dma->mm);
>>>> +				}
>>>> +			}
>>>> +			if (!ret) {
>>>> +				dma->vaddr = vaddr;
>>>> +				dma->vaddr_invalid = false;
>>>> +				iommu->vaddr_invalid_count--;
>>>> +			}  
>>>
>>> Poor flow, shouldn't this be:
>>>
>>> 			if (current->mm != dma->mm) {
>>> 				ret = vfio_lock_acct(dma,
>>> 						     size >> PAGE_SHIFT, 0);
>>> 				if (ret)
>>> 					goto out_unlock;
>>>
>>> 				mmdrop(dma->mm);
>>> 				dma->mm = current->mm;
>>> 				mmgrab(dma->mm);
>>> 			}
>>> 			dma->vaddr = vaddr;
>>> 			dma->vaddr_invalid = false;
>>> 			iommu->vaddr_invalid_count--;  
>>
>> Better, will do, thanks.
>>
>> - Steve
>>
>>>>  			wake_up_all(&iommu->vaddr_wait);
>>>>  		}
>>>>  		goto out_unlock;
>>>> @@ -1683,6 +1696,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	get_task_struct(current->group_leader);
>>>>  	dma->task = current->group_leader;
>>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>>> +	dma->mm = dma->task->mm;
>>>> +	mmgrab(dma->mm);
>>>>  
>>>>  	dma->pfn_list = RB_ROOT;
>>>>    
>>>   
>>
> 



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux