Re: [PATCH V3 2/5] vfio/type1: prevent locked_vm underflow

On 12/14/2022 3:03 PM, Alex Williamson wrote:
> On Wed, 14 Dec 2022 11:22:48 -0800
> Steve Sistare <steven.sistare@xxxxxxxxxx> wrote:
> 
>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>> dma mapping, locked_vm underflows to a large unsigned value, and a
>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>
>> To avoid underflow, do not decrement locked_vm during unmap if the
>> dma's mm has changed.  To restore the correct locked_vm count, when
>> VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add
>> the mapping's pinned page count to the new mm->locked_vm, subject
>> to the rlimit.  Now that mediated devices are excluded when using
>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of
>> the mapping.
>>
>> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
>>
>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 49 +++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 49 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index b04f485..e719c13 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>  	struct task_struct	*task;
>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>  	unsigned long		*bitmap;
>> +	struct mm_struct	*mm;
>>  };
>>  
>>  struct vfio_batch {
>> @@ -424,6 +425,12 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>  	if (!mm)
>>  		return -ESRCH; /* process exited */
>>  
>> +	/* Avoid locked_vm underflow */
>> +	if (dma->mm != mm && npage < 0) {
>> +		mmput(mm);
>> +		return 0;
>> +	}
> 
> How about initialize ret = 0 and jump to the existing mmput() with a
> goto, so there's no assumptions about whether we need the mmput() or
> not.

OK.
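
Roughly like this, I think (sketch only; I'm assuming the rest of
vfio_lock_acct(), including the async-conditional mmput() at the end, stays
as it is upstream today):

static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
{
        struct mm_struct *mm;
        int ret = 0;

        if (!npage)
                return 0;

        mm = async ? get_task_mm(dma->task) : dma->task->mm;
        if (!mm)
                return -ESRCH; /* process exited */

        /* Avoid locked_vm underflow if the mm changed across exec */
        if (dma->mm != mm && npage < 0)
                goto out;

        ret = mmap_write_lock_killable(mm);
        if (!ret) {
                ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
                                          dma->lock_cap);
                mmap_write_unlock(mm);
        }
out:
        if (async)
                mmput(mm);

        return ret;
}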

>> +
>>  	ret = mmap_write_lock_killable(mm);
>>  	if (!ret) {
>>  		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
>> @@ -1180,6 +1187,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  	vfio_unmap_unpin(iommu, dma, true);
>>  	vfio_unlink_dma(iommu, dma);
>>  	put_task_struct(dma->task);
>> +	mmdrop(dma->mm);
>>  	vfio_dma_bitmap_free(dma);
>>  	if (dma->vaddr_invalid) {
>>  		iommu->vaddr_invalid_count--;
>> @@ -1578,6 +1586,42 @@ static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
>>  	return list_empty(iova);
>>  }
>>  
>> +static int vfio_change_dma_owner(struct vfio_dma *dma)
>> +{
>> +	int ret = 0;
>> +	struct mm_struct *mm = get_task_mm(dma->task);
>> +
>> +	if (dma->mm != mm) {
>> +		long npage = dma->size >> PAGE_SHIFT;
>> +		bool new_lock_cap = capable(CAP_IPC_LOCK);
>> +		struct task_struct *new_task = current->group_leader;
>> +
>> +		ret = mmap_write_lock_killable(new_task->mm);
>> +		if (ret)
>> +			goto out;
>> +
>> +		ret = __account_locked_vm(new_task->mm, npage, true,
>> +					  new_task, new_lock_cap);
>> +		mmap_write_unlock(new_task->mm);
>> +		if (ret)
>> +			goto out;
>> +
>> +		if (dma->task != new_task) {
>> +			vfio_lock_acct(dma, -npage, 0);
>> +			put_task_struct(dma->task);
>> +			dma->task = get_task_struct(new_task);
>> +		}
> 
> IIUC, we're essentially open coding vfio_lock_acct() in the previous
> section so that we can be sure we've accounted the new task before we
> credit the previous task.  

Correct.

> However, I was under the impression the task
> remained the same, but the mm changes, which is how we end up with the
> underflow described in the commit log.  What circumstances cause a task
> change?  Thanks,

The task changes if one does a fork/exec live update.  I tested both exec and
fork/exec with my changes.
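
In outline, the sequence is roughly the following (illustrative userspace
sketch, not the actual test code; variable names are made up, error handling
is omitted, and the prot flags are assumed to match the original mapping):

        /* old process: invalidate the host vaddr, keep the container fd */
        struct vfio_iommu_type1_dma_unmap unmap = {
                .argsz = sizeof(unmap),
                .flags = VFIO_DMA_UNMAP_FLAG_VADDR,
                .iova  = iova,
                .size  = size,
        };
        ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);

        /* fork(); the child exec's the new binary, inheriting container_fd,
         * so the dma now belongs to a new mm and a new group leader
         */

        /* new process: supply the new vaddr for the same iova range */
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_VADDR | VFIO_DMA_MAP_FLAG_READ |
                         VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (__u64)(uintptr_t)new_vaddr,
                .iova  = iova,
                .size  = size,
        };
        ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);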

- Steve

>> +		mmdrop(dma->mm);
>> +		dma->mm = new_task->mm;
>> +		mmgrab(dma->mm);
>> +		dma->lock_cap = new_lock_cap;
>> +	}
>> +out:
>> +	if (mm)
>> +		mmput(mm);
>> +	return ret;
>> +}
>> +
>>  static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  			   struct vfio_iommu_type1_dma_map *map)
>>  {
>> @@ -1627,6 +1671,9 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  			   dma->size != size) {
>>  			ret = -EINVAL;
>>  		} else {
>> +			ret = vfio_change_dma_owner(dma);
>> +			if (ret)
>> +				goto out_unlock;
>>  			dma->vaddr = vaddr;
>>  			dma->vaddr_invalid = false;
>>  			iommu->vaddr_invalid_count--;
>> @@ -1687,6 +1734,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	get_task_struct(current->group_leader);
>>  	dma->task = current->group_leader;
>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>> +	dma->mm = dma->task->mm;
>> +	mmgrab(dma->mm);
>>  
>>  	dma->pfn_list = RB_ROOT;
>>  
> 


