Re: [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow

Steven Sistare <steven.sistare@xxxxxxxxxx> · Tue, 13 Dec 2022 13:17:38 -0500

On 12/13/2022 1:02 PM, Alex Williamson wrote:
> On Tue, 13 Dec 2022 07:46:56 -0800
> Steve Sistare <steven.sistare@xxxxxxxxxx> wrote:
> 
>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>> dma mapping, locked_vm underflows to a large unsigned value, and a
>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>
>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
>> add the mapping's pinned page count to the new mm->locked_vm, subject to
>> the rlimit.  Now that mediated devices are excluded when using
>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
>> mapping.
>>
>> Underflow will not occur when all dma mappings are invalidated before exec.
>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
>> will fail with EINVAL because the mapping is in the vaddr_invalid state.
> 
> Where is this enforced?

In vfio_dma_do_unmap:
        if (invalidate_vaddr) {
                if (dma->vaddr_invalid) {
                        ...
                        ret = -EINVAL;

>> Underflow may still occur in a buggy application that fails to invalidate
>> all before exec.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index f81e925..e5a02f8 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>  	struct task_struct	*task;
>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>  	unsigned long		*bitmap;
>> +	struct mm_struct	*mm;
>>  };
>>  
>>  struct vfio_batch {
>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  	vfio_unmap_unpin(iommu, dma, true);
>>  	vfio_unlink_dma(iommu, dma);
>>  	put_task_struct(dma->task);
>> +	mmdrop(dma->mm);
>>  	vfio_dma_bitmap_free(dma);
>>  	if (dma->vaddr_invalid) {
>>  		iommu->vaddr_invalid_count--;
>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  			dma->vaddr = vaddr;
>>  			dma->vaddr_invalid = false;
>>  			iommu->vaddr_invalid_count--;
>> +			if (current->mm != dma->mm) {
>> +				mmdrop(dma->mm);
>> +				dma->mm = current->mm;
>> +				mmgrab(dma->mm);
>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>> +						     0);
> 
> What does it actually mean if this fails?  The pages are still pinned.
> lock_vm doesn't get updated.  Underflow can still occur.  Thanks,

If this fails, the user has locked additional memory after exec and before making
this call -- more than was locked before exec -- and the rlimit is exceeded.
A misbehaving application, which will only hurt itself.

However, I should reorder these, and check ret before changing the other state.

- Steve

>> +			}
>>  			wake_up_all(&iommu->vaddr_wait);
>>  		}
>>  		goto out_unlock;
>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	get_task_struct(current->group_leader);
>>  	dma->task = current->group_leader;
>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>> +	dma->mm = dma->task->mm;
>> +	mmgrab(dma->mm);
>>  
>>  	dma->pfn_list = RB_ROOT;
>>  
>