Re: [PATCH V4 2/5] vfio/type1: prevent locked_vm underflow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/15/2022 9:20 AM, Jason Gunthorpe wrote:
> On Wed, Dec 14, 2022 at 01:24:54PM -0800, Steve Sistare wrote:
>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>> dma mapping, locked_vm underflows to a large unsigned value, and a
>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>
>> To avoid underflow, do not decrement locked_vm during unmap if the
>> dma's mm has changed.  To restore the correct locked_vm count, when
>> VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add
>> the mapping's pinned page count to the new mm->locked_vm, subject
>> to the rlimit.  Now that mediated devices are excluded when using
>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of
>> the mapping.
>>
>> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
>>
>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 50 ++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index bdfc13c..33dc8ec 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>  	struct task_struct	*task;
>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>  	unsigned long		*bitmap;
>> +	struct mm_struct	*mm;
>>  };
> 
> I'm not sure this is quite, right, or at least the comment isn't quite
> right..
> 
> The issue is that the vfio_dma does not store the mm that provided the
> pages. By going through the task every time it allows the mm to change
> under its feet which of course can corrupt the assumed accounting in
> various ways.
> 
> To fix this, the mm should be kept, as you did above. All the code
> that is touching the task to get the mm should be dropped. The only
> purpose of the task is to check the rlimit.

Yes.  While developing my "redo" series I tried it that way, but did not post
that version.  Functionally it should be equivalent to this series, but I can
code it again to see if it looks cleaner.

- Steve

> That in of itself should be a single patch with a clear description
> that is the change.
> 
> Beyond that you want a second patch that make the vaddr stuff to
> transfer the pin accounting from one mm to another in the process of
> updating the dma to have a new task and mm.
> 
> There is no "underflow" here, only that the current code is using the
> wrong mm in some case because it gets it through the task.
> 
> Jason



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux