On Fri, Mar 24, 2023 at 10:54:39AM +0800, Jason Wang wrote:
On Thu, Mar 23, 2023 at 5:50 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
On Thu, Mar 23, 2023 at 11:42:07AM +0800, Jason Wang wrote:
>On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
>>
>> The new "use_va" module parameter (default: true) is used in
>> vdpa_alloc_device() to inform the vDPA framework that the device
>> supports VA.
>>
>> vringh is initialized to use VA only when "use_va" is true and the
>> user's mm has been bound. So, only when the bus supports user VA
>> (e.g. vhost-vdpa).
>>
>> vdpasim_mm_work_fn work is used to serialize the binding to a new
>> address space when the .bind_mm callback is invoked, and unbinding
>> when the .unbind_mm callback is invoked.
>>
>> Call mmget_not_zero()/kthread_use_mm() inside the worker function
>> to pin the address space only as long as needed, following the
>> documentation of mmget() in include/linux/sched/mm.h:
>>
>> * Never use this function to pin this address space for an
>> * unbounded/indefinite amount of time.
>
>I wonder if everything would be simplified if we just allow the parent
>to advertise whether or not it requires the address space.
>
>Then when vhost-vDPA probes the device it can simply advertise
>use_work as true so vhost core can use get_task_mm() in this case?
IIUC set user_worker to true, it also creates the kthread in the vhost
core (but we can add another variable to avoid this).
My biggest concern is the comment in include/linux/sched/mm.h.
get_task_mm() uses mmget(), but in the documentation they advise against
pinning the address space indefinitely, so I preferred in keeping
mmgrab() in the vhost core, then call mmget_not_zero() in the worker
only when it is running.
Ok.
In the future maybe mm will be used differently from parent if somehow
it is supported by iommu, so I would leave it to the parent to handle
this.
This should be possible, I was told by Intel that their IOMMU can
access the process page table for shared virtual memory.
Cool, we should investigate this. Do you have any pointers to their
documentation?
Thanks,
Stefano