On 2020/9/18 2:17 AM, Jacob Pan (Jun) wrote:
Hi Jason,
On Thu, 17 Sep 2020 11:53:49 +0800, Jason Wang <jasowang@xxxxxxxxxx>
wrote:
On 2020/9/17 7:09 AM, Jacob Pan (Jun) wrote:
Hi Jason,
On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx>
wrote:
On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:
Hi Jason,
On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe
<jgg@xxxxxxxxxx> wrote:
On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:
On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe
wrote:
On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun)
wrote:
If user space wants to bind page tables, create the PASID
with /dev/sva, use ioctls there to setup the page table
the way it wants, then pass the now configured PASID to a
driver that can use it.
Are we talking about bare metal SVA?
What a weird term.
Glad you noticed it at v7 :-)
Any suggestions on something less weird than
Shared Virtual Addressing? There is a reason why we moved from
SVM to SVA.
SVA is fine, what is "bare metal" supposed to mean?
What I meant here is sharing virtual addresses between DMA and the host
process. This requires devices to perform DMA requests with PASID and
use IOMMU first level/stage 1 page tables.
This can be further divided into 1) user SVA 2) supervisor SVA
(sharing init_mm)
My point is that /dev/sva is not useful here since the driver can
perform PASID allocation while doing SVA bind.
No, you are thinking too small.
Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the
SVA.
Could you point me to the SVA uAPI? I couldn't find it in
mainline. It seems VDPA uses the VHOST interface?
It's the vhost_iotlb_msg defined in uapi/linux/vhost_types.h.
Thanks for the pointer. For complete vSVA functionality we would need:
1. TLB flush (IOTLB, PASID cache, etc.)
2. PASID alloc/free
3. Bind/unbind page tables or PASID tables
4. Page request service
It seems vhost_iotlb_msg can be used for #1 partially. And the
proposal is to pluck out the rest into /dev/sva? That seems awkward, as
Alex pointed out earlier for a similar situation in VFIO.
Considering it doesn't have any PASID support yet, my understanding is
that if we go with /dev/sva:
- vhost uAPI will still keep the uAPI for associating an ASID to a
specific virtqueue
- except for this, we can use /dev/sva for all the remaining (P)ASID
operations
When VDPA is used by DPDK it makes sense that the PASID will be SVA
and 1:1 with the mm_struct.
I still don't see why bare metal DPDK needs to get a handle of the
PASID.
My understanding is that it may:
- have a unified uAPI with vSVA: alloc, bind, unbind, free
Got your point, but vSVA needs more than these
Yes, it's just a subset of what vSVA requires.
- leave the binding policy to userspace instead of using an
implied one in the kernel
Only if necessary.
Yes, I think it's all about visibility (flexibility) and manageability.
Consider a device that has queues A, B and C. We will dedicate only
queues A and B to one PASID (for vSVA) and C to another PASID (for SVA).
It looks to me like the current sva_bind() API doesn't support this. We
still need an API for allocating a PASID for SVA and assigning it to the
(mediated) device.
This case is pretty common for implementing a shadow queue for a guest.
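
To make the queue example concrete, here is a minimal sketch in C of a
per-queue PASID table. Every name in it (demo_dev, demo_assign_pasid,
DEMO_NO_PASID) is hypothetical and is not part of any kernel, vhost or
VFIO API. It only illustrates the shape of the request: a PASID that is
allocated separately and then attached to individual queues, rather than
one PASID implied for the whole device by a bind call.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_QUEUES 3
#define DEMO_NO_PASID   UINT32_MAX

/* Hypothetical device model: each queue carries its own PASID tag. */
struct demo_dev {
	uint32_t queue_pasid[DEMO_NUM_QUEUES];
};

/* Start with no PASID attached to any queue. */
void demo_dev_init(struct demo_dev *dev)
{
	assert(dev);
	for (int i = 0; i < DEMO_NUM_QUEUES; i++)
		dev->queue_pasid[i] = DEMO_NO_PASID;
}

/* Attach an already allocated PASID to one queue; returns 0 or -1. */
int demo_assign_pasid(struct demo_dev *dev, unsigned int queue,
		      uint32_t pasid)
{
	assert(dev);
	if (queue >= DEMO_NUM_QUEUES || pasid == DEMO_NO_PASID)
		return -1;
	dev->queue_pasid[queue] = pasid;
	return 0;
}
```

With this model, queues A and B (indices 0 and 1) would be given the
vSVA PASID and queue C (index 2) the SVA PASID, using three separate
calls to demo_assign_pasid().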
Perhaps the SVA patch would explain. Or are you talking about
vDPA DPDK process that is used to support virtio-net-pmd in the
guest?
When VDPA is used by qemu it makes sense that the PASID will be an
arbitrary IOVA map constructed to be 1:1 with the guest vCPU
physical map. /dev/sva allows a single uAPI to do this kind of
setup, and qemu can support it while supporting a range of SVA
kernel drivers. VDPA and vfio-mdev are obvious initial targets.
*BOTH* are needed.
In general any uAPI for PASID should have the option to use either
the mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs
virtually nothing to implement this in the driver as PASID is just
a number, and gives so much more flexibility.
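
The "either/or" option described above could look roughly like the
following C sketch. This is an assumption about the shape of such an
interface, not an existing uAPI: demo_bind_req, demo_pasid_source and
demo_resolve_pasid are made-up names, and mm_pasid stands in for
whatever PASID the kernel associates with the caller's mm_struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector for where the PASID comes from. */
enum demo_pasid_source {
	DEMO_PASID_FROM_MM,      /* PASID implied by the caller's mm_struct */
	DEMO_PASID_FROM_SVA_FD,  /* PASID allocated earlier via /dev/sva */
};

/* Hypothetical bind request passed from user space to a driver. */
struct demo_bind_req {
	enum demo_pasid_source source;
	uint32_t sva_pasid;      /* only read for DEMO_PASID_FROM_SVA_FD */
};

/* Pick the PASID a driver would program for this request. */
uint32_t demo_resolve_pasid(const struct demo_bind_req *req,
			    uint32_t mm_pasid)
{
	assert(req);
	switch (req->source) {
	case DEMO_PASID_FROM_SVA_FD:
		return req->sva_pasid;
	case DEMO_PASID_FROM_MM:
	default:
		return mm_pasid;
	}
}
```

The driver-side selection is small because the PASID is only a number
at this point. The sketch deliberately says nothing about who owns the
PASID or when it is released.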
Not really nothing in terms of PASID life cycles. For example, if a
user uses the uacce interface to open an accelerator, it gets an
FD_acc. Then it opens /dev/sva to allocate a PASID and gets another FD,
FD_pasid. Then we pass FD_pasid to the driver to bind page tables,
perhaps to multiple drivers. Now we have to worry about whether FD_pasid
gets closed before the FD_acc(s) are closed, and all these race
conditions.
I'm not sure I understand this. But this demonstrates the flexibility
of a unified uAPI. E.g., it allows vDPA and VFIO devices to use the
same PASID, which can be shared with a process in the guest.
This is for user DMA not for vSVA. I was contending that /dev/sva
creates unnecessary steps for such usage.
A question here is where the PASID management is expected to be done.
I'm not quite sure the silent 1:1 binding done in intel_svm_bind_mm()
can satisfy the requirements of the management layer.
For vSVA, I think vDPA and VFIO can potentially share but I am not
seeing convincing benefits.
If a guest process wants to do SVA with a VFIO assigned device and a
vDPA-backed virtio-net at the same time, it might be a limitation if
PASID is not managed via a common interface.
Yes.
But I am not sure what vDPA SVA support will look like. Does it support
gIOVA? Does it need virtio IOMMU?
Yes, it supports gIOVA and it should work with any type of vIOMMU. I
think vDPA will start from Intel vIOMMU support in Qemu.
For virtio IOMMU, we will probably support it in the future, considering
it doesn't have any SVA capability and doesn't use a page table that
can be nested via a hardware IOMMU.
For the race condition, it could probably be solved with a refcnt.
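
A minimal user-space sketch of the refcount idea, assuming the PASID is
an object that FD_pasid and every bound FD_acc each hold a reference to.
The names (demo_pasid, demo_pasid_get, demo_pasid_put) are hypothetical.
Kernel code would normally use refcount_t or kref rather than C11
atomics, and the `freed` flag merely stands in for returning the PASID
to its allocator.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PASID object shared by FD_pasid and each bound FD_acc. */
struct demo_pasid {
	uint32_t id;
	atomic_int refs;  /* one for FD_pasid plus one per bind */
	bool freed;       /* stand-in for releasing the PASID */
};

void demo_pasid_init(struct demo_pasid *p, uint32_t id)
{
	p->id = id;
	atomic_init(&p->refs, 1);  /* reference owned by FD_pasid */
	p->freed = false;
}

/* Take a reference, e.g. when an FD_acc binds to this PASID. */
void demo_pasid_get(struct demo_pasid *p)
{
	atomic_fetch_add(&p->refs, 1);
}

/* Drop a reference; release on the last one. Returns true if released. */
bool demo_pasid_put(struct demo_pasid *p)
{
	int old = atomic_fetch_sub(&p->refs, 1);

	assert(old > 0);  /* a put without a matching get is a bug */
	if (old == 1) {
		p->freed = true;
		return true;
	}
	return false;
}
```

With this scheme, closing FD_pasid before FD_acc only drops one
reference. The PASID stays valid until the last bound FD_acc drops its
own reference, so the order of the close() calls no longer matters.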
Agreed but the best solution might be not to have the problem in the
first place :)
I agree, it's only worth bothering with if it has real benefits.
Thanks
Thanks
If we do not expose FD_pasid to the user, the teardown is much
simpler and streamlined. Following each FD_acc close, PASID unbind
is performed.
Yi can correct me, but this set is about VFIO-PCI; VFIO-mdev
will be introduced later.
Last patch is:
vfio/type1: Add vSVA support for IOMMU-backed mdevs
So pretty hard to see how this is not about vfio-mdev, at least a
little..
Jason
Thanks,
Jacob
Thanks,
Jacob