On 2020/8/13 1:26 PM, Tian, Kevin wrote:
From: Jason Wang <jasowang@xxxxxxxxxx>
Sent: Thursday, August 13, 2020 12:34 PM
On 2020/8/12 12:05 PM, Tian, Kevin wrote:
The problem is that if we tie all controls to the VFIO uAPI, other
subsystems like vDPA are likely to duplicate them. I wonder if there is a
way to decouple vSVA from the VFIO uAPI?
vSVA is a per-device (either pdev or mdev) feature and thus should naturally
be managed by its device driver (VFIO or vDPA). From this angle some
duplication is inevitable, given that VFIO and vDPA are orthogonal passthrough
frameworks. Within the kernel, the majority of vSVA handling is done by the
IOMMU and IOASID modules, so most of the logic is shared.
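As a concrete illustration of that shared piece, here is a minimal sketch of how a passthrough framework driver could lean on the common IOASID allocator for the PASID part of vSVA. The set name, the helper names and the PASID range are assumptions made for illustration; only DECLARE_IOASID_SET(), ioasid_alloc() and ioasid_free() are the shared kernel helpers:

/*
 * Rough sketch, not part of the original exchange: a passthrough
 * framework driver (VFIO or vDPA) using the shared IOASID allocator
 * for the PASID piece of vSVA.  The set and function names are
 * hypothetical; the 20-bit range simply assumes a VT-d style PASID
 * space.
 */
#include <linux/ioasid.h>

static DECLARE_IOASID_SET(example_pasid_set);	/* hypothetical per-driver set */

static ioasid_t example_alloc_pasid(void *private)
{
	/* PASID 0 is reserved; assume a 20-bit PASID space */
	return ioasid_alloc(&example_pasid_set, 1, (1U << 20) - 1, private);
}

static void example_free_pasid(ioasid_t pasid)
{
	ioasid_free(pasid);
}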
So why not introduce the vSVA uAPI at the IOMMU or IOASID layer?
One may ask a similar question: why doesn't the IOMMU layer expose map/unmap
as uAPI...
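For reference, this is roughly what that looks like today: the map/unmap uAPI is exposed through VFIO's type1 backend rather than by the IOMMU layer itself. A minimal userspace fragment, assuming 'container' is an already configured VFIO container fd (the group/container setup is omitted and the function name is made up):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Map one buffer at the given IOVA through VFIO's type1 IOMMU backend. */
static int dma_map_buffer(int container, void *vaddr,
			  uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_map map;

	memset(&map, 0, sizeof(map));
	map.argsz = sizeof(map);
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (uint64_t)(uintptr_t)vaddr;
	map.iova  = iova;
	map.size  = size;

	return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}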
I think this is probably a good idea as well. If anything is missing
in the infrastructure, we can invent it. Besides vhost-vDPA, there are
other subsystems that relay their uAPIs to the IOMMU API. Duplicated
uAPIs are usually a hint of code duplication. Simple map/unmap could
be easy, but the vSVA uAPI is much more complicated.
If a userspace DMA interface can be easily
adapted into a passthrough one, it might be the right choice.
It's not that easy even for VFIO, which required a lot of new uAPIs and
infrastructure (e.g. mdev) to be invented.
But for idxd,
we see mdev as a much better fit here, given the big difference between
what userspace DMA requires and what the guest driver requires on this hardware.
A weak point of mdev is that it can't serve kernel subsystems other than
VFIO. In that case, you need some other infrastructure (like [1]) to do
this.
mdev doesn't exclude kernel usages. It's perfectly fine for a driver
to reserve some work queues for host usage while wrapping others
into mdevs.
I meant you may want slices to be independent devices from the kernel's
point of view:
E.g. for Ethernet devices, you may want 10K mdevs to be passed to guests.
Similarly, you may want 10K net devices connected to the kernel
networking subsystem.
In this case it's not simply a matter of reserving queues; you need some
other type of device abstraction, and there could be some duplication
between that and mdev.
Yes, some abstraction is required, but isn't that something the driver
should care about rather than the mdev framework itself?
With mdev you present a "PCI" device, but what kind of device does it try
to present to the kernel? If it's still PCI, there's duplication with mdev;
if it's something new, maybe we can switch to that API.
If the driver reports
the same set of resources to both mdev and networking, it needs to
make sure that when a resource is claimed through one interface it is
marked in-use in the other. E.g. each mdev type includes an
available_instances attribute; the driver could report 10K available
instances initially and then update it to 5K when the other 5K are used
for net devices later.
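A rough sketch of what that accounting could look like on the driver side, assuming a made-up driver-private structure (struct example_dev and its fields are hypothetical; the attribute signature follows the 2020-era mdev API):

#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/mdev.h>

struct example_dev {
	unsigned int total_queues;		/* e.g. 10K hardware queues */
	unsigned int queues_used_by_netdev;	/* claimed by the host net side */
};

/* Report only the queues not already claimed by the kernel net devices. */
static ssize_t available_instances_show(struct kobject *kobj,
					struct device *dev, char *buf)
{
	struct example_dev *edev = dev_get_drvdata(dev);

	return sprintf(buf, "%u\n",
		       edev->total_queues - edev->queues_used_by_netdev);
}
static MDEV_TYPE_ATTR_RO(available_instances);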
Right, but this probably means you need another management layer under mdev.
Mdev definitely has its usage limitations. Some may be improved
in the future, some may not. But those are distracting from the
original purpose of this thread (mdev vs. userspace DMA) and are better
discussed elsewhere, e.g. at LPC...
Ok.
Thanks
Thanks
Kevin