Re: [PATCH 3/6] vfio: remove the unused mdev iommu hook

On 2021-05-20 15:34, Jason Gunthorpe wrote:
On Thu, May 20, 2021 at 03:13:55PM +0100, Robin Murphy wrote:

By "mdev-like" I mean it's very similar in shape to the general SIOV-style
mediated device concept - i.e. a physical device with an awareness of
operating on multiple contexts at once, using a Substream ID/PASID for each
one - but instead of exposing control of the contexts to anyone else, they
remain hidden behind the kernel driver which already has its own abstracted
uAPI, so overall it ends up as more just internal housekeeping than any
actual mediation. We were looking at the mdev code for inspiration, but
directly using it was never the plan.

Well:
  - Who maps memory into the IOASID (i.e. the specific substream ID)?

Sorry to nitpick, but I think it's important to get terminology right here to avoid unnecessary misunderstanding. You can't map memory into an address space ID; it's just a number. Ultimately that identifier ends up pointing at some actual address space, and most of the current work is focused on the case of that address space being provided by an mm where things are mapped implicitly by a userspace process; I care about the case of it being provided by an iommu_domain where things are mapped explicitly by a kernel driver. I would be extremely wary of creating some new third *address space* abstraction.

  - What memory must be mapped?
  - Who triggers DMA to this memory?

It's a pretty typical DMA flow, as far as I understand. Userspace allocates some buffers (in this case, via the kernel driver, but in general I'm not sure it makes much difference), puts data in the buffers, issues an ioctl to say "process this data", and polls for completion; the kernel driver makes sure the buffers are mapped in the device address space (at allocation time in this case, but in general I assume it could equally be done at request time for user pages), and deals with scheduling requests onto the hardware. I understand this interface is already deployed in a driver stack which supports a single client process at once; extending the internals to allow requests from multiple processes to run in parallel using Substream IDs for isolation is the future goal. The interface itself shouldn't change, only some internal arbitration details.

The driver simply needs to keep track of the domains and PASIDs -
when a process submits some work, it can look up the relevant
domain, iommu_map() the user pages to the right addresses, dma_map()
them for coherency, then poke in the PASID as part of scheduling the
work on the physical device.
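As a rough illustration of that bookkeeping, here is a minimal userspace model in C. Everything in it is hypothetical: the model_* names stand in for the driver's private structures and for the roles iommu_map() and the IOMMU's lookup play; none of them are kernel APIs. It only shows the shape of the flow: PASID selects a domain, the buffer is mapped into that domain, and the job is tagged with the PASID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model only: none of these types or functions
 * are real kernel APIs. */

#define MAX_PASIDS   8
#define MAX_MAPPINGS 16

struct model_mapping {
	uint64_t iova;
	uint64_t phys;
	size_t size;
};

/* Stand-in for an iommu_domain: just a list of IOVA -> phys mappings. */
struct model_domain {
	struct model_mapping maps[MAX_MAPPINGS];
	size_t nr_maps;
};

/* Stand-in for a hardware job descriptor carrying the PASID tag. */
struct model_job {
	uint32_t pasid;
	uint64_t iova;
	size_t len;
};

/* The driver's private PASID -> domain table. */
static struct model_domain *pasid_table[MAX_PASIDS];

static int model_attach(uint32_t pasid, struct model_domain *dom)
{
	if (pasid >= MAX_PASIDS || pasid_table[pasid])
		return -1;
	pasid_table[pasid] = dom;
	return 0;
}

/* Plays the role of iommu_map() in this model. */
static int model_map(struct model_domain *dom, uint64_t iova,
		     uint64_t phys, size_t size)
{
	if (dom->nr_maps == MAX_MAPPINGS)
		return -1;
	dom->maps[dom->nr_maps++] = (struct model_mapping){ iova, phys, size };
	return 0;
}

/* Submit work: look up the domain, map the buffer, tag the job. */
static int model_submit(uint32_t pasid, uint64_t iova, uint64_t phys,
			size_t len, struct model_job *job)
{
	struct model_domain *dom;

	if (pasid >= MAX_PASIDS || !(dom = pasid_table[pasid]))
		return -1;
	if (model_map(dom, iova, phys, len))
		return -1;
	/* The dma_map() coherency step has no equivalent in this model. */
	job->pasid = pasid;
	job->iova = iova;
	job->len = len;
	return 0;
}

/* Translate as the IOMMU would: PASID selects the domain, then IOVA -> phys. */
static int model_translate(uint32_t pasid, uint64_t iova, uint64_t *phys)
{
	struct model_domain *dom;
	size_t i;

	if (pasid >= MAX_PASIDS || !(dom = pasid_table[pasid]))
		return -1;
	for (i = 0; i < dom->nr_maps; i++) {
		struct model_mapping *m = &dom->maps[i];

		if (iova >= m->iova && iova - m->iova < m->size) {
			*phys = m->phys + (iova - m->iova);
			return 0;
		}
	}
	return -1;
}
```

Because each PASID resolves to its own domain, a buffer mapped for one context is simply not translatable under another, which is the isolation property the Substream IDs are meant to provide.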

If you are doing stuff like this then the /dev/ioasid is what you
actually want. The userprocess can create its own IOASID, program the
io page tables for that IOASID to point to pages as it wants and then
just hand over a fully instantiated io page table to the device
driver.

No. In our case, the device does not need to operate on userspace addresses, in fact quite the opposite. There may need to be additional things mapped into the device address space which are not, and should not be, visible to userspace. There are also some quite weird criteria for optimal address space layout which frankly are best left hidden inside the kernel driver. Said driver is already explicitly managing its own iommu_domain in the same manner as various DRM drivers and others, so growing that to multiple parallel domains really isn't a big leap. Moving any of this responsibility into userspace would be unwanted and unnecessary upheaval.

What you are describing is the literal use case of /dev/ioasid - a
clean separation of managing the IOMMU related parts through
/dev/ioasid and the device driver itself is only concerned with
generating device DMA that has the proper PASID/substream tag.

The entire point is to not duplicate all the iommu code you are
describing having written into every driver that just wants an IOASID.

In particular, you are talking about having a substream capable device
and driver but your driver's uAPI is so limited it can't address the
full range of substream configurations:

  - A substream pointing at a SVA
  - A substream pointing at an IO page table nested under another
  - A substream pointing at an IOMMU page table shared by many users

And more. Which is bad.

None of which make much if any sense for the way this device and the rest of its software stack are designed to work, though. Anyway, the actual uAPI in question is essentially just chucking buffer fds about in a very abstract manner, so I don't see that it has any relevance here. We're talking about a kernel driver *internally* managing how it chooses to expose the buffers backing those fds to the hardware. SVA has no meaning in that context (there's nothing to share), and I don't even understand your second case, but attaching multiple SSIDs to a single domain is absolutely something which _could_ be done, there's just zero point in a single driver doing that privately when it could simply run the relevant jobs under the same SSID instead.

We already talked about this on the "how to use PASID from the kernel"
thread.

Do you have a pointer to the right thread so I can catch up? It's not the
easiest thing to search for on lore amongst all the other PASID-related
business :(

Somewhere in here:

http://lore.kernel.org/r/20210517143758.GP1002214@xxxxxxxxxx

Thanks, along with our discussion here that kind of confirms my concern. Assuming IOASID can wrap up a whole encapsulated thing which is either SVA or IOMMU_DOMAIN_DMA is too much of an overabstraction. There definitely *are* uses for IOMMU_DOMAIN_DMA - say you want to put some SIOV ADIs to work for the host kernel using their regular non-IOMMU-aware driver - but there will also be cases for IOMMU_DOMAIN_UNMANAGED, although I do mostly expect those to be SoC devices whose drivers are already IOMMU-aware and just want to be so at a finer-grained level, not PCI devices. Even IOMMU_DOMAIN_PASSTHROUGH for IOASIDs _could_ be doable if a sufficiently compelling reason came along. I agree that SVA on init_mm is pretty bonkers, but don't get too hung up on the DMA API angle which is really orthogonal - passthrough domains with dma-direct ops have been working fine for years.

FWIW my non-SVA view is that a PASID is merely an index into a set of
iommu_domains, and in that context it doesn't even really matter *who*
allocates them, only that the device driver and IOMMU driver are in sync :)
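To make the "PASID is an index into a set of domains" view concrete, a minimal C sketch follows. The struct domain, struct pasid_table and pasid_* helpers are hypothetical stand-ins, not kernel APIs. It shows that the allocator does not care which side calls it, that the IOMMU-side lookup is a plain table index, and that two PASIDs may legitimately point at the same domain. Real allocators may reserve some values (such as 0); this sketch ignores that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: "struct domain" stands in for an iommu_domain. */
struct domain {
	int id;
};

#define NR_PASIDS 4

/* One table consulted by both the device driver and the IOMMU driver. */
struct pasid_table {
	struct domain *slot[NR_PASIDS];
};

/* Allocate the lowest free PASID and bind it to dom; -1 if full.
 * It does not matter which side calls this, as long as both
 * consult the same table afterwards. */
static int pasid_alloc(struct pasid_table *t, struct domain *dom)
{
	for (int i = 0; i < NR_PASIDS; i++) {
		if (!t->slot[i]) {
			t->slot[i] = dom;
			return i;
		}
	}
	return -1;
}

static void pasid_free(struct pasid_table *t, int pasid)
{
	if (pasid >= 0 && pasid < NR_PASIDS)
		t->slot[pasid] = NULL;
}

/* What the IOMMU does for a tagged transaction: index the table. */
static struct domain *pasid_lookup(const struct pasid_table *t, int pasid)
{
	if (pasid < 0 || pasid >= NR_PASIDS)
		return NULL;
	return t->slot[pasid];
}
```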

Right, this is where /dev/ioasid is going.

However it gets worked out at the kAPI level in the iommu layer the
things you asked for are intended to be solved, and lots more.

Great! It feels like one of the major things will be that, at least without major surgery to the DMA API, most of the use-cases will likely still need a struct device wrapped around the IOASID. I think the particular one I want to solve is actually the odd one out in that it doesn't really care, and could be made to work either way.

Thanks,
Robin.


