On 2021-05-20 15:34, Jason Gunthorpe wrote:
On Thu, May 20, 2021 at 03:13:55PM +0100, Robin Murphy wrote:
By "mdev-like" I mean it's very similar in shape to the general SIOV-style
mediated device concept - i.e. a physical device with an awareness of
operating on multiple contexts at once, using a Substream ID/PASID for each
one - but instead of exposing control of the contexts to anyone else, they
remain hidden behind the kernel driver which already has its own abstracted
uAPI, so overall it ends up as more just internal housekeeping than any
actual mediation. We were looking at the mdev code for inspiration, but
directly using it was never the plan.
Well:
- Who maps memory into the IOASID (i.e. the specific Substream ID)?
Sorry to nitpick, but I think it's important to get terminology right
here to avoid unnecessary misunderstanding. You can't map memory into an
address space ID; it's just a number. Ultimately that identifier ends up
pointing at some actual address space, and most of the current work is
focused on the case of that address space being provided by an mm where
things are mapped implicitly by a userspace process; I care about the
case of it being provided by an iommu_domain where things are mapped
explicitly by a kernel driver. I would be extremely wary of creating
some new third *address space* abstraction.
- What memory must be mapped?
- Who triggers DMA to this memory?
It's a pretty typical DMA flow, as far as I understand. Userspace
allocates some buffers (in this case, via the kernel driver, but in
general I'm not sure it makes much difference), puts data in the
buffers, issues an ioctl to say "process this data", and polls for
completion; the kernel driver makes sure the buffers are mapped in the
device address space (at allocation time in this case, but in general I
assume it could equally be done at request time for user pages), and
deals with scheduling requests onto the hardware. I understand this
interface is already deployed in a driver stack which supports a single
client process at once; extending the internals to allow requests from
multiple processes to run in parallel using Substream IDs for isolation
is the future goal. The interface itself shouldn't change, only some
internal arbitration details.
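To make the shape of that flow concrete, here is a toy userspace model of it. This is only a sketch under stated assumptions, not the actual driver: Domain, Driver, open_context, alloc_buffer, submit and device_translate are all hypothetical names invented for illustration, and none of them are kernel APIs. It models one explicitly managed address space per client, buffers mapped at allocation time, and the Substream ID/PASID attached to each job as the only thing that selects the address space. Cache maintenance and real scheduling are left out.

```python
# Toy model only: every name here is hypothetical, nothing is a kernel API.
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for the model


@dataclass
class Domain:
    """Stands in for an explicitly managed device address space."""
    mappings: dict = field(default_factory=dict)  # IOVA page -> physical page
    next_iova: int = 0x10000  # layout is chosen by the driver, not userspace

    def map(self, iova, phys, size):
        # Explicit mapping by the driver, one entry per page.
        for off in range(0, size, PAGE_SIZE):
            self.mappings[iova + off] = phys + off

    def translate(self, iova):
        # A missing entry raises KeyError, standing in for a translation fault.
        page = iova - iova % PAGE_SIZE
        return self.mappings[page] + iova % PAGE_SIZE


class Driver:
    def __init__(self):
        self.domains = {}            # PASID -> Domain
        self.pasid_of = {}           # client -> PASID
        self.next_pasid = 1          # PASID 0 left unused in this model
        self.next_phys = 0x80000000  # pretend physical page allocator

    def open_context(self, client):
        # One PASID and one private domain per client process.
        pasid = self.next_pasid
        self.next_pasid += 1
        self.pasid_of[client] = pasid
        self.domains[pasid] = Domain()
        return pasid

    def alloc_buffer(self, client, size):
        # Round up to whole pages and map at allocation time.
        size = -(-size // PAGE_SIZE) * PAGE_SIZE
        domain = self.domains[self.pasid_of[client]]
        iova, phys = domain.next_iova, self.next_phys
        domain.next_iova += size
        self.next_phys += size
        domain.map(iova, phys, size)
        return iova

    def submit(self, client, iova):
        # "Process this data": the job carries the client's PASID.
        return {"pasid": self.pasid_of[client], "iova": iova}

    def device_translate(self, job, offset=0):
        # Device side: the PASID in the job picks the address space.
        return self.domains[job["pasid"]].translate(job["iova"] + offset)
```

In this model two clients can end up with buffers at the same device address, and the PASID carried by each job is what keeps them apart; userspace never sees or chooses the address space layout.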
The driver simply needs to keep track of the domains and PASIDs -
when a process submits some work, it can look up the relevant
domain, iommu_map() the user pages to the right addresses, dma_map()
them for coherency, then poke in the PASID as part of scheduling the
work on the physical device.
If you are doing stuff like this then the /dev/ioasid is what you
actually want. The userprocess can create its own IOASID, program the
io page tables for that IOASID to point to pages as it wants and then
just hand over a fully instantiated io page table to the device
driver.
No. In our case, the device does not need to operate on userspace
addresses, in fact quite the opposite. There may need to be additional
things mapped into the device address space which are not, and should
not be, visible to userspace. There are also some quite weird criteria
for optimal address space layout which frankly are best left hidden
inside the kernel driver. Said driver is already explicitly managing its
own iommu_domain in the same manner as various DRM drivers and others,
so growing that to multiple parallel domains really isn't a big leap.
Moving any of this responsibility into userspace would be unwanted and
unnecessary upheaval.
What you are describing is the literal use case of /dev/ioasid - a
clean separation of managing the IOMMU related parts through
/dev/ioasid and the device driver itself is only concerned with
generating device DMA that has the proper PASID/substream tag.
The entire point is to not duplicate all the iommu code you are
describing having written into every driver that just wants an IOASID.
In particular, you are talking about having a substream capable device
and driver but your driver's uAPI is so limited it can't address the
full range of substream configurations:
- A substream pointing at a SVA
- A substream pointing at an IO page table nested under another
- A substream pointing at an IOMMU page table shared by many users
And more. Which is bad.
None of which make much if any sense for the way this device and the
rest of its software stack are designed to work, though. Anyway, the
actual uAPI in question is essentially just chucking buffer fds about in
a very abstract manner, so I don't see that it has any relevance here.
We're talking about a kernel driver *internally* managing how it chooses
to expose the buffers backing those fds to the hardware. SVA has no
meaning in that context (there's nothing to share), and I don't even
understand your second case, but attaching multiple SSIDs to a single
domain is absolutely something which _could_ be done, there's just zero
point in a single driver doing that privately when it could simply run
the relevant jobs under the same SSID instead.
We already talked about this on the "how to use PASID from the kernel"
thread.
Do you have a pointer to the right thread so I can catch up? It's not the
easiest thing to search for on lore amongst all the other PASID-related
business :(
Somewhere in here:
http://lore.kernel.org/r/20210517143758.GP1002214@xxxxxxxxxx
Thanks; along with our discussion here, that kind of confirms my concern.
Assuming IOASID can wrap up a whole encapsulated thing which is either
SVA or IOMMU_DOMAIN_DMA is too much of an overabstraction. There
definitely *are* uses for IOMMU_DOMAIN_DMA - say you want to put some
SIOV ADIs to work for the host kernel using their regular
non-IOMMU-aware driver - but there will also be cases for
IOMMU_DOMAIN_UNMANAGED, although I do mostly expect those to be SoC
devices whose drivers are already IOMMU-aware and just want to be so at
a finer-grained level, not PCI devices. Even IOMMU_DOMAIN_PASSTHROUGH
for IOASIDs _could_ be doable if a sufficiently compelling reason came
along. I agree that SVA on init_mm is pretty bonkers, but don't get too
hung up on the DMA API angle which is really orthogonal - passthrough
domains with dma-direct ops have been working fine for years.
FWIW my non-SVA view is that a PASID is merely an index into a set of
iommu_domains, and in that context it doesn't even really matter *who*
allocates them, only that the device driver and IOMMU driver are in sync :)
Right, this is where /dev/ioasid is going.
However it gets worked out at the kAPI level in the iommu layer the
things you asked for are intended to be solved, and lots more.
Great! It feels like one of the major things will be that, at least
without major surgery to the DMA API, most of the use-cases will likely
still need a struct device wrapped around the IOASID. I think the
particular one I want to solve is actually the odd one out in that it
doesn't really care, and could be made to work either way.
Thanks,
Robin.