On 29/11/2023 8:36 pm, Jason Gunthorpe wrote:
> On Wed, Nov 29, 2023 at 05:42:57PM +0000, Robin Murphy wrote:
>> Hi all,
>>
>> Prompted by Jason's proposal[1], here's a first step towards truly
>> unpicking the dma_configure vs. IOMMU mess. As I commented before, we
>> have an awful lot of accumulated cruft and technical debt here making
>> things more complicated than they need to be, and we already have hacks
>> on top of hacks trying to work around it, so polishing those hacks even
>> further is really not a desirable direction of travel. And I do know
>> they're hacks, because I wrote most of them and still remember enough of
>> the context of the time ;)
> I quite like this, I was also looking at getting rid of those other
> parameters.
>
> I wanted to take smaller steps because it is all pretty hairy.
>
> One thing that still concerns me is that if the FW data restricts the
> valid IOVA window, that really should be reflected in the reserved
> ranges and not just dumped into the struct device for use by the DMA
> API.
>
> Or, perhaps, vfio/iommufd should be using the struct device data to
> generate some additional reserved ranges?
>
> Either way, I would like to see dma-iommu and the rest of the
> subsystem agree on what the valid IOVA ranges actually are.
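
For illustration only - this helper does not exist in the tree - deriving
extra reserved regions from dev->dma_range_map for iommufd/VFIO might look
roughly like the sketch below. It assumes, for simplicity, that the map
holds a single inbound window; a real version would have to sort the
entries and fill every gap between them.

#include <linux/device.h>
#include <linux/dma-direct.h>
#include <linux/iommu.h>

/* Hypothetical sketch: report everything outside the firmware-described
 * inbound window as IOMMU_RESV_RESERVED, so other IOMMU API users see
 * the same restriction that the DMA API applies via dma_range_map. */
static void dma_range_map_get_resv_regions(struct device *dev,
					   struct list_head *head)
{
	const struct bus_dma_region *map = dev->dma_range_map;
	struct iommu_resv_region *region;
	dma_addr_t lo, hi;

	if (!map || !map->size)
		return;

	lo = map->dma_start;
	hi = map->dma_start + map->size - 1;

	/* Nothing below the window is a valid IOVA... */
	if (lo) {
		region = iommu_alloc_resv_region(0, lo, 0,
						 IOMMU_RESV_RESERVED,
						 GFP_KERNEL);
		if (region)
			list_add_tail(&region->list, head);
	}

	/* ...and neither is anything above it. */
	if (hi != ~(dma_addr_t)0) {
		region = iommu_alloc_resv_region(hi + 1, ~(dma_addr_t)0 - hi,
						 0, IOMMU_RESV_RESERVED,
						 GFP_KERNEL);
		if (region)
			list_add_tail(&region->list, head);
	}
}
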
Note that there is some intentional divergence here: iommu-dma reserves
IOVAs matching PCI outbound windows because it wants to avoid clashing
with potential peer-to-peer addresses without having to get into the
details of ACS redirect etc., but we don't expose those as generic
reserved regions because they're firmly a property of the PCI host
bridge, not of the IOMMU group (and, more practically, because we did
do so briefly and it made QEMU unhappy). I think there may also have
been some degree of conclusion that it's not the IOMMU API's place to
get in the way of other domain users trying to do weird P2P stuff if
they really want to.
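
For reference, the outbound-window case is roughly what dma-iommu's
iova_reserve_pci_windows() already does today; the sketch below is a
paraphrase rather than the exact in-tree code:

#include <linux/iova.h>
#include <linux/pci.h>
#include <linux/resource_ext.h>

/* Carve the host bridge's outbound (MEM) windows out of the DMA
 * domain's private IOVA space, so DMA API allocations can never
 * collide with potential peer-to-peer bus addresses. Note these
 * never become generic reserved regions visible via the IOMMU API. */
static void reserve_outbound_windows(struct pci_dev *pdev,
				     struct iova_domain *iovad)
{
	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
	struct resource_entry *window;

	resource_list_for_each_entry(window, &bridge->windows) {
		if (resource_type(window->res) != IORESOURCE_MEM)
			continue;

		reserve_iova(iovad,
			     iova_pfn(iovad, window->res->start - window->offset),
			     iova_pfn(iovad, window->res->end - window->offset));
	}
}
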
Another issue is that the generic dma_range_map strictly represents
device-specific constraints, which it may not always be desirable or
appropriate to apply to a whole group. There wasn't really a conscious
decision as such, but that is more or less why we still only consider
PCI's bridge->dma_ranges (which comes from the same underlying data),
since we can at least assume every device behind a bridge accesses
memory through that bridge and so inherits its restrictions. However,
I don't recall any conscious decision for inbound windows to only be
considered for DMA domain reservations rather than for proper reserved
regions - I'm pretty sure that's just a case of that code being added
in the place where it seemed to fit best at the time (because hey, it's
more host bridge windows and we already have a thing for host bridge
windows...)
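
Continuing the sketch above, the inbound side is where those
DMA-domain-only reservations come from: it is the gaps *between*
bridge->dma_ranges entries that get reserved. Again this is a paraphrase
of iova_reserve_pci_windows(), simplified by assuming the entries are
sorted and non-overlapping (which the real code also expects):

/* Reserve the IOVA ranges which fall outside the bridge's inbound
 * windows, i.e. addresses the host bridge will not forward to memory.
 * As above, these stay private to the DMA domain rather than being
 * exposed as IOMMU API reserved regions. */
static void reserve_inbound_gaps(struct pci_dev *pdev,
				 struct iova_domain *iovad)
{
	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
	struct resource_entry *window;
	dma_addr_t start = 0, end;

	resource_list_for_each_entry(window, &bridge->dma_ranges) {
		end = window->res->start - window->offset;
		if (end > start)
			reserve_iova(iovad, iova_pfn(iovad, start),
				     iova_pfn(iovad, end - 1));
		start = window->res->end - window->offset + 1;
	}

	/* Anything above the last inbound window is unusable too */
	if (start)
		reserve_iova(iovad, iova_pfn(iovad, start),
			     iova_pfn(iovad, ~(dma_addr_t)0));
}
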
Thanks,
Robin.