Re: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

"david@xxxxxxxxxxxxxxxxxxxxx" <david@xxxxxxxxxxxxxxxxxxxxx> · Thu, 14 Oct 2021 15:33:21 +1100

On Mon, Oct 11, 2021 at 02:17:48PM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 11, 2021 at 04:37:38PM +1100, david@xxxxxxxxxxxxxxxxxxxxx wrote:
> > > PASID support will already require that a device can be multi-bound to
> > > many IOAS's, couldn't PPC do the same with the windows?
> > 
> > I don't see how that would make sense.  The device has no awareness of
> > multiple windows the way it does of PASIDs.  It just sends
> > transactions over the bus with the IOVAs it's told.  If those IOVAs
> > lie within one of the windows, the IOMMU picks them up and translates
> > them.  If they don't, it doesn't.
> 
> To my mind that address centric routing is awareness.

I don't really understand that position.  A PASID capable device has
to be built to be PASID capable, and will generally have registers
into which you store PASIDs to use.

Any 64-bit DMA capable device can use the POWER IOMMU just fine - it's
up to the driver to program it with addresses that will be translated
(and in Linux the driver will get those from the DMA subsystem).

> If the HW can attach multiple non-overlapping IOAS's to the same
> device then the HW is routing to the correct IOAS by using the address
> bits. This is not much different from the prior discussion we had
> where we were thinking of the PASID as an 80 bit address

Ah... that might be a workable approach.  And it even helps me get my
head around multiple attachment which I was struggling with before.

So, the rule would be that you can attach multiple IOASes to a device,
as long as none of them overlap.  The non-overlapping could be because
each IOAS covers a disjoint address range, or it could be because
there's some attached information - such as a PASID - to disambiguate.
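
To make that rule concrete, here is a minimal sketch of the check in C.
Everything in it is hypothetical: struct ioas_attach, attach_conflicts()
and can_attach() are illustrative names, not anything from the iommufd
draft.  It also assumes that an untagged attachment never conflicts with
a PASID-tagged one, which is a modelling choice rather than something
the draft specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one IOAS attachment (not an iommufd type). */
struct ioas_attach {
	uint64_t iova_start;	/* first IOVA covered by the IOAS */
	uint64_t iova_last;	/* last IOVA covered (inclusive) */
	bool has_pasid;		/* true if a PASID disambiguates this attachment */
	uint32_t pasid;		/* only meaningful when has_pasid is true */
};

/* Two attachments conflict if nothing tells them apart: same tag (or both
 * untagged) and intersecting IOVA ranges. */
static bool attach_conflicts(const struct ioas_attach *a,
			     const struct ioas_attach *b)
{
	/* Assumption: tagged and untagged traffic are always distinguishable. */
	if (a->has_pasid != b->has_pasid)
		return false;
	/* Different PASIDs disambiguate even fully overlapping ranges. */
	if (a->has_pasid && a->pasid != b->pasid)
		return false;
	/* Same tag: fall back to address-based routing, so ranges must be
	 * disjoint. */
	return a->iova_start <= b->iova_last && b->iova_start <= a->iova_last;
}

/* A candidate may be attached only if it conflicts with no existing
 * attachment on the device. */
static bool can_attach(const struct ioas_attach *cur, size_t n,
		       const struct ioas_attach *cand)
{
	for (size_t i = 0; i < n; i++)
		if (attach_conflicts(&cur[i], cand))
			return false;
	return true;
}
```

With this model, two POWER-style windows at disjoint addresses and two
full-range IOASes with different PASIDs are both accepted by the same
test, which is the "80 bit address" view in another form.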

What remains a question is where the disambiguating information comes
from in each case: does it come from properties of the IOAS,
properties of the device, or from extra parameters supplied at attach
time?  IIUC, the current draft suggests it always comes at attach time
for the PASID information.  Obviously the more consistency we can have
here the better.

I can also see an additional problem in implementation, once we start
looking at hot-adding devices to existing address spaces.  Suppose our
software (maybe qemu) wants to set up a single DMA view for a bunch of
devices, that has such a split window.  It can set up IOASes easily
enough for the two windows, then it needs to attach them.  Presumably,
it attaches them one at a time, which means that each device (or
group) goes through an interim state where it's attached to one, but
not the other.  That can probably be achieved by using an extra IOMMU
domain (or the local equivalent) in the hardware for that interim
state.  However it means we have to repeatedly create and destroy that
extra domain for each device after the first we add, rather than
simply adding each device to the domain which has both windows.

[I think this doesn't arise on POWER when running under PowerVM.  That
 has no concept like IOMMU domains, and instead the mapping is always
 done per "partitionable endpoint" (PE), essentially a group.  That
 means it's just a question of whether we mirror mappings on both
 windows into a given PE or just those from one IOAS.  It's not an
 unreasonable extension/combination of existing hardware quirks to
 consider, though]

> The fact the PPC HW actually has multiple page table roots and those
> roots even have different page tables layouts while still connected to
> the same device suggests this is not even an unnatural modelling
> approach...
> 
> Jason  
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson