On Mon, 2013-07-29 at 17:03 -0400, Don Dutile wrote:
> On 07/29/2013 12:06 PM, Alex Williamson wrote:
> > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
> >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
> >>> On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote:
> >>>> On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
> >>>>> On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> >>>>>> On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
> >>>>>> <alex.williamson@xxxxxxxxxx> wrote:
> >>>>>>> On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> >>>>>>>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >>>>>> As the DMA transaction propagates through the fabric, it may be
> >>>>>> tagged by bridges with different requester IDs.
> >>>>>>
> >>>>>> The requester IDs are needed outside PCI (by IOMMU drivers), but
> >>>>>> I'm not sure the intermediate pci_devs are.
> >>>>>
> >>>>> A u16 requester ID doesn't mean much on its own though; it's not
> >>>>> necessarily even unique. A requester ID associated with the
> >>>>> context of a pci_dev is unique and gives us a reference point if
> >>>>> we need to perform another operation on that requester ID.
> >>>>
> >>>> A u16 requester ID had better mean something to an IOMMU -- it's
> >>>> all the IOMMU can use to look up the correct mapping. That's why
> >>>> we have to give the iterator something to define the scope to
> >>>> iterate over. The same requester ID could mean something totally
> >>>> unrelated in a different scope, e.g., below a different IOMMU.
> >>>
> >>> The point I'm trying to make is that a requester ID depends on its
> >>> context (minimally, the PCI segment). The caller can assume the
> >>> context based on the calling parameters or we can provide context
> >>> in the form of an associated pci_dev. I chose the latter path
> >>> because I prefer explicit interfaces and it has some usefulness in
> >>> the intel-iommu implementation.
> >>>
> >>> For instance, get_domain_for_dev() first looks to see if a pci_dev
> >>> already has a domain. If it doesn't, we want to look to see if
> >>> there's an upstream device that would use the same requester ID
> >>> that already has a domain. If our get-requester-ID-for-device
> >>> function returns only the requester ID, we don't know whether that
> >>> requester ID is the device we're searching from or some upstream
> >>> device. Therefore we potentially do an unnecessary search for the
> >>> domain.
> >>>
> >>> The other user is intel_iommu_add_device(), where we're trying to
> >>> build IOMMU groups. Visibility is the first requirement of an
> >>> IOMMU group. If the IOMMU cannot distinguish between devices, they
> >>> must be part of the same IOMMU group. Here we want to find the
> >>> pci_dev that hosts the requester ID. I don't even know how we'd
> >>> implement this with a function that only returned the requester ID.
> >>> Perhaps we'd have to walk upstream from the device, calling the
> >>> get-requester-ID-for-device function at each step and noticing when
> >>> it changed. That moves significant logic back into the caller
> >>> code.
> >>> ...
> >>
> >>>> I don't see the point of passing a "device closest to the
> >>>> requester ID." What would the IOMMU do with that? As far as the
> >>>> IOMMU is concerned, the requester ID could be an arbitrary number
> >>>> completely unrelated to a pci_dev.
> >>>
> >>> Do you have an example of a case where a requester ID doesn't have
> >>> some association to a pci_dev?
> >>
> >> I think our confusion here is the same as what Don & I have been
> >> hashing out -- I'm saying a requester ID fabricated by a bridge
> >> need not correspond to a specific pci_dev, and you probably mean
> >> that every requester ID is by definition the result of *some* PCI
> >> device making a DMA request.
> >
> > Yes
> >
> >>> ...
> >>> Furthermore, if we have:
> >>>
> >>>         -- A
> >>>        /
> >>>    X--Y
> >>>        \
> >>>         -- B
> >>> ...
> >>
> >>> Let me go back to the X-Y-A|B example above to see if I can
> >>> explain why pcie_for_each_requester_id() doesn't make sense to me.
> >>> Generally a for_each_foo function should iterate across all things
> >>> with the same foo. So a for_each_requester_id should iterate
> >>> across all things with the same requester ID.
> >>
> >> Hm, that's not the way I think of for_each_FOO() interfaces. I
> >> think of it as "execute the body (or callback) for every possible
> >> FOO", where FOO is different each time. for_each_pci_dev(),
> >> pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc.,
> >> work like that.
> >
> > Most of these aren't relevant because they iterate over everything.
> > I think they only show that if we had a pci_for_each_requester_id()
> > it should iterate over every possible PCI requester ID in the
> > system, and the same could be said for pci_for_each_requester().
> >
> Lost me here.... I thought the functions took in a pdev, so it'll
> only iterate on the segment that pdev is associated with.

Right, but how does the callback function know the segment we're
traversing? Without a pci_dev, a requester ID could belong to any PCI
segment. It would be up to the caller to provide context in the opaque
pointer.

> > pci_bus_for_each_resource() is the only one we can build from: for
> > all resources on a bus. We want all requester IDs for a pci_dev.
> > Does that perhaps mean it should be called
> > pci_dev_for_each_requester_id()? I'd be ok with that name, but I
> > think it even more implies that a pci_dev is associated with a
> > requester ID.
> >
> and the fcn call name changes have equally lost me here.
> AIUI, IOMMUs want to call a PCI core function with a pdev, asking for
> the req-ids that the pdev may generate to the IOMMU when performing
> DMA. that doesn't strike me as a for-each-requester-id() but a
> 'get-requester-id()' operation.

But we have to support multiple requester IDs per endpoint.
As Bjorn pointed out in the spec, a PCI-X bridge may pass the requester
ID through or take ownership of the transaction. Therefore we need to
add both the "endpoint" and the (bridge->secondary->number << 8 | 0)
requester IDs. We could do a get-requester-id() interface, but then
the caller must walk upstream through each device that may possibly
take ownership of the transactions. That leaves a lot of logic in the
caller versus a for-each-requester-id() operation.

> >> But the more important question is what arguments we give to the
> >> callback. My proposal was to map
> >>
> >>     {pci_dev -> {requester-ID-A, requester-ID-B, ...}}
> >>
> >> Yours is to map
> >>
> >>     {pci_dev -> {{pci_dev-A, requester-ID-A},
> >>                  {pci_dev-B, requester-ID-B}, ...}}
> >>
> >> i.e., your callback gets both a pci_dev and a requester-ID. I'm
> >> trying to figure out how you handle cases where requester-ID-A
> >> doesn't have a corresponding pci_dev-A. I guess you pass the
> >> device most closely associated with requester-ID-A
> >
> > Yes
> >
> >> The "most closely associated device" idea seems confusing to
> >> describe and think about. I think you want to use it as part of
> >> grouping devices into domains. But I think we should attack that
> >> problem separately. For grouping or isolation questions, we have
> >> to pay attention to things like ACS, which are not relevant for
> >> mapping.
> >
> > We only touch isolation insofar as providing an interface for a
> > driver to determine the point in the PCI topology where the
> > requester ID is rooted. Yes, grouping can make use of that, but I
> > object to the idea that we're approaching some slippery slope of
> > rolling multiple concepts into this patch. What I'm proposing here
> > is strictly a requester ID interface. I believe that providing a
> > way for a caller to determine that two devices have a requester ID
> > rooted at the same point in the PCI topology is directly relevant
> > to that interface.
> >
> I strongly agree here.
> providing the pdevs associated with each req-id returned cannot hurt,
> and in fact, I believe it is an improvement, avoiding a potential set
> of additional interfaces that may be needed to do
> (segment, req-id)->pdev mapping/matching/get-ing.
>
> > pci_find_upstream_pcie_bridge() attempts to do this same thing.
> > Unfortunately, the interface is convoluted, it doesn't handle
> > quirks, and it requires a great deal of work by the caller to then
> > walk the topology and create requester IDs at each step. This also
> > indicates that at each step, the requester ID is associated with
> > some pci_dev. Providing a pci_dev is simply showing our work and
> > providing context to the requester ID (i.e., here's the requester
> > ID and the step along the path from which it was derived; here's
> > your top-level requester ID and the point in the topology where
> > it's based).
> >
> IMO, getting rid of pci_find_upstream_pcie_bridge() gets non-PCI code
> out of the biz of knowing too much about PCI topology and the
> idiosyncrasies around req-ids, and the proposed interface cleans up
> the IOMMU (PCI) support. Having the IOMMU API get the pdev returned
> with a req-id, and then passing it back to the core (for
> release/free), seems like the proper get/put logic btwn the PCI core
> and another kernel subsystem.
>
> >>> If we look at an endpoint device like A, only A has A's requester
> >>> ID. Therefore, why would for_each_requester_id(A) traverse up to
> >>> Y?
> >>
> >> Even if A is a PCIe device, we have to traverse upwards to find
> >> any bridges that might drop A's requester ID or take ownership,
> >> e.g., if we have this:
> >>
> >>     00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
> >>     01:00.0 PCI-to-PCIe bridge to [bus 02]
> >>     02:00.0 PCIe endpoint A
> >>
> >> the IOMMU has to look for requester-ID 0100.
> >
> > And I believe this patch handles this case correctly; I mentioned
> > this exact example in the v2 RFC cover letter.
> > This is another example where pci_find_upstream_pcie_bridge() will
> > currently fail.
> >
> +1
>
> >>> Y can take ownership and become the requester for A, but if we
> >>> were to call for_each_requester_id(Y), wouldn't you expect the
> >>> callback to happen on {Y, A, B} since all of those can use that
> >>> requester ID?
> >>
> >> No. If I call for_each_requester_id(Y), I'm expecting the
> >> callback to happen for each requester ID that could be used for
> >> transactions originated by *Y*. I'm trying to make an IOMMU
> >> mapping for use by Y, so any devices downstream of Y, e.g., A and
> >> B, are irrelevant.
> >
> > Ok, you think of for_each_requester_id() the same as I think of
> > for_each_requester(). Can we split the difference and call it
> > pci_dev_for_each_requester_id()?
> >
> I can only ditto my sentiment above.
> Can I toss in 'pci_get_requester_id()'?!? ... ;-)

I assume pci_get_requester_id() would return just the requester ID for
a given device. The problem with that is that it doesn't have much
meaning. What's the requester ID of a bridge? It depends on whether
the bridge is mastering the transaction
(bridge->bus->number << 8 | bridge->devfn) or whether it's taking
ownership of a transaction for a subordinate device
(bridge->secondary->number << 8 | 0). So we really want the for-each
interface, because then we have a set of devices and a vector through
them.

> >> I think a lot of the confusion here is because we're trying to
> >> solve two questions at once: (1) what requester IDs need to be
> >> mapped to handle DMA from a device, and (2) what devices can't be
> >> isolated from each other and must be in the same domain. I think
> >> things will be simpler if we can deal with those separately. Even
> >> if it ends up being more code, at least each piece will be easier
> >> to understand by itself.
> >
> > Don't we already have this split in the code?
> >
> > (1) pcie_for_each_requester
> >     (or pci_dev_for_each_requester_id)
> >
> > (2) pci_get_visible_pcie_requester
> >     (or pci_get_visible_pcie_requester_id)
> >
> again, unless you understand all of the mapped/coerced/modified
> req-ids of PCIe, PCIe-to-PCI(X) bridges, and the potential (RC,
> other) quirks, these interface names are confusing. ....
> ... that could be me, and I need to go back to look at patches and
> the description of the functions, to see if they help to clarify
> their uses.

Perhaps it's time for a v3, but I'm not sure what all we've learned
from v2 yet or what direction a v3 should take. We've learned that
root complex PCIe-to-PCI bridges use implementation-specific requester
IDs, and we know that Intel uses the (bus | devfn) requester ID
instead of the standard (secondary | 0). This should probably be
handled as a quirk. We've renamed pcie_for_each_requester() to
pci_dev_for_each_requester_id(). I don't know if we've renamed
pci_get_visible_pcie_requester() yet; I was just trying to be
consistent with naming by adding _id to the end. Thanks,

Alex

> > Note however that (2) does not impose anything about domains or
> > isolation; it is strictly based on PCI topology. It's left to the
> > caller to determine how that translates to IOMMU domains, but the
> > typical case is trivial. Thanks,
> >
> +1, again.
>
> > Alex