Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

On 07/29/2013 12:06 PM, Alex Williamson wrote:
On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote:
On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson
<alex.williamson@xxxxxxxxxx>  wrote:
On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
   As the DMA
transaction propagates through the fabric, it may be tagged by bridges
with different requester IDs.

The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
not sure the intermediate pci_devs are.

A u16 requester ID doesn't mean much on its own though; it's not
necessarily even unique.  A requester ID associated with the context of
a pci_dev is unique and gives us a reference point if we need to perform
another operation on that requester ID.

A u16 requester ID better mean something to an IOMMU -- it's all the
IOMMU can use to look up the correct mapping.  That's why we have to
give the iterator something to define the scope to iterate over.  The
same requester ID could mean something totally unrelated in a
different scope, e.g., below a different IOMMU.

The point I'm trying to make is that a requester ID depends on its
context (minimally, the PCI segment).  The caller can assume the context
based on the calling parameters or we can provide context in the form of
an associated pci_dev.  I chose the latter path because I prefer
explicit interfaces and it has some usefulness in the intel-iommu
implementation.
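
To make that concrete (a hypothetical sketch, not this patch's
interface): the same u16 can name different devices on different
segments, so it only becomes unambiguous with the context a pci_dev
carries:

    /* a bare u16 is ambiguous across segments... */
    u16 rid = 0x0210;                 /* bus 0x02, devfn 0x10 */
    /* ...it's only meaningful together with the segment, which the
     * pci_dev context provides */
    int segment = pci_domain_nr(pdev->bus);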

For instance, get_domain_for_dev() first looks to see if a pci_dev
already has a domain.  If it doesn't, we want to look to see if there's
an upstream device that would use the same requester ID that already has
a domain.  If our get-requester-ID-for-device function returns only the
requester ID, we don't know whether that requester ID belongs to the
device we're searching from or to some upstream device.  Therefore we
potentially do an unnecessary search for the domain.
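
Roughly like this (a sketch with assumed names; the
pci_get_visible_pcie_requester() signature is a guess, and find_domain()
stands in for intel-iommu's domain lookup):

    u16 rid;
    struct pci_dev *requester = pci_get_visible_pcie_requester(pdev, &rid);

    if (requester != pdev) {
            /* the requester ID is owned upstream; if that device
             * already has a domain, reuse it rather than searching */
            domain = find_domain(requester);
    }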

The other user is intel_iommu_add_device() where we're trying to build
IOMMU groups.  Visibility is the first requirement of an IOMMU group.
If the IOMMU cannot distinguish between devices, they must be part of
the same IOMMU group.  Here we want to find the pci_dev that hosts the
requester ID.  I don't even know how we'd implement this with a function
that only returned the requester ID.  Perhaps we'd have to walk upstream
from the device calling the get-requester-ID-for-device function at each
step and noticing when it changed.  That moves significant logic back
into the caller code.
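
With a pci_dev returned alongside the requester ID, it reduces to
something like this (sketch, same assumed names as above):

    /* devices whose requester ID is rooted at the same pci_dev are
     * indistinguishable to the IOMMU and must share a group */
    u16 rid;
    struct pci_dev *root = pci_get_visible_pcie_requester(pdev, &rid);
    struct iommu_group *group = iommu_group_get(&root->dev);

    if (!group)
            group = iommu_group_alloc();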
...

I don't see the point of passing a "device closest to the requester
ID."  What would the IOMMU do with that?  As far as the IOMMU is
concerned, the requester ID could be an arbitrary number completely
unrelated to a pci_dev.

Do you have an example of a case where a requester ID doesn't have some
association to a pci_dev?

I think our confusion here is the same as what Don & I have been
hashing out -- I'm saying a requester ID fabricated by a bridge
need not correspond to a specific pci_dev, and you probably mean
that every requester ID is by definition the result of *some* PCI
device making a DMA request.

Yes

...
Furthermore, if we have:

      -- A
     /
X--Y
     \
      -- B
...

Let me go back to the X-Y-A|B example above to see if I can explain why
pcie_for_each_requester_id() doesn't make sense to me.  Generally a
for_each_foo function should iterate across all things with the same
foo.  So a for_each_requester_id should iterate across all things with
the same requester ID.

Hm, that's not the way I think of for_each_FOO() interfaces.  I
think of it as "execute the body (or callback) for every possible
FOO", where FOO is different each time.  for_each_pci_dev(),
pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc.,
work like that.

Most of these aren't relevant because they iterate over everything.  I
think they only show that if we had a pci_for_each_requester_id(), it
should iterate over every possible PCI requester ID in the system, and
the same could be said for pci_for_each_requester().

Lost me here.... I thought the functions took in a pdev, so it'll only
iterate on the segment that pdev is associated with.

pci_bus_for_each_resource() is the only one we can build from; it
iterates over all the resources on a bus.  We want all requester IDs for
a pci_dev.  Does that perhaps mean it should be called
pci_dev_for_each_requester_id()?  I'd be ok with that name, but I think
it even more strongly implies that a pci_dev is associated with a
requester ID.
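
By analogy (the first loop uses the existing macro; the second is only a
sketch of how the proposed iterator might read, with a guessed shape):

    struct resource *res;
    int i;

    /* existing: every resource of one bus */
    pci_bus_for_each_resource(bus, res, i)
            pr_info("resource %d: %pR\n", i, res);

    /* proposed (hypothetical): every requester ID one pci_dev's DMA
     * may carry by the time it reaches the IOMMU */
    pci_dev_for_each_requester_id(pdev, requester, rid)
            pr_info("%s may use requester ID %04x\n",
                    pci_name(requester), rid);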

and the function-name changes have equally lost me here.
AIUI, IOMMUs want to call a PCI core function with a pdev, asking for
the req-ids that the pdev may generate to the IOMMU when performing DMA.
That doesn't strike me as a for-each-requester-id() but a 'get-requester-id()'
operation.

But the more important question is what arguments we give to the
callback.  My proposal was to map

   {pci_dev ->  {requester-ID-A, requester-ID-B, ...}}

Yours is to map

   {pci_dev ->  {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}}

i.e., your callback gets both a pci_dev and a requester-ID.  I'm
trying to figure out how you handle cases where requester-id-A
doesn't have a corresponding pci_dev-A.  I guess you pass the device
most closely associated with requester-id-A.

Yes
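
To make the two shapes concrete (hypothetical typedefs, not code from
the patch):

    /* (a) the callback sees only the requester ID */
    typedef int (*requester_cb_t)(u16 requester_id, void *data);

    /* (b) the callback also gets the pci_dev most closely associated
     * with that requester ID */
    typedef int (*requester_dev_cb_t)(struct pci_dev *assoc,
                                      u16 requester_id, void *data);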

The "most closely associated device" idea seems confusing to
describe and think about.  I think you want to use it as part of
grouping devices into domains.  But I think we should attack that
problem separately.  For grouping or isolation questions, we have
to pay attention to things like ACS, which are not relevant for
mapping.

We only touch isolation insofar as we provide an interface for a
driver to determine the point in the PCI topology where the requester ID
is rooted.  Yes, grouping can make use of that, but I object to the idea
that we're approaching some slippery slope of rolling multiple concepts
into this patch.  What I'm proposing here is strictly a requester ID
interface.  I believe that providing a way for a caller to determine
that two devices have a requester ID rooted at the same point in the PCI
topology is directly relevant to that interface.

I strongly agree here.  Providing the pdevs associated with each returned
req-id cannot hurt, and in fact I believe it is an improvement, avoiding a
potential set of additional interfaces that might otherwise be needed to do
(segment, req-id) -> pdev mapping/matching/getting.


pci_find_upstream_pcie_bridge() attempts to do this same thing.
Unfortunately, the interface is convoluted, it doesn't handle quirks,
and it requires a great deal of work by the caller to then walk the
topology and create requester IDs at each step.  This also indicates
that at each step, the requester ID is associated with some pci_dev.
Providing a pci_dev is simply showing our work and providing context for
the requester ID (i.e., here's the requester ID and the step along the
path from which it was derived; here's your top-level requester ID and
the point in the topology where it's rooted).
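
Today that work looks roughly like this at each caller (a sketch of the
existing pattern; PCI_DEVID() here is just the bus << 8 | devfn
encoding):

    struct pci_dev *bridge = pci_find_upstream_pcie_bridge(pdev);
    u16 rid = PCI_DEVID(pdev->bus->number, pdev->devfn);

    if (bridge) {
            if (pci_is_pcie(bridge))    /* PCIe-to-PCI(-X) bridge */
                    rid = PCI_DEVID(bridge->subordinate->number, 0);
            else                        /* conventional PCI bridge */
                    rid = PCI_DEVID(bridge->bus->number, bridge->devfn);
    }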

IMO, getting rid of pci_find_upstream_pcie_bridge() gets non-PCI code
out of the business of knowing too much about PCI topology and the
idiosyncrasies around req-ids, and the proposed interface cleans up the
IOMMU (PCI) support.  Having the IOMMU API get the pdev returned with a
req-id, and then passing it back to the core (for release/free), seems
like the proper get/put logic between the PCI core and another kernel
subsystem.
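
A minimal sketch of that get/put flow (assuming the interface takes a
reference on the pdev it returns, and with its signature guessed):

    u16 rid;
    struct pci_dev *requester = pci_get_visible_pcie_requester(pdev, &rid);

    /* ... program the IOMMU for rid ... */

    pci_dev_put(requester);     /* hand the reference back to the core */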

If we look at an endpoint device like A, only A
has A's requester ID.  Therefore, why would for_each_requester_id(A)
traverse up to Y?

Even if A is a PCIe device, we have to traverse upwards to find any
bridges that might drop A's requester ID or take ownership, e.g., if
we have this:

   00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
   01:00.0 PCI-to-PCIe bridge to [bus 02]
   02:00.0 PCIe endpoint A

the IOMMU has to look for requester-ID 0100.
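
That ID is the PCIe-to-PCI bridge taking ownership with its secondary
bus number and devfn 0 -- in code (PCI_DEVID() being the
bus << 8 | devfn encoding):

    u16 rid = PCI_DEVID(0x01, PCI_DEVFN(0, 0));    /* = 0x0100 */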

And I believe this patch handles this case correctly; I mentioned this
exact example in the v2 RFC cover letter.  This is another example where
pci_find_upstream_pcie_bridge() will currently fail.

+1

Y can take ownership and become the requester for A,
but if we were to call for_each_requester_id(Y), wouldn't you expect the
callback to happen on {Y, A, B} since all of those can use that
requester ID?

No.  If I call for_each_requester_id(Y), I'm expecting the callback
to happen for each requester ID that could be used for transactions
originated by *Y*.  I'm trying to make an IOMMU mapping for use by
Y, so any devices downstream of Y, e.g., A and B, are irrelevant.

Ok, you think of for_each_requester_id() the same as I think of
for_each_requester().  Can we split the difference and call it
pci_dev_for_each_requester_id()?

I can only ditto my sentiment above.
Can I toss in 'pci_get_requester_id()'?!? ... ;-)

I think a lot of the confusion here is because we're trying to solve
two questions at once: (1) what requester-IDs need to be mapped
to handle DMA from a device, and (2) what devices can't be isolated
from each other and must be in the same domain.  I think things will
be simpler if we can deal with those separately.  Even if it ends up
being more code, at least each piece will be easier to understand by
itself.

Don't we already have this split in the code?

(1) pcie_for_each_requester
     (or pci_dev_for_each_requester_id)

(2) pci_get_visible_pcie_requester
     (or pci_get_visible_pcie_requester_id)
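
Roughly, with signatures that are my guess at the RFC's shape rather
than verbatim:

    /* (1) mapping: visit every requester ID the device's DMA may
     * arrive with at the IOMMU */
    int pcie_for_each_requester(struct pci_dev *pdev,
                                int (*fn)(struct pci_dev *, u16, void *),
                                void *data);

    /* (2) visibility: the point in the PCI topology where the
     * device's requester ID is rooted */
    struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *pdev,
                                                   u16 *requester_id);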

Again, unless you understand all of the mapped/coerced/modified
requester-IDs of PCIe, PCIe-to-PCI(-X) bridges, and the potential (RC and
other) quirks, these interface names are confusing...
That could be me, though; I need to go back to the patches and the
descriptions of the functions to see if they help clarify their uses.

Note however that (2) does not impose anything about domains or
isolation; it is strictly based on PCI topology.  It's left to the
caller to determine how that translates to IOMMU domains, but the
typical case is trivial.  Thanks,

+1, again.

Alex

