Re: [PATCH v4 3/6] PCI: Add support for multiple DMA aliases

On Thu, Feb 25, 2016 at 03:41:51PM +0000, Lawrynowicz, Jacek wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx]
> > Sent: Thursday, February 25, 2016 3:39 PM
> > To: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > Cc: Lawrynowicz, Jacek <jacek.lawrynowicz@xxxxxxxxx>; linux-
> > pci@xxxxxxxxxxxxxxx; Alex Williamson <alex.williamson@xxxxxxxxxx>; Joerg
> > Roedel <jroedel@xxxxxxx>; David Woodhouse <dwmw2@xxxxxxxxxxxxx>;
> > iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v4 3/6] PCI: Add support for multiple DMA aliases
> > 
> > On Wed, Feb 24, 2016 at 01:44:06PM -0600, Bjorn Helgaas wrote:
> > > From: Jacek Lawrynowicz <jacek.lawrynowicz@xxxxxxxxx>
> > >
> > > <Insert changelog here>
> > 
> > (Sorry, I should have copied this changelog in the patch; I copied
> > this manually from your v3 posting):
> > 
> > > This patch solves IOMMU support issues with PCIe non-transparent bridges
> > > that use Requester ID look-up tables (LUT), e.g. PEX8733. Before exiting
> > > the bridge, packet's RID is rewritten according to LUT programmed by
> > > a driver. Modified packets are then passed to a destination bus and
> > > processed upstream. The problem is that such packets seem to come from
> > > non-existent nodes that are hidden behind NTB and are not discoverable
> > > by a destination node, so IOMMU discards them. Adding DMA alias for a
> > > given LUT entry allows IOMMU to create a proper mapping that enables
> > > inter-node communication.
> > 
> > A specific example here would help me understand.  Here's how I
> > understand this (correct me if I'm wrong): We're talking about a DMA
> > packet being forwarded upstream from an NTB.  The NTB uses the LUT to
> > rewrite the RID in the DMA packet.  The new RID from the LUT is
> > unknown to the IOMMU, so it discards the DMA packet.
> 
> Yes, this is exactly the problem.
> 
> > > The current DMA alias implementation supports only single alias, so it's
> > > not possible to connect more than two nodes when IOMMU is enabled. This
> > > implementation enables all possible aliases on a given bus (256) that
> > > are stored in a bitset. Alias devfn is directly translated to a bit
> > > number. The bitset is not allocated for devices that have no need for
> > > DMA aliases.
> > 
> > I think "two nodes" is referring to two PCIe devices on the other side
> > of the NTB.  You want DMA packets from those devices to have different
> > RIDs so the IOMMU can distinguish them.
> 
> Right.
> 
> > The LUT entries basically create aliases of the NTB (one alias for
> > each device beyond the NTB).  Your quirk uses pci_add_dma_alias(), and
> > the aliases are all on the same bus as the NTB itself.
> > 
> > The quirk adds PCI_DEVFN(0x10, 0x0), PCI_DEVFN(0x11, 0x0), and
> > PCI_DEVFN(0x12, 0x0).  Shouldn't there be some connection between this
> > and the LUT programming?  I assume the LUT is programmed to correspond
> > to those aliases.  Does this mean you're limited to three devices
> > beyond the NTB?
> 
> Yes, there is an indirect connection between LUT table and devfns used in the
> quirk.
> Dev part is an offset in the LUT table and function is taken from the device
> behind the NTB.
> So the driver can only change the dev part by using different LUT offsets.
> We don't plan to modify this quirk. The number of PCIe devices beyond single
> x200 card NTB will not change.
> Two are used by x200 CPU (host bridge & root port) and one is used by x200 DMA
> engine.
> I'm not sure introducing some dependencies to make sure the offsets are set
> correctly is really worth it.

I'd like at least a comment that points to the specific x200 code that
must coordinate with this.
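
To make the coupling concrete, here is a minimal userspace sketch of the
devfn-to-bit mapping from the changelog and of the LUT rule described above
(device number = LUT offset, function number taken from the device behind the
NTB). It is an illustration under stated assumptions, not the kernel code:
every demo_* / DEMO_* name is hypothetical, and the macros are local copies
meant to mirror the kernel's PCI_DEVFN() and PCI_FUNC().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the kernel's devfn helpers: 5-bit device, 3-bit function. */
#define DEMO_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEMO_PCI_FUNC(devfn)       ((devfn) & 0x07)

#define DEMO_MAX_DEVFN     256 /* every devfn value on one bus */
#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS \
	((DEMO_MAX_DEVFN + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

/* Hypothetical stand-in for the per-device alias bitset. */
struct demo_alias_mask {
	unsigned long bits[DEMO_WORDS];
};

/* Mark devfn as a DMA alias; the devfn is used directly as the bit number. */
void demo_add_dma_alias(struct demo_alias_mask *mask, uint8_t devfn)
{
	mask->bits[devfn / DEMO_BITS_PER_WORD] |=
		1UL << (devfn % DEMO_BITS_PER_WORD);
}

/* A NULL mask models a device that never allocated a bitset. */
bool demo_dma_alias_is_enabled(const struct demo_alias_mask *mask,
			       uint8_t devfn)
{
	if (!mask)
		return false;
	return (mask->bits[devfn / DEMO_BITS_PER_WORD] >>
		(devfn % DEMO_BITS_PER_WORD)) & 1UL;
}

/*
 * Assumed LUT rule, as described in this thread: the device number is the
 * LUT offset and the function number comes from the source device.
 */
uint8_t demo_lut_alias(unsigned int lut_offset, uint8_t src_devfn)
{
	return (uint8_t)DEMO_PCI_DEVFN(lut_offset, DEMO_PCI_FUNC(src_devfn));
}

/* Adds the same three aliases the quirk under review adds. */
void demo_quirk_add_aliases(struct demo_alias_mask *mask)
{
	memset(mask, 0, sizeof(*mask));
	demo_add_dma_alias(mask, DEMO_PCI_DEVFN(0x10, 0x0));
	demo_add_dma_alias(mask, DEMO_PCI_DEVFN(0x11, 0x0));
	demo_add_dma_alias(mask, DEMO_PCI_DEVFN(0x12, 0x0));
}
```

If the driver programmed a LUT offset other than 0x10-0x12, the rewritten RID
would map to a bit the quirk never set, which is why a comment tying the two
together seems worthwhile.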

> So regarding the improvements in the patch description, you want me to update
> and repost it?

Yes, please.

> BTW I posted x200 DMA driver (the client for this change) on DMA list:
> https://lkml.org/lkml/2016/2/9/287
> I'm working on integrating review comments and hope to get it included in 4.6.

What about my questions on the code itself, below?

> > > ---
> > >  drivers/iommu/iommu.c |   17 ++++++++++-------
> > >  drivers/pci/pci.c     |   11 +++++++++--
> > >  drivers/pci/probe.c   |    1 +
> > >  drivers/pci/search.c  |   14 +++++++++-----
> > >  include/linux/pci.h   |    4 +---
> > >  5 files changed, 30 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 0e3b009..a214e19 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -659,9 +659,15 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
> > >  	return NULL;
> > >  }
> > >
> > > +static bool dma_alias_is_enabled(struct pci_dev *dev, u8 devfn)
> > > +{
> > > +	return dev->dma_alias_mask &&
> > > +	       test_bit(devfn, dev->dma_alias_mask);
> > > +}
> > > +
> > >  /*
> > > - * Look for aliases to or from the given device for exisiting groups.  The
> > > - * dma_alias_devfn only supports aliases on the same bus, therefore the search
> > > + * Look for aliases to or from the given device for existing groups. DMA
> > > + * aliases are only supported on the same bus, therefore the search
> > 
> > I'm trying to reconcile this statement that "DMA aliases are only
> > supported on the same bus" (which was there even before this patch)
> > with the fact that pci_for_each_dma_alias() does not have that
> > limitation.
> > 
> > >   * space is quite small (especially since we're really only looking at pcie
> > >   * device, and therefore only expect multiple slots on the root complex or
> > >   * downstream switch ports).  It's conceivable though that a pair of
> > > @@ -686,11 +692,8 @@ static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
> > >  			continue;
> > >
> > >  		/* We alias them or they alias us */
> > > -		if (((pdev->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> > > -		     pdev->dma_alias_devfn == tmp->devfn) ||
> > > -		    ((tmp->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> > > -		     tmp->dma_alias_devfn == pdev->devfn)) {
> > > -
> > > +		if (dma_alias_is_enabled(pdev, tmp->devfn) ||
> > > +		    dma_alias_is_enabled(tmp, pdev->devfn)) {
> > >  			group = get_pci_alias_group(tmp, devfns);
> > 
> > We basically have this:
> > 
> >   for_each_pci_dev(tmp) {
> >     if (<pdev and tmp are DMA aliases>)
> >       group = get_pci_alias_group();
> >       ...
> >   }
> > 
> > The DMA alias stuff relies on PCI internals, so it doesn't
> > seem quite right to use things like PCI_DEV_FLAGS_DMA_ALIAS_DEVFN and
> > dma_alias_devfn here in the IOMMU code.
> > 
> > I'm trying to figure out why we don't do something like the following
> > instead:
> > 
> >   callback(struct pci_dev *pdev, u16 alias, void *opaque)
> >   {
> >     struct iommu_group *group;
> > 
> >     group = get_pci_alias_group();
> >     if (group)
> >       return group;
> > 
> >     return 0;
> >   }
> > 
> >   pci_for_each_dma_alias(pdev, callback, ...);
> > 
> > Is the existing code some sort of optimization, e.g., checking
> > PCI_DEV_FLAGS_DMA_ALIAS_DEVFN is cheaper than using
> > pci_for_each_dma_alias()?
> > 
> > It seems like this won't work for some very unlikely but theoretically
> > possible topologies, e.g.,
> > 
> >   PCIe Root Complex/IOMMU
> >     PCIe switch A
> >       PCIe to conventional PCI bridge
> >         PCI to PCIe Root Complex
> >           PCIe NTB
> > 
> > Here, I think the IOMMU will only see RIDs from PCIe switch A, but the
> > current code only looks at DMA aliases that are on the same bus as the
> > PCIe NTB.  Wouldn't using pci_for_each_dma_alias() handle this
> > correctly?
> > 
> > >  			if (group) {
> > >  				pci_dev_put(tmp);

