On Wed, Feb 17, 2016 at 8:58 PM, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > On Wed, 17 Feb 2016 17:15:09 +0530 > Jayachandran Chandrashekaran Nair > <jayachandran.chandrashekaran@xxxxxxxxxxxx> wrote: > >> On Wed, Feb 17, 2016 at 3:55 AM, Alex Williamson >> <alex.williamson@xxxxxxxxxx> wrote: >> > On Wed, 17 Feb 2016 02:38:24 +0530 >> > Jayachandran Chandrashekaran Nair >> > <jayachandran.chandrashekaran@xxxxxxxxxxxx> wrote: >> > >> >> On Tue, Feb 16, 2016 at 1:09 AM, Alex Williamson >> >> <alex.williamson@xxxxxxxxxx> wrote: >> >> > On Mon, 15 Feb 2016 12:20:23 -0600 >> >> > Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: >> >> > >> >> >> [+cc Alex, iommu list] >> >> >> >> >> >> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote: >> >> >> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges >> >> >> > that should not be considered during DMA alias search. This is >> >> >> > to support hardware (in this case Broadcom Vulcan PCIe subsystem) >> >> >> > that has internal bridges which have either missing or wrong PCIe >> >> >> > capabilities. >> >> > >> >> > I figured this would come at some point, the right answer is of course >> >> > to follow the PCIe spec and implement the required PCIe capability in >> >> > the hardware. >> >> >> >> There are PCIe controllers on the chip that follows the spec, the issue is >> >> how it is integrated to the northbridge (equivalent) on the chip, I have >> >> tried to explain it below. >> >> >> >> >> This needs more explanation, like what exactly is wrong with this >> >> >> device? A missing PCIe capability might cause other problems. >> >> >> >> >> >> What problem does this fix? Without these patches, do we merely add >> >> >> aliases that are unnecessary? Do we crash because something goes >> >> >> wrong in the pci_pcie_type() switch because of the incorrect >> >> >> capability? >> >> >> >> Here's how (for example) how the PCI enumeration of a 2 node Vulcan >> >> processor will look like: >> >> >> >> >> >> [0] +-0.0.0--[1]---+--1.a.0----[2]-----2.0.0---[3]----3.0.0 >> >> | +--1.a.1----[4]-----4.0.0---[5]----5.0.0 >> >> | . >> >> | ... etc... >> >> | >> >> +-0.0.1--[10]--+-10.a.0----[11]---11.0.0---[12]---12.0.0 >> >> +-10.a.1----[13]---13.0.0---[14]---14.0.0 >> >> . >> >> ... etc... >> > >> > So we have: >> > >> > "Glue Bridge"---"Glue Bridges"---Root Ports---Endpoints >> > (no pcie) (pci/x-pcie) >> >> The top level is one glue bridge per chip in a multi-chip board, >> but otherwise this is accurate. >> >> >> >> >> The devices 0.0.x and x.a.x are glue bridges that are there to >> >> bring the real PCIe controllers (pcie cap type 4) 2.0.0, 4.0.0, >> >> 11.0.0, 13.0.0 under a single PCI hierarchy. The x.a.x bridges >> >> have a PCIe capability (type 8) and 0.0.x does not have any pcie >> >> capability. >> >> >> >> In case of Vulcan, both the GICv3 ITS driver code and the SMMUv3 >> >> driver code does dma alias walk to find the device id to use >> >> in ITS and SMMU. In both cases it will ignore the x.0.0 bridges >> >> since they are type root port, but will continue on up and end >> >> up with incorrect device id. >> >> >> >> The flag I have added is to make the pci_for_each_dma_alias() >> >> ignore the last 2 levels of glue/internal bridges. >> > >> > Per the PCIe spec, I'm not even sure what you have is a valid >> > hierarchy, root ports sit on root complexes, not behind a PCI-to-PCIe >> > bridge. So really you're pretending that the downstream "glue bridge" >> > is your host bridge and therefore root complex, but the PCI topology >> > would clearly dictate that the top-most bus is conventional PCI and >> > therefore everything is an alias of everything else and the DMA alias >> > code is doing exactly what it should. >> >> Yes, I am not arguing that there is any issue in the current code. I >> am trying to figure out the correct way to implement the quirk. We >> have to support the hardware we have, not the hardware we want to >> have :) >> >> > Also note that the caller of pci_for_each_dma_alias() owns the callback >> > function triggered for each alias, that function could choose to prune >> > out these "glue bridges" itself if that's the appropriate thing to do. >> >> I had implemented this first, but moved to the current approach because >> it is needed in multiple places. The problem is: "On vulcan, while >> going up the pcie bus heirarchy for finding aliases, skip the glue bridges", >> So the logical place to put the fix is in pci_for_each_dma_alias() >> >> > Where do the SMMU and ITS actually reside in the above diagram? You >> > can use the callout function to stop the traversal anywhere you wish, >> > it's free to return a -errno to stop or positive number, which the >> > caller can interpret as error or non-failure stop condition. >> >> The SMMU (for translation requests) and ITS (for MSI writes for >> interrupts) are connected directly to the proper PCIe controller >> (2/4/11/13.0.0 above) > > > If the translation unit is on the root port then DMA aliases upstream of > the translation unit are irrelevant and the callback function should > stop the traversal at that point. > >> > You could even think about changing what Linux considers to be the host >> > bridge. Maybe these glue bridges shouldn't even be visible to Linux, >> > would that also solve the issue with the broken BAR? >> >> The glue bridges have to seen by Linux for assigning resources like >> bus ranges and memory windows. So these are required bridges in >> the topology and will work with the standard linux code >> (provided we skip them for aliases, and and ignore the BAR). >> >> >> > The change takes the same code path as it would for a real PCIe bridge >> >> > port (downstream/upstream/root), which means they want to skip adding >> >> > this bridge as an alias of the device. So we're adding in aliases that >> >> > don't exist, the bridge itself. >> >> > >> >> > If anything I'd suggest a flag that actually tries to address the >> >> > problem rather than a symptom of the problem. For example, maybe the >> >> > flag should be PCI_DEV_FLAGS_IS_PCIE. Maybe pci_is_pcie() should even >> >> > take that into account. That has some trickle through for >> >> > pci_pcie_type() and all the accessor functions, but maybe it's a >> >> > cleaner solution overall (or maybe it explodes further). Thanks, >> >> >> >> I didn't really want to mark the glue bridges as PCIe or have fake >> >> PCIe capability there, the obvious simple solution was to add >> >> the flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS >> >> >> >> Any suggestions/comments on how to do this better would be welcome. >> > >> > I definitely don't think either flag idea is the right solution, I >> > think you've actually already got the tools you need to solve it by >> > putting the intelligence in the callback function or by going further >> > down the path of pretending the glue bridge is really a host bridge. >> >> If I understand your suggestion, I need to fake the PCIe capability >> for the glue bridges, which may be reasonable for the second level. >> But for the first level, there is no pcie capability and faking that >> becomes more difficult. > > That's not what I'm suggesting. > >> If the concern is adding a flag for just one platform, I can check >> for the vendor ID/device ID in the pci_for_each_dma_alias() >> function, which would be slightly uglier, but still a simple and >> maintainable solution. >> >> Another option is to break the pci_for_each_dma_alias when it >> reaches to root port (for vulcan), this too can be done without >> a flag. > > Exactly, this can be done by the callback function provided on vulcan > systems and requires no core changes. The IOMMU needs to provide a > callback function that makes sense for collecting aliases on the > specific platform. If the translation unit is not on the root bus, the > callback function should stop the topology walk earlier. Thanks, Fixing every callback function is going to be painful and ugly. I will have to change drivers/irqchip/irq-gic-v3-its-pci-msi.c, drivers/pci/msi.c, and drivers/iommu/arm-smmu-v3.c with the same check for vendor/device ids and return early. I can update the patch to check the vendor/device id in the case of ROOT_PORT in pci_for_each_dma_alias(). If you think that will not be acceptable, I don't have any reasonable options left. Thanks, JC. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html