On Fri, Apr 14, 2017 at 4:06 PM, Jayachandran C <jnair@xxxxxxxxxxxxxxxxxx> wrote: > On Thu, Apr 13, 2017 at 07:19:11PM -0500, Bjorn Helgaas wrote: >> I tentatively applied both patches to pci/host-thunder for v4.12. >> >> However, I am concerned about the topology here: >> >> On Thu, Apr 13, 2017 at 08:30:45PM +0000, Jayachandran C wrote: >> > On Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the >> > PCI topology is slightly unusual. For a multi-node system, it looks >> > like: >> > >> > 00:00.0 [PCI] bridge to [bus 01-1e] >> > 01:0a.0 [PCI-PCIe bridge, type 8] bridge to [bus 02-04] >> > 02:00.0 [PCIe root port, type 4] bridge to [bus 03-04] (XLATE_ROOT) >> > 03:00.0 PCIe Endpoint >> >> A root port normally has a single PCIe link leading downstream. >> According to this, 02:00.0 is a root port that has the usual >> downstream link leading to 03:00.0, but it also has an upstream link >> to 01:0a.0. > > The PCI topology is a bit broken due to the way that the PCIe IP block > was integrated into SoC PCI bridges and devices. The current mechanism > of adding a PCI-PCIe bridge to glue these together is not ideal. Yeah, that's definitely broken. >> Maybe this example is omitting details that are not relevant to DMA >> aliases? The PCIe capability only contains one set of link-related >> registers, so I don't know how we could manage a single device that >> has two links. > > The root port is standard and has just one link to the EP (or whatever > is on the external PCIe slot). The fallout of the current hw design is > that the PCI-PCIe bridge has a link that does not follow standard and > does not have a counterpart (as you noted). > >> A device with two links would break things like ASPM. In >> set_pcie_port_type(), for example, we have this comment: >> >> * A Root Port or a PCI-to-PCIe bridge is always the upstream end >> * of a Link. No PCIe component has two Links. Two Links are >> * connected by a Switch that has a Port on each Link and internal >> * logic to connect the two Ports. >> >> The topology above breaks these assumptions, which will make >> pdev->has_secondary_link incorrect, which means ASPM won't work >> correctly. > > Given the current hardware, the pcieport driver seems to work reasonably > for the root port at 02:00.0, with AER support. I will take a look at the > ASPM part. I don't think pcieport itself cares much about links. ASPM does, but it looks like set_pcie_port_type() actually is smart enough to know that PCI-to-PCIe bridges and Root Ports always have links on their secondary sides. So has_secondary_link probably does get set correctly. But I think you'll hit the VIA "strange chipset" thing in pcie_aspm_init_link_state(), which will probably prevent us from doing ASPM on the link from 02:00.0 to 03:00.0. Could you collect "lspci -vv" output from this system? I'd like to archive that as background for this IOMMU issue and the ASPM tweaks I suspect we'll have to do. I *wish* we had more information about that VIA thing, because I suspect we could get rid of it if we had more details. Bjorn