On Wed, Aug 15, 2012 at 5:30 AM, Thierry Reding
<thierry.reding@xxxxxxxxxxxxxxxxx> wrote:
> On Wed, Aug 15, 2012 at 05:18:04AM -0700, Bjorn Helgaas wrote:
>> On Tue, Aug 14, 2012 at 11:37 PM, Thierry Reding
>> <thierry.reding@xxxxxxxxxxxxxxxxx> wrote:
>> > On Tue, Aug 14, 2012 at 04:50:26PM -0700, Bjorn Helgaas wrote:
>> >> On Tue, Aug 14, 2012 at 1:12 PM, Thierry Reding
>> >> <thierry.reding@xxxxxxxxxxxxxxxxx> wrote:
>> >> > On Thu, Jul 26, 2012 at 09:55:12PM +0200, Thierry Reding wrote:
>> >> >> diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi
>> >> >> index a094c97..c886dff 100644
>> >> >> --- a/arch/arm/boot/dts/tegra20.dtsi
>> >> >> +++ b/arch/arm/boot/dts/tegra20.dtsi
>> >> >> @@ -199,6 +199,68 @@
>> >> >>                 #size-cells = <0>;
>> >> >>         };
>> >> >>
>> >> >> +       pcie-controller {
>> >> >> +               compatible = "nvidia,tegra20-pcie";
>> >> >> +               reg = <0x80003000 0x00000800   /* PADS registers */
>> >> >> +                      0x80003800 0x00000200   /* AFI registers */
>> >> >> +                      0x81000000 0x01000000   /* configuration space */
>> >> >> +                      0x90000000 0x10000000>; /* extended configuration space */
>> >> >> +               interrupts = <0 98 0x04   /* controller interrupt */
>> >> >> +                             0 99 0x04>; /* MSI interrupt */
>> >> >> +               status = "disabled";
>> >> >> +
>> >> >> +               ranges = <0 0 0  0x80000000 0x00001000   /* root port 0 */
>> >> >> +                         0 1 0  0x81000000 0x00800000   /* port 0 config space */
>> >> >> +                         0 2 0  0x90000000 0x08000000   /* port 0 ext config space */
>> >> >> +                         0 3 0  0x82000000 0x00010000   /* port 0 downstream I/O */
>> >> >> +                         0 4 0  0xa0000000 0x08000000   /* port 0 non-prefetchable memory */
>> >> >> +                         0 5 0  0xb0000000 0x08000000   /* port 0 prefetchable memory */
>> >> >> +
>> >> >> +                         1 0 0  0x80001000 0x00001000   /* root port 1 */
>> >> >> +                         1 1 0  0x81800000 0x00800000   /* port 1 config space */
>> >> >> +                         1 2 0  0x98000000 0x08000000   /* port 1 ext config space */
>> >> >> +                         1 3 0  0x82010000 0x00010000   /* port 1 downstream I/O */
>> >> >> +                         1 4 0  0xa8000000 0x08000000   /* port 1 non-prefetchable memory */
>> >> >> +                         1 5 0  0xb8000000 0x08000000>; /* port 1 prefetchable memory */
>> >> >
>> >> > I've been thinking about this some more. The translations for both the
>> >> > regular and extended configuration spaces are configured in the
>> >> > top-level PCIe controller. It is therefore wrong how they are passed to
>> >> > the PCI host bridges via the ranges property.
>> >> >
>> >> > I remember Mitch saying that it should be passed down to the children
>> >> > because it is partitioned among them, but since the layout is compatible
>> >> > with ECAM, the partitioning isn't as simple as what's in the tree. In
>> >> > fact the partitions will be dependent on the number of devices attached
>> >> > to the host bridges.
>> >>
>> >> I don't understand this last bit about the number of devices attached
>> >> to the host bridges.  Logically, the host bridge has a bus number
>> >> aperture that you can know up front, even before you know anything
>> >> about what devices are below it.  On x86, for example, the ACPI _CRS
>> >> method has something like "[bus 00-7f]" in it, which means that any
>> >> buses in that range are below this bridge.  That doesn't tell us
>> >> anything about which buses actually have devices on them, of course;
>> >> it's just analogous to the secondary and subordinate bus number
>> >> registers in a P2P bridge.
>> >
>> > That's one of the issues I still need to take care of. Currently no bus
>> > resource is attached to the individual bridges (nor the PCI controller
>> > for that matter), so the PCI core will assign them dynamically.
>>
>> So your PCI controller driver knows how to program the controller bus
>> number aperture?  Sometimes people start by assuming that two host
>> bridges both have [bus 00-ff] apertures, then they enumerate below the
>> first and adjust the bus number apertures based on what they found.
>> For example, if they found buses 00-12 behind the first bridge, they
>> make the apertures [bus 00-12] for the first bridge and [bus 13-ff]
>> for the second.  That might be the case, depending on what firmware
>> set up, but it seems like a dubious way to do it, and of course it
>> precludes a lot of hot-plug scenarios.
>
> No, that's not what I meant. What happens is that no pre-assigned bus
> range is specified for either of the host bridges, so that the range
> 0x00-0xff will be assigned by default in pci_scan_root_bus().

My concern is about making the kernel's idea of the host bridge bus
number aperture match what the hardware is doing.  I'm pretty sure
that the default [bus 00-ff] range assigned by pci_scan_root_bus()
doesn't actually match the hardware in most cases, at least when we
have multiple host bridges in the same PCI domain.

For example, if you don't supply a bus number range,
pci_scan_root_bus() will assume [bus 00-ff] for both host bridges.
But if you could put an analyzer on each of the root buses and then
read bus 0 config space, will you see that config transaction on
*both* buses?  I doubt it.

You have to know at least the bus number of the root bus up front
before you can even start enumerating it.  The only way to learn that
is by reading registers in the host bridge or by some external
mechanism like ACPI or device tree.  That's the beginning of the bus
number aperture.

The end of the aperture is similar: we can't reliably determine it by
enumerating devices below the host bridge, so we have to know it up
front.  You can enumerate starting with the root bus number and
assigning new subordinate bus numbers as necessary, but unless you
know the host bridge aperture to begin with, you could inadvertently
assign a new bus number that actually belongs to a different host
bridge.
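
To make the overlap concern concrete, here is a rough user-space sketch in Python. It is not kernel code and does not model the Tegra20 hardware; BusAperture, assign_bus_numbers() and the 00-7f/80-ff split are all hypothetical names and values chosen only for illustration. It shows that two bridges which both default to [bus 00-ff] claim the same bus numbers, and that enumeration is only safe when it stays inside an aperture known up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusAperture:
    """Inclusive range of PCI bus numbers decoded by one host bridge (hypothetical model)."""
    start: int
    end: int

    def __post_init__(self):
        if not (0 <= self.start <= self.end <= 0xFF):
            raise ValueError("need 0 <= start <= end <= 0xff")

    def overlaps(self, other):
        """True if any bus number falls inside both apertures."""
        return self.start <= other.end and other.start <= self.end

    def __str__(self):
        return f"[bus {self.start:02x}-{self.end:02x}]"


def assign_bus_numbers(aperture, bridges_found):
    """Number the root bus plus one bus per P2P bridge found below it.

    Refuses to hand out a bus number outside the aperture, since that
    number could belong to a different host bridge.
    """
    needed = 1 + bridges_found
    available = aperture.end - aperture.start + 1
    if needed > available:
        raise RuntimeError(f"{needed} buses do not fit in {aperture}")
    return list(range(aperture.start, aperture.start + needed))


# What you effectively get when no bus range is supplied for either bridge.
default = BusAperture(0x00, 0xFF)

# Assumed, illustrative split between two host bridges in one domain.
bridge0 = BusAperture(0x00, 0x7F)
bridge1 = BusAperture(0x80, 0xFF)

if __name__ == "__main__":
    print("both default:", default.overlaps(default))
    print(bridge0, "vs", bridge1, "overlap:", bridge0.overlaps(bridge1))
    print("bridge1 buses:", [f"{b:02x}" for b in assign_bus_numbers(bridge1, 2)])
```

With disjoint apertures, the second bridge starts numbering at its own root bus (0x80 in this made-up split) and can never be handed a bus number that belongs to the first.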

Bjorn