On Thu, Jan 31, 2013 at 05:34:36PM -0700, Stephen Warren wrote:
> On 01/28/2013 11:56 AM, Thomas Petazzoni wrote:
> > This driver implements the support for the PCIe interfaces on the
> > Marvell Armada 370/XP ARM SoCs. In the future, it might be extended to
> > cover earlier families of Marvell SoCs, such as Dove, Orion and
> > Kirkwood.
>
> Bjorn and I happen to live very close, so we got together today and
> talked about PCIe on ARM.
>
> One of the questions he asked is: why does the window management on the
> Marvell SoCs need to be dynamic?
>
> (Sorry if this was covered earlier; I vaguely recall some discussion on
> the topic, but couldn't find it quickly)

Well, I've answered it several times, and so has Thomas. Let's try again; please save this for future reference. :)

Let's separate two things. The CPU physical address ranges reserved for PCI bus access are not dynamic. They are set statically, in DT or elsewhere, just like every other PCI case on ARM, and just like Tegra. That is not the issue.

What is required is that the division of this space amongst the 10 physical PCIe links be dynamic. Just like x86. Just like the new Tegra driver. [1]

Is that clear?

> As background, PCIe enumeration in Linux usually works like:
>
> 1) You start off with some CPU physical address regions that generate
> transactions on the PCIe bus.
>
> 2) You enumerate all the PCIe devices, and assign an address to each BAR
> found, carved out of the PCIe address range corresponding to the regions
> you knew from (1).

Step 2 also includes "assign address windows to all the physical PCIe links". This is very important, because it is what this entire discussion is about. Look at how Tegra or x86 works: the CPU physical addresses for PCIe do nothing until the PCI-to-PCI bridge window registers in each link's configuration space are set up. Until that is done, the SoC doesn't know which link to send the transaction to.
Marvell is the same: until a link's window registers are set up, the CPU addresses don't go anywhere. Notice this has absolutely no effect on the host bridge aperture. This is a link-by-link configuration of which addresses *go down that link*.

The big difference is that the link window registers on Marvell do not conform to the PCI configuration space specification; they are Marvell-specific. This is what the glue code in the host driver does: it converts the Marvell-specificness into something the kernel can understand and control. There are countless ways to do this, but please accept that it is necessary...

> Now, I recall that a related issue was that you are tight on CPU
> physical address space, and the second algorithm above would allow the
> size of the PCIe controller's window configuration to be as small as
> possible, and hence there would be more CPU physical address space
> available to fit in other peripherals.

Physical address space is certainly a concern, but availability of decoder windows is the major one. Each link requires one decoder window for MMIO and one for IO, and possibly one for prefetch. The chip doesn't have 30 decoder windows. So the allocation of decoders to links must be dynamic, based on the requirements of the downstream endpoints on the link.

> However, why does this need to be dynamic? On a particular board, you
> know all the other (non-PCIe) peripherals that you need to fit into the
> CPU physical address space, so you know how much is left over for PCIe,
> so why not always make the PCIe window fill up all the available
> space,

Because there is no such thing as an all-links PCIe window on this hardware. Each link has a separate window. If you get rid of all the dynamic allocation, then every link must statically reserve some portion of physical address space and some number of decoder windows. That more or less means you need to know what is going to be on the other side of every link when you write the DT.
> With a static window configuration in DT, you'd end up with a system
> that worked much like any x86 system or Tegra, with some static memory
> range available to PCIe. It's just that in your case, the region
> location/size could change from boot to boot based on DT, whereas it's
> hard-coded in HW for Tegra and I assume x86 too.

How is this better? Now you have a system where you have to customize the DT before you can connect a PCIe device. What if someone uses this chip in a configuration with physical slots? How does that work? What about hotplug? What about a unified kernel? That is *not* like x86 or Tegra.

IMHO, Thomas's direction in his proposed driver ends up working very close to the new Tegra driver, and has the sort of dynamic allocation and discovery people expect from PCIe.

Jason

1 - The new Tegra driver switches from calling ARM's pci_common_init once
for every physical link to once for the SoC. It does this by fixing the
routing of config transactions so that the kernel sees the per-link
PCI-PCI root port bridge config space provided by the hardware at the
correct place. By doing this, it changes from statically allocating a
physical memory region for each link to statically allocating a region
for all the links and dynamically dividing that region amongst the links.