On Thu, Feb 20, 2014 at 12:18:42PM -0700, Bjorn Helgaas wrote:

> > On Marvell hardware, the physical address space layout is configurable,
> > through the use of "MBus windows". A "MBus window" is defined by a base
> > address, a size, and a target device. So if the CPU needs to access a
> > given device (such as PCIe 0.0 for example), then we need to create a
> > "MBus window" whose size and target device match PCIe 0.0.
>
> I was assuming "PCIe 0.0" was a host bridge, but it sounds like maybe
> that's not true. Is it really a PCIe root port? That would mean the
> MBus windows are some non-PCIe-compliant thing between the root
> complex and the root ports, I guess.

It really is a root port. The hardware acts like a root port at the TLP
level. It has all the root port specific stuff in some format but,
critically, it completely lacks a compliant config space for a root port
bridge. So the driver creates a 'compliant' config space for the root
port.

Building that config space requires harmonizing registers related to
PCI-E with registers related to internal routing, and dealing with the
mismatch between what the hardware can actually provide and what the PCI
spec requires it to provide.

The only mismatch we know about that gets exposed to the PCI core is the
bridge window address alignment restriction. This is what Thomas has
been asking about.

> > Since Armada XP has 10 PCIe interfaces, we cannot just statically
> > create as many MBus windows as there are PCIe interfaces: it would both
> > exhaust the number of MBus windows available, and also exhaust the
> > physical address space, because we would have to create very large
> > windows, just in case the PCIe device plugged behind this interface
> > needs large BARs.
>
> Everybody else in the world *does* statically configure host bridge
> apertures before enumerating the devices below the bridge.

The original PCI-E driver for this hardware did use a 1 root port per
host bridge model, with static host bridge aperture allocation and so
forth. It works fine, just like everyone else in the world, as long as
you have only 1 or 2 ports.

The XP hardware has *10* ports on a single 32 bit machine. You run out
of address space, you run out of HW routing resources, it just doesn't
work acceptably.

> I see why you want to know what devices are there before deciding
> whether and how large to make an MBus window. But that is new
> functionality that we don't have today, and the general idea is not

Well, in general, it isn't new core functionality, it is functionality
that already exists to support PCI bridges. Choosing a one host bridge
to N root port bridge model lets the driver use all of that
functionality, and the only wrinkle that becomes visible to the PCI core
as a whole is the non-compliant alignment restriction on the bridge
window BAR.

This also puts the driver in alignment with the PCI-E specs for root
complexes, which means user space can actually see things like the PCI-E
root port link capability block, and it makes hot plug work properly (I
am actively using hot plug with this driver).

I personally think this is a reasonable way to support this highly
flexible HW.
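Going back to the alignment restriction on the bridge window BAR: as I
understand the HW, an MBus window wants a power-of-two size and a base
aligned to that size, while a normal PCI bridge memory window only needs
1MB alignment. A rough standalone sketch of the rounding involved
(illustrative only, not the actual pci-mvebu code; the addresses and
helper names are made up):

/* Illustrative only -- not the pci-mvebu driver code. */
#include <stdint.h>
#include <stdio.h>

/* Round a requested size up to the next power of two. */
static uint64_t mbus_round_size(uint64_t size)
{
	uint64_t s = 1;

	while (s < size)
		s <<= 1;
	return s;
}

/* Align the base up to a multiple of the (power-of-two) window size. */
static uint64_t mbus_align_base(uint64_t base, uint64_t size)
{
	return (base + size - 1) & ~(size - 1);
}

int main(void)
{
	/* A 3MB bridge window the PCI core might hand us, 1MB aligned. */
	uint64_t req_base = 0xe0300000, req_size = 0x00300000;
	uint64_t win_size = mbus_round_size(req_size);	/* -> 4MB */
	uint64_t win_base = mbus_align_base(req_base, win_size);

	printf("bridge window %#llx+%#llx -> MBus window %#llx+%#llx\n",
	       (unsigned long long)req_base, (unsigned long long)req_size,
	       (unsigned long long)win_base, (unsigned long long)win_size);
	return 0;
}

In the real driver the rounding obviously can't happen behind the core's
back; the point is just to show why the window the HW can express is
larger and more strictly aligned than what a plain bridge window needs.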
> I'm still not sure I understand what's going on here. It sounds like
> your emulated bridge basically wraps the host bridge and makes it look
> like a PCI-PCI bridge. But I assume the host bridge itself is also
> visible, and has apertures (I guess these are the MBus windows?)

No, there is only one bridge, a per-physical-port MBUS / PCI-E bridge.
It performs an identical function to the root port bridge described in
PCI-E. MBUS serves as the root-complex internal bus 0.

There aren't 2 levels of bridging, so the MBUS / PCI-E bridge can claim
any system address, and there is no such thing as a 'host bridge'. What
Linux calls 'the host bridge aperture' is simply a whack of otherwise
unused physical address space; it has no special properties.

> It'd be nice if dmesg mentioned the host bridge explicitly as we do on
> other architectures; maybe that would help understand what's going on
> under the covers. Maybe a longer excerpt would already have this; you
> already use pci_add_resource_offset(), which is used when creating the
> root bus, so you must have some sort of aperture before enumerating.

Well, /proc/iomem looks like this:

e0000000-efffffff : PCI MEM 0000
  e0000000-e00fffff : PCI Bus 0000:01
    e0000000-e001ffff : 0000:01:00.0

'PCI MEM 0000' is the 'host bridge aperture'; it is an arbitrary range of
address space that doesn't overlap anything.

'PCI Bus 0000:01' is the MBUS / PCI-E root port bridge for physical
port 0.

'0000:01:00.0' is BAR 0 of an off-chip device.

> If 01:00.0 is a PCIe endpoint, it must have a root port above it, so
> that means 00:01.0 must be the root port. But I think you're saying
> that 00:01.0 is actually *emulated* and isn't PCIe-compliant, e.g., it
> has extra window alignment restrictions.

It is important to understand that the emulation is only of the root
port bridge configuration space. The underlying TLP processing is done
in HW and is compliant.

> I'm scared about what other non-PCIe-compliant things there might
> be. What happens when the PCI core configures MPS, ASPM, etc.,

As the TLP processing and the underlying PHY are all compliant, these
things are all supported in HW:

MPS is supported directly by the HW.
ASPM is supported by the HW, as is the entire link capability and
status block.
AER is supported directly by the HW.

But here is the thing: without the software emulated config space there
would be no sane way for the Linux PCI core to access these features.
The HW simply does not present them in a way that the core code can
understand without a SW intervention of some kind. (A rough sketch of
what that emulation looks like is at the end of this mail.)

Jason
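Rough sketch of the config space emulation mentioned above (illustrative
only, not the actual pci-mvebu code; the struct fields and the register
sources are hypothetical):

/*
 * Illustrative sketch only. Config accesses that target the root port
 * itself never go out on the link; the driver answers them from a
 * software-maintained type 1 (bridge) header, filling the fields from
 * whatever HW registers hold the equivalent information.
 */
#include <stdint.h>
#include <stdio.h>

/* Standard config space offsets, as in the PCI spec / pci_regs.h. */
#define PCI_VENDOR_ID		0x00
#define PCI_COMMAND		0x04
#define PCI_CACHE_LINE_SIZE	0x0c	/* dword also holds header type */
#define PCI_PRIMARY_BUS		0x18
#define PCI_MEMORY_BASE		0x20

/* Hypothetical per-port software state kept by the driver. */
struct emulated_root_port {
	uint16_t vendor, device;	/* from HW ID registers */
	uint16_t command;		/* shadowed in software */
	uint8_t  primary_bus, secondary_bus, subordinate_bus;
	uint32_t mem_base_limit;	/* mirrors the MBus window setup */
};

/* Answer a dword config read to the root port from the SW header. */
static uint32_t rp_conf_read(const struct emulated_root_port *rp, int where)
{
	switch (where & ~3) {
	case PCI_VENDOR_ID:
		return (uint32_t)rp->device << 16 | rp->vendor;
	case PCI_COMMAND:
		return rp->command;
	case PCI_CACHE_LINE_SIZE:
		return 0x01 << 16;	/* header type 1: PCI-PCI bridge */
	case PCI_PRIMARY_BUS:
		return (uint32_t)rp->subordinate_bus << 16 |
		       (uint32_t)rp->secondary_bus << 8 | rp->primary_bus;
	case PCI_MEMORY_BASE:
		return rp->mem_base_limit;
	default:
		return 0;		/* registers the sketch leaves out */
	}
}

int main(void)
{
	struct emulated_root_port rp = {
		.vendor = 0x11ab, .device = 0x1234,	/* made-up device ID */
		.primary_bus = 0, .secondary_bus = 1, .subordinate_bus = 1,
	};

	printf("vendor/device dword: %#x\n", rp_conf_read(&rp, PCI_VENDOR_ID));
	printf("bus numbers dword:   %#x\n", rp_conf_read(&rp, PCI_PRIMARY_BUS));
	return 0;
}

Writes go the other way: as I understand it, the driver catches writes
to the bridge window registers and turns them into MBus window setup and
teardown, which is where the alignment restriction discussed above comes
from.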