On 3/9/2013 8:55 PM, Jason Gunthorpe wrote: > On Sat, Mar 09, 2013 at 06:52:13PM -1000, Mitch Bradley wrote: > >> Okay, I think I finally get it. The Marvell root port bridge setup >> registers look like standard config headers, even though they aren't >> really in config space (because you need a different access method for >> them, compared to real config accesses to downstream devices). > > Well, thats pretty close. On Marvell the MMIO space has a lot of > stuff, some of it is config related, lots of it isn't. In fact most of > it isn't. Tegra is different, the MMIO region is exactly the config > space of the root port bridge. > > Tegra could probably happily and sanely do as you've described (well, > see below), but it is wonky for Marvell. It looks like there are more > drivers coming that have this need - so I had hoped for a general > answer to the 'per-port MMIO space' problem that is unrelated to the > MMIO spaces role in config access. > >> So, in order for the common code that enumerates the PCI bus to work, >> you need to "spoof" the config accesses so that when you try to access >> those particular "pseudo config headers", it uses the MMIO method >> instead of the real config access method. > > There are rules for config access in both Tegra and Marvell that are > not trivial. Again, the driver takes are of this and all Linux sees is > a nice compliant config space. Thanks for explaining that. So it seems that we are faced with two requirements that are somewhat at odds with one another. 1) Some of the root port bridge registers have to be accessed via config space access functions so common PCI enumeration code will work. To represent this with the usual DT structure, the top root-complex needs to define a 3/2 address space so its children can have standard PCI reg properties. Presumably, if those registers are being programmed by common code, Marvell-specific code would restrict its role to setting up config-space access functions, leaving the actual touching of the registers to the common code. 2) Marvell chips have additional non-standard per-root-port registers that generic PCI code would not understand. These registers would be touched only by Marvell-specific code. The two kinds of registers are adjacent in MMIO space. However, unless I am misunderstanding this MV78230 manual, the highest "config header" register index is 0x134, well below the 0x1000 size limit of a PCIe config header. Some of the extra registers are at 0x8xx, and others are above 0x1800. That suggests the following: For the "config header" registers that use generic PCI code, use a ranges entry to associate (pass through) MMIO addresses to the config header portion of the register block. This parameterizes the code that sets up the special-case PCI config access. That specification technique is general and could be used not only for the Marvell case, but also for any other chip that has some number of direct-mapped config headers. For requirement (2), the top node has a reg property listing the portions of the address space that are consumed by the driver at the top level instead of being passed through to the PCI addressing domain. E.g. reg = <0xd0040800 0x1800>, <0xd0080800 0x1800>; Thus we have accurately described the two aspects of the true situation. The PCI-compliant "config header" registers are passed through to the child nodes where they can be dealt with in the normal PCI fashion. The non-PCI-compliant register footprint lives within the Marvell-specific root-complex driver. The root-complex driver presumably needs associate non-compliant register blocks with specific child nodes. That can be done by requiring that the reg entries are in the same order as the config-space ranges entries. Anticipating the possible objection that ARM's 0x1000-byte page size does not permit virtual-to-physical mapping at 0x800-byte granularity: The device tree does not guarantee that reg entries are page-aligned; it simply tries to describe the reality, even though it might be messy. > >> If that is indeed the case, them I would vote for a slight modification >> of the intermediate patch that I cited earlier - the one in which the >> ranges property includes translations from those special config >> addresses into CPU addresses. The modification is to fix the sizes, >> changing 0x2000 to 0x800, so those ranges entries do not overlap in the >> child address domain. BTW I have completely changed my mind about the overlap thing. I said that it was bogus to use size=0x2000 for a config space header. That was based on an interpretation - which I now dislike - of the meaning of 3-cell config addresses. By that old interpretation, size=0x800 would also be bogus, because bits 10-8 of the config address are for the function number. Consider the following question, which I have never previously considered, at least not explicitly: Q: What would be the 3-cell representation of the Command/Status register address (offset 4) in device 1, function 1? One obvious - but weak - answer is <0x00000904 0 0> . It's obvious because the value 0x80000904 is what you put in the cf8 indirect access address register. But it is weak because it unnecessarily restricts the config header size, and because it works differently than offsets applied to memory and I/O space addresses. Furthermore, coupling the offset to the cf8 register is dodgy because of the funniness where you have to but the byte-selector bits 1..0 not in the address register cf8, but rather in the data register cfc. The better answer is <0x00000900 0 4>, using the third cell for the offset, as with IO and memory space offsets. Under this better interpretation, config space sizes larger than 0x100 are no problem. The first cell is just a selector. The size is "added" to the third cell, so it does not encroach into the first cell's device,function bits. This better interpretation easily handles PCIe's 0x1000-byte extended config headers, and trivially accomodates even larger sizes should a future PCI variant further increase the header size. as described way down below. > > On Marvell the size of the MMIO space is 0x2000, there are imporant > registers there beyond index 0x800 (todays driver may not touch them > but HW has them). Reducing the size doesn't seem appropriate. Per the above, the current idea is not to reduce the size, but rather to split the 0x2000 into two pieces, accurately reflecting the fact that part of it is PCI-compliant and can be handled by generic PCI drivers, while the rest must be handled differently, outside of standard PCI land. Commingling the two seems dubious. > >> pcie-controller { >> compatible = "marvell,armada-370-pcie"; >> status = "disabled"; > > device_type = "pci"; > > Here as well, as you noted before? > >> #address-cells = <3>; >> #size-cells = <2>; >> >> bus-range = <0x00 0xff>; >> >> ranges = <0x00000800 0 0 0xd4004000 0 0x00000800 /* Root >> port config header */ >> 0x00001000 0 0 0xd4008000 0 0x00000800 /* Root >> port config header */ >> 0x82000000 0 0xe0000000 0xe0000000 0 0x08000000 /* >> non-prefetchable memory */ >> 0x81000000 0 0 0xe8000000 0 0x00100000>; /* >> downstream I/O */ >> >> pcie@1,0 { >> device_type = "pci"; >> reg = <0x0800 0 0 0 0>; > > Okay, this was Thierry's original idea, and it was my note that this > didn't seem like a good choice. I'm not wedded to that, but I'll > explain my reasoning a bit. > > Two things bothered me > - Describing a CPU MMIO mapping with a config address space seems > wonky I agree that mapping config space is sort of a jarring concept, but I think that's because PCs have polluted the mindspace, not because there is anything inherently bad about it. The original PCI spec fundamentally treated config space as just another (rather segmented) linear address space. It specified the cf8/cfc indirect access mechanism not as a deep semantic feature, but rather as an implementation hack to work around addressing limitations of PC chipsets and the addressing-deficient popular OS of the time (Windows 3.1 which was a veneer over DOS). The PCI spec defined configuration mechanism #2 as a direct map into PC I/O space. RISC chipsets of that era (e.g. Alpha, Power PC) often direct-mapped config space into some form of load/store space. But PCs soon encroached on the RISC market and RISC system vendors started to "leverage" PC I/O chipsets. The indirect access "config mechanism #1" became part of the mindset. Indirect access was hardcoded in enough software that chipset designers would provide indirect access even if direct-mapping was possible. Fundamentally, a config header is just a block of registers, addressable as an index relative to some base. It might require some specific access path - but that is true with any block of registers. Even direct-mapped MMIO registers often require specific semantics such as non-cached, non-write-buffered, or special address space ids. The PCI SIG expected PCIe extended config space to be direct-mapped - see slide 25 of http://www.pcisig.com/specifications/pciexpress/technical_library/dev_con_09_02/specifications/pciexpress/PCIExpress_SoftwareandConfigurationModel.pdf So mappability of config space was intended from the get-go. The real wonky thing is the "indirect access zombie that cannot die". > - Linux's OF core doesn't parse a 'reg' property under > 'device_type = "pci"' as something CPU mappable, it only looks > to assigned-addresses for that. So the OF core would need to learn > how to handle this case, and it would have to be well defined.. The way I envisioned it working is that the root-complex node defines its own config space access functions (the "ops" argument to pci_scan_root_bus). For config addresses listed in ranges, the functions do MMIO. For others, they use the chipset's indirect-access registers. The OF core is not involved. > > Thierry's original patch avoided this problem by not using > device_type = "pci".. > > Also, did you mean to have a 0 size for the 'reg'? How will things > know how big the MMIO region is? Can't just assume 0x800.. Sorry, that was a typo. > > Again, thanks! > Jason > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html