Hi Bjorn, many thanks for coming back on this > -----Original Message----- > From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx] > Sent: 14 March 2016 19:17 > To: Gabriele Paoloni > Cc: Lorenzo Pieralisi; 'Mark Rutland'; Guohanjun (Hanjun Guo); Wangzhou > (B); liudongdong (C); Linuxarm; qiujiang; 'bhelgaas@xxxxxxxxxx'; > 'arnd@xxxxxxxx'; 'tn@xxxxxxxxxxxx'; 'linux-pci@xxxxxxxxxxxxxxx'; > 'linux-kernel@xxxxxxxxxxxxxxx'; xuwei (O); 'linux- > acpi@xxxxxxxxxxxxxxx'; 'jcm@xxxxxxxxxx'; zhangjukuo; Liguozhu > (Kenneth); 'linux-arm-kernel@xxxxxxxxxxxxxxxxxxx' > Subject: Re: [RFC PATCH v3 3/3] PCI/ACPI: hisi: Add ACPI support for > HiSilicon SoCs Host Controllers > > On Wed, Mar 09, 2016 at 07:41:01AM +0000, Gabriele Paoloni wrote: > > Hi Bjorn, Lorenzo > > > > > -----Original Message----- > > > From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx] > > > Sent: 02 March 2016 15:51 > > > To: Lorenzo Pieralisi > > > Cc: Gabriele Paoloni; 'Mark Rutland'; Guohanjun (Hanjun Guo); > Wangzhou > > > (B); liudongdong (C); Linuxarm; qiujiang; 'bhelgaas@xxxxxxxxxx'; > > > 'arnd@xxxxxxxx'; 'tn@xxxxxxxxxxxx'; 'linux-pci@xxxxxxxxxxxxxxx'; > > > 'linux-kernel@xxxxxxxxxxxxxxx'; xuwei (O); 'linux- > > > acpi@xxxxxxxxxxxxxxx'; 'jcm@xxxxxxxxxx'; zhangjukuo; Liguozhu > > > (Kenneth); 'linux-arm-kernel@xxxxxxxxxxxxxxxxxxx' > > > Subject: Re: [RFC PATCH v3 3/3] PCI/ACPI: hisi: Add ACPI support > for > > > HiSilicon SoCs Host Controllers > > > > > > On Tue, Mar 01, 2016 at 07:22:47PM +0000, Lorenzo Pieralisi wrote: > > > > Hi Bjorn, > > > > > > > > On Thu, Feb 25, 2016 at 01:59:12PM -0600, Bjorn Helgaas wrote: > > > > > On Thu, Feb 25, 2016 at 12:07:50PM +0000, Lorenzo Pieralisi > wrote: > > > > > > On Thu, Feb 25, 2016 at 03:01:19AM +0000, Gabriele Paoloni > wrote: > > > > > > > > [...] > > > > > > > > > > I do not understand how PNP0c02 works, currently, by the way. > > > > > > > > > > > > If I read x86 code correctly, the unassigned PCI bus > resources > > > are > > > > > > assigned in arch/x86/pci/i386.c (?) > > > fs_initcall(pcibios_assign_resources), > > > > > > with a comment: > > > > > > > > > > > > /** > > > > > > * called in fs_initcall (one below subsys_initcall), > > > > > > * give a chance for motherboard reserve resources > > > > > > */ > > > > > > > > > > > > Problem is, motherboard resources are requested through (?): > > > > > > > > > > > > drivers/pnp/system.c > > > > > > > > > > > > which is also initialized at fs_initcall, so it might be > called > > > after > > > > > > core x86 code reassign resources, defeating the purpose > PNP0c02 > > > was > > > > > > designed for, namely, request motherboard regions before > > > resources > > > > > > are assigned, am I wrong ? > > > > > > > > > > I think you're right. This is a long-standing screwup in > Linux. > > > > > IMHO, ACPI resources should be parsed and reserved by the ACPI > > > core, > > > > > before any PCI resource management (since PCI host bridges are > > > > > represented in ACPI). But historically PCI devices have > enumerated > > > > > before ACPI got involved. And the ACPI core doesn't really pay > > > > > attention to _CRS for most devices (with the exception of > PNP0C02). > > > > > > > > > > IMO the PNP0C02 code in drivers/pnp/system.c should really be > done > > > in > > > > > the ACPI core for all ACPI devices, similar to the way the PCI > core > > > > > reserves BAR space for all PCI devices, even if we don't have > > > drivers > > > > > for them. I've tried to fix this in the past, but it is really > a > > > > > nightmare to unravel everything. > > > > > > > > > > Because the ACPI core doesn't reserve resources for the _CRS of > all > > > > > ACPI devices, we're already vulnerable to the problem of > placing a > > > > > device on top of another ACPI device. We don't see problems > > > because > > > > > on x86, at least, most ACPI devices are already configured by > the > > > BIOS > > > > > to be enabled and non-overlapping. But x86 has the advantage > of > > > > > having extensive test coverage courtesy of Windows, and as long > as > > > > > _CRS has the right stuff in it, we at least have the potential > of > > > > > fixing problems in Linux. > > > > > > > > ... > > > > By "fixing problems in Linux" above, you mean that, given that we > > > > do have a validated _CRS space, we can request/reserve the region > the > > > _CRS > > > > reports to prevent assigning those resources to other devices, > > > correct ? > > > > > > Yes. > > > > > > I think part of what makes this difficult in Linux is that the > > > resource tree is too strict about overlapping resources. We get > > > address space information from E820 (on x86), static ACPI tables > like > > > MCFG, and dynamic things like ACPI _CRS. There's no real > requirement > > > that the BIOS should make all these consistent, but yet we try to > jam > > > it all into the same resource tree. > > > > > > For example, E820 might tell us that range A is reserved and > > > unavailable to Linux. We stick it in the resource tree. Then we > > > might have a _CRS that tells us about range B. We *want* to put > range > > > B in the resource tree, but if B overlaps part of A, the insert > will > > > fail. > > > > > > All we really need from E820 is the information that "you can't put > > > devices in A". We don't need to enforce any relationship between A > > > and B, but the current resource tree imposes unnecessary > hierarchical > > > requirements. > > > > > > I think issues like this are the biggest reason why the ACPI core > > > doesn't reserve all _CRS space early on (Rafael may correct me > here). > > > > > > > > If the platform doesn't report resource usage correctly on ARM, > we > > > may > > > > > not find problems (because we don't have the Windows test > suite) > > > and > > > > > if we have resource assignment problems because _CRS is > lacking, > > > we'll > > > > > have no way to fix them. > > > > > > > > And I think here you mean we can't prevent assigning resource > space > > > to > > > > devices that do not necessarily own it because since some devices > > > _CRS > > > > are borked/missing we have no way to detect the address space > > > allocated > > > > to them and we may end up with resources conflicts. > > > > > > The ACPI core currently doesn't reserve the space consumed by ACPI > > > devices. Some drivers, e.g., for PNP0C02 (motherboard) and PNP0A03 > > > (PCI host bridge), do reserve their space, but the core itself does > > > not. > > > > > > If we have drivers for all the ACPI devices, those drivers will > > > probably use _CRS and reserve the space, and we'll probably notice > any > > > _CRS errors. But if we don't have drivers, e.g., for performance > > > monitors or other non-essential things, nothing will use _CRS, and > > > nothing will reserve the space used by the device, and it's hard to > > > find errors. > > > > > > If we ever assign top-level resources, there's nothing to prevent > us > > > from creating a conflict. The only reason we don't trip over this > is > > > that we usually don't assign top-level resources because firmware > does > > > it for us. > > > > It seems that in this thread we have touched quite few issues. > > > > First is how to describe a PCI host controller config space resource: > > as you highlighted before mentioning the specs, these resources > should > > be marked as "PNP0C02"; therefore I guess the current Nowicki > patchset > > must be reworked to check the resources to be motherboard reserved > when > > parsing the MCFG table. > > I think checking whether MCFG resources are reserved by motherboard > ("PNP0C02") devices is a hack that was added on x86 because there were > issues getting ECAM to work reliably. The theory at the time was that > the problem was BIOS bugs. I don't know whether that's actually true > or not. > > I'm not convinced that checking for PNP0C02 resources is something we > should do in generic code. MCFG is a static table, and I don't think > we should add dependencies on the ACPI namespace, because the point of > static tables is to describe things we might need before the namespace > is available. Ok fine, then the current Nowicki patchset is good from this perspective > > > Also with respect to the ACPI table for my specific PCIe controller I > > would use the following approach: > > > > // PCIe Root bus > > Device (PCI1) > > { > > Name (_HID, "HISI0080") // PCI Express Root Bridge > > Name (_CID, "PNP0A03") // Compatible PCI Root Bridge > > Name(_SEG, 1) // Segment of this Root complex > > Name(_BBN, 0) // Base Bus Number > > Name(_CCA, 1) > > Method (_CRS, 0, Serialized) { // Root complex resources > > Name (RBUF, ResourceTemplate () { > > WordBusNumber ( // Bus numbers assigned to this > root > > ResourceProducer, MinFixed, MaxFixed, > PosDecode, > > 0, // AddressGranularity > > 0, // AddressMinimum - Minimum Bus Number > > 63, // AddressMaximum - Maximum Bus > Number > > 0, // AddressTranslation - Set to 0 > > 64 // RangeLength - Number of Busses > > ) > > QWordMemory ( // 64-bit BAR Windows > > ResourceProducer, > > PosDecode, > > MinFixed, > > MaxFixed, > > Cacheable, > > ReadWrite, > > 0x0000000000000000, // Granularity > > 0x00000000b0000000, // Min Base Address > pci address > > 0x00000000bbfeffff, // Max Base Address > > 0x0000021f54000000, // Translate > > 0x000000000bff0000 // Length > > ) > > QWordIO ( > > ResourceProducer, > > MinFixed, > > MaxFixed, > > PosDecode, > > EntireRange, > > 0x0000000000000000, // Granularity > > 0x0000000000000000, // Min Base Address > > 0x000000000000ffff, // Max Base Address > > 0x000002200fff0000, // Translate > > 0x0000000000010000 // Length > > ) > > }) // Name(RBUF) > > Return (RBUF) > > } // Method(_CRS) > > Device (RES0) > > { > > Name (_HID, "HISI0081") // HiSi PCIe RC config base > address > > Name (_CID, "PNP0C02") // Motherboard reserved > resource > > Name (_CRS, ResourceTemplate (){ > > Memory32Fixed (ReadWrite, 0xb0080000 , > 0x10000) > > }) > > > > } > > > > So in the table above I have a sub-device under the RC to pass the > address > > for the RC config space (the rest of the config space addresses for > bus 1 > > to 63 are passed in the MCFG). As you can see the device _CID is > PNP0C02 > > As per ACPI specs. > > Do you see anything wrong with this approach? > > It looks OK to me. The PCI Firmware spec r3.0, sec 4.1.2, does say > that the motherboard resource would usually appear at the root of the > namespace (under \_SB). That's not an absolute statement, but I don't > know why it would be included at all unless the authors thought it was > important for some reason. Ok great so I'll start reworking my ACPI based quirk according to the table above Many Thanks Gab > > > The second issue is when and how to reserve HW resources. As per > > conversation above this seems quite a tricky issue and probably needs > to > > consider different aspects... > > I don't think resources should be reserved based on MCFG. Maybe we > need to reserve MCFG areas on x86 for legacy reasons, but I don't > think we should do it on arm64. > > > I was wondering if we can take a gradual approach; maybe for the time > being > > we can just rework Nowicki patchset to check the MCFG resources to be > > motherboard reserved and later on we can make an effort to fix the > resource > > insertion mechanism making sure that it works right on both x86 and > ARM. > > > > What do you think about? > > > > Many Thanks > > > > Gab > > > > > > > > > > > As per last Tomasz's patchset, we claim and assign unassigned > PCI > > > > > > resources upon ACPI PCI host bridge probing (which happens at > > > > > > subsys_initcall time, courtesy of ACPI current code); at that > > > time the > > > > > > kernel did not even register the PNP0c02 driver > > > (drivers/pnp/system.c) > > > > > > (it does that at fs_initcall). On the other hand, we insert > MCFG > > > > > > regions into the resource tree upon MCFG parsing, so I do not > > > > > > see why we need to rely on PNP0c02 to do that for us > (granted, > > > the > > > > > > mechanism is part of the PCI fw specs, which are x86 centric > > > anyway > > > > > > ie we can't certainly rely on Int15 e820 to detect reserved > > > memory > > > > > > on ARM :D) > > > > > > > > > > > > There is lots of legacy x86 here and Bjorn definitely has > more > > > > > > visibility into that than I have, the ARM world must > understand > > > > > > how this works to make sure we have an agreement. > > > > > > > > > > As you say, there is lots of unpleasant x86 legacy here. > Possibly > > > ARM > > > > > has a chance to clean this up and do it more sanely; I'm not > sure > > > > > whether it's feasible to reverse the ACPI/PCI init order there > or > > > not. > > > > > > > > > > Rafael, any thoughts on this whole thing? > > > > > > > > > > Bjorn > > > > > > > > > -- > > > > To unsubscribe from this list: send the line "unsubscribe linux- > acpi" > > > in > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > > > More majordomo info at http://vger.kernel.org/majordomo- > info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-acpi" > in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html