Le 30/04/2018 10:46, Gilles BULOZ a écrit : > Le 27/04/2018 18:56, Bjorn Helgaas a écrit : >> On Fri, Apr 27, 2018 at 12:29:32PM +0000, Gilles Buloz wrote: >>> Le 27/04/2018 10:43, Ard Biesheuvel a écrit : >>>> (add Bjorn and linux-pci) >>>> >>>> On 13 April 2018 at 19:32, Gilles Buloz <Gilles.Buloz@xxxxxxxxxxx> wrote: >>>>> Dear developers, >>>>> >>>>> I currently have two functional workarounds for this issue but >>>>> would like to know which one you would recommend, if any :-) I'm >>>>> using a LS1043A CPU (NXP QorIQ Layerscape) and get a "synchronous >>>>> external abort" when booting because of a PCI config read during >>>>> PCI scan. >>>>> >>>>> I'm using a custom hardware (based on LS1043ARDB) having a >>>>> PEX8112 PCIe-to-PCI bridge connected to the LS1043A to have a PCI >>>>> slot for legacy devices. This bridge only supports PCI-Compatible >>>>> config accesses (offset 0x00-0xFF). >> I would guess the PEX8112 itself has 4K of config space, but it only >> forwards 256 bytes of config space to the conventional PCI secondary >> bus. >> >>>>> On this PCI slot I connect a PCI module made of a PCI-to-PCIe >>>>> bridge plus PCIe devices behind. >>>>> >>>>> The problem occurs when the kernel probes the PCIe devices : as >>>>> they are PCIe devices, the kernel does a PCI config read access >>>>> at offset 0x100 to check if "PCIe extended capability registers" >>>>> are accessible (see drivers/pci/probe.c, function >>>>> pci_cfg_space_size_ext()). Unfortunately the PEX8112 PCIe-to-PCI >>>>> bridge that is in the path reports an error to the CPU for this >>>>> access, and it seems there's no way to disable that on this >>>>> bridge. >>>>> >>>>> The first workaround I found was to patch >>>>> drivers/pci/host/pci-layerscape.c to have PCIE_ABSERR_SETTING set >>>>> to 0x9400 instead of 0x9401 (for PCIE_ABSERR register) to disable >>>>> error reporting. This only impacts an NXP part of the Linux >>>>> kernel code, but I'm not sure this is a good idea (however it >>>>> seems to be like that on Intel platforms where even MEM accesses >>>>> to a no-device address return FF without any error). >>>>> >>>>> I've also tried another workaround that works : patch >>>>> drivers/pci/probe.c to use bus_flags to remember if a bus is >>>>> behind a bridge without extended address capability, to avoid PCi >>>>> config read accesses at offset 0x100 in pci_cfg_space_size() / >>>>> pci_cfg_space_size_ext(). But this patch impacts the generic PCI >>>>> probe method of Linux. >>>>> >>>>> Any Idea to properly handle that issue ? >>>>> >>>> This seems like a rather unusual configuration, but I guess that >>>> if the first bridge/switch advertises its inability to support >>>> extended config space accesses, we should not be performing them >>>> on any of its subordinate buses. How does the PEX8112 advertise >>>> this limitation? >>>> >>>> That said, I wonder if it is reasonable in the first place to >>>> expect that a PCIe device works as expected passing through a >>>> legacy PCI layer like that. >>>> >>> The PEX8112 PCIe-to-PCI bridge has capability PCI_CAP_ID_EXP, but >>> has no PCI_CAP_ID_PCIX capability. As I understand the lack of >>> PCI_CAP_ID_PCIX is advertising this limitation on the PCI side (no >>> support for PCI config offset >=0x100). >> Sounds right to me. >> >>> Also I guess in the case of a bridge having PCI_CAP_ID_PCIX, this >>> limitation would be advertised by the lack of PCI_X_STATUS_266MHZ >>> and PCI_X_STATUS_533MHZ (as done in drivers/pci/probe.c at >>> pci_cfg_space_size()) >> Also sounds right. Per the PCI-X spec, checking for PCI_X_STATUS_266MHZ >> should be enough, but it shouldn't hurt to check for either >> PCI_X_STATUS_266MHZ or PCI_X_STATUS_533MHZ. >> >>> I'm currently using the attached patch (for kernel 4.1.35-rt41 from >>> NXP Yocto BSP). It uses bus_flags to remember if a bus is behind a >>> bridge without extended address capability to avoid PCi config >>> accesses at offset >= 0x100. Thanks to this patch I now have a >>> functional system with functional PCI/PCIe devices. >> The patch seems like it's looking at the right things, but I don't >> want to build it into pci_scan_bridge_extend() because that function >> is much too complicated already. >> >> I'd rather build it into pci_cfg_space_size() or >> pci_cfg_space_size_ext() somehow. Maybe something along these lines? >> This doesn't account for the case of a PCIe-to-PCI-X Mode 2 bridge; in >> that case, I think all 4K would be accessible on the PCI-X side. >> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index ac91b6fd0bcd..d8b091f0bcd1 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -1367,7 +1367,7 @@ static bool pci_ext_cfg_is_aliased(struct pci_dev *dev) >> * pci_cfg_space_size - Get the configuration space size of the PCI device >> * @dev: PCI device >> * >> - * Regular PCI devices have 256 bytes, but PCI-X 2 and PCI Express devices >> + * Regular PCI devices have 256 bytes, but PCI-X Mode 2 and PCI Express devices >> * have 4096 bytes. Even if the device is capable, that doesn't mean we can >> * access it. Maybe we don't have a way to generate extended config space >> * accesses, or the device is behind a reverse Express bridge. So we try >> @@ -1376,9 +1376,14 @@ static bool pci_ext_cfg_is_aliased(struct pci_dev *dev) >> */ >> static int pci_cfg_space_size_ext(struct pci_dev *dev) >> { >> + struct pci_dev *bridge = pci_upstream_bridge(dev); >> u32 status; >> int pos = PCI_CFG_SPACE_SIZE; >> + if (bridge && pci_is_pcie(bridge) && >> + pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE) >> + return PCI_CFG_SPACE_SIZE; >> + >> if (pci_read_config_dword(dev, pos, &status) != PCIBIOS_SUCCESSFUL) >> return PCI_CFG_SPACE_SIZE; >> if (status == 0xffffffff || pci_ext_cfg_is_aliased(dev)) >> >>> --- include/linux/pci.h.orig 2018-03-26 16:51:18.050000000 +0000 >>> +++ include/linux/pci.h 2018-03-26 16:51:27.660000000 +0000 >>> @@ -193,6 +193,7 @@ >>> enum pci_bus_flags { >>> PCI_BUS_FLAGS_NO_MSI = (__force pci_bus_flags_t) 1, >>> PCI_BUS_FLAGS_NO_MMRBC = (__force pci_bus_flags_t) 2, >>> + PCI_BUS_FLAGS_COMPAT_CFG_SPACE = (__force pci_bus_flags_t) 4, >>> }; >>> /* These values come from the PCI Express Spec */ >>> --- drivers/pci/probe.c.orig 2018-01-22 09:29:52.000000000 +0000 >>> +++ drivers/pci/probe.c 2018-03-26 16:54:30.830000000 +0000 >>> @@ -827,6 +827,28 @@ >>> child->primary = primary; >>> pci_bus_insert_busn_res(child, secondary, subordinate); >>> child->bridge_ctl = bctl; >>> + >>> + { >>> + int pos; >>> + u32 status; >>> + bool pci_compat_cfg_space = false; >>> + >>> + if (!pci_is_pcie(dev) || (pci_pcie_type(dev) == PCI_EXP_TYPE_PCIE_BRIDGE) || (pci_pcie_type(dev) == >>> PCI_EXP_TYPE_PCI_BRIDGE)) { >>> + /* for PCI/PCI bridges, or PCIe/PCI bridge in forward or reverse mode, we have to check for PCI-X >>> capabilities */ >>> + pos = pci_find_capability(dev, PCI_CAP_ID_PCIX); >>> + if (pos) { >>> + pci_read_config_dword(dev, pos + PCI_X_STATUS, &status); >>> + if (!(status & (PCI_X_STATUS_266MHZ | PCI_X_STATUS_533MHZ))) >>> + pci_compat_cfg_space = true; >>> + } else { >>> + pci_compat_cfg_space = true; >>> + } >>> + if (pci_compat_cfg_space) { >>> + dev_info(&dev->dev, "[%04x:%04x] Child bus limited to PCI-Compatible config space\n", dev->vendor, >>> dev->device); >>> + child->bus_flags |= PCI_BUS_FLAGS_COMPAT_CFG_SPACE; >>> + } >>> + } >>> + } >>> } >>> cmax = pci_scan_child_bus(child); >>> @@ -1098,6 +1120,11 @@ >>> goto fail; >>> } >>> + if (dev->bus->bus_flags & PCI_BUS_FLAGS_COMPAT_CFG_SPACE) { >>> + dev_info(&dev->dev, "[%04x:%04x] PCI-Compatible config space only due to parent bus(es)\n", dev->vendor, dev->device); >>> + return PCI_CFG_SPACE_SIZE; >>> + } >>> + >>> return pci_cfg_space_size_ext(dev); >>> fail: > Bjorn, > If I'm right about your proposed patch to pci_cfg_space_size_ext(), *bridge is pointing to the upper device of device *dev being > checked. I understand the purpose, but I think this fails for my config that is : > > LS1043 PCIe root -> PEX8112 PCIe-to-PCI bridge -> PMC slot connector -> PCI-to-PCIe bridge -> PCIe switch (4 ports) -> 4 PCIe > devices (one on each port) > > because : > - when pci_cfg_space_size_ext() is run on the 4 PCIe devices, *bridge is the PCIe switch which is not matching > PCI_EXP_TYPE_PCI_BRIDGE. In this case *bridge should also be checked for the parent bus of the PCIe switch, and so on. > - when pci_cfg_space_size_ext() is run for the PCI-to-PCIe bridge, *bridge is the PEX8112 that is also not matching > PCI_EXP_TYPE_PCI_BRIDGE but PCI_EXP_TYPE_PCIE_BRIDGE. This leads to a config access at offset 0x100 to the PCI-to-PCIe bridge, so > a crash (because of the PEX8112) > > I think setting a bit in bus_flags when creating a child bus is very efficient because once set it is automatically inherited by > all child buses and then the only thing that pci_cfg_space_size() has to do for each device is to check for this bit. Also this > PCI_BUS_FLAGS_COMPAT_CFG_SPACE flag is actually a bus property that is compliant with the purpose of bus_flags. > > I agree that pci_scan_bridge_extend() is already too complicated, so would you be okay to only add one line to it : > pci_bus_set_compat_cfg_space(child); > and put all the code I suggested in this new function pci_bus_set_compat_cfg_space() ? (also supporting PCI-X Mode 2 devices) > Improvement : this function can return immediately if the child bus has already inherited the flag from its parent. > I mean something like the attached patch I tested this morning... Sorry, this is for old kernel 4.1.35 but just to clarify what I propose (also applies to 4.16.6 by changing value of PCI_BUS_FLAGS_COMPAT_CFG_SPACE in pci.h to 8).
--- include/linux/pci.h.orig 2018-03-26 16:51:18.050000000 +0000 +++ include/linux/pci.h 2018-04-30 09:50:57.660000000 +0000 @@ -193,6 +193,7 @@ enum pci_bus_flags { PCI_BUS_FLAGS_NO_MSI = (__force pci_bus_flags_t) 1, PCI_BUS_FLAGS_NO_MMRBC = (__force pci_bus_flags_t) 2, + PCI_BUS_FLAGS_COMPAT_CFG_SPACE = (__force pci_bus_flags_t) 4, }; /* These values come from the PCI Express Spec */ --- drivers/pci/probe.c.orig 2018-01-22 09:29:52.000000000 +0000 +++ drivers/pci/probe.c 2018-04-30 13:29:50.600000000 +0000 @@ -754,6 +754,35 @@ PCI_EXP_RTCTL_CRSSVE); } +static void pci_bus_check_compat_cfg_space(struct pci_bus *bus) +{ + struct pci_dev *dev = bus->self; + bool pci_compat_cfg_space = false; + int pos; + u32 status; + + if (bus->bus_flags & PCI_BUS_FLAGS_COMPAT_CFG_SPACE) + return; + + if (!pci_is_pcie(dev) || /* PCI/PCI bridge */ + (pci_pcie_type(dev) == PCI_EXP_TYPE_PCIE_BRIDGE) || /* PCIe/PCI bridge in forward mode */ + (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)) { /* PCIe/PCI bridge in reverse mode */ + pos = pci_find_capability(dev, PCI_CAP_ID_PCIX); + if (pos) { + pci_read_config_dword(dev, pos + PCI_X_STATUS, &status); + if (!(status & (PCI_X_STATUS_266MHZ | PCI_X_STATUS_533MHZ))) + pci_compat_cfg_space = true; + } else { + pci_compat_cfg_space = true; + } + if (pci_compat_cfg_space) { + dev_info(&dev->dev, "bus %02x limited to PCI-Compatible config space\n", + bus->number); + bus->bus_flags |= PCI_BUS_FLAGS_COMPAT_CFG_SPACE; + } + } +} + /* * If it's a bridge, configure it and scan the bus behind it. * For CardBus bridges, we don't scan behind as the devices will @@ -827,6 +856,7 @@ child->primary = primary; pci_bus_insert_busn_res(child, secondary, subordinate); child->bridge_ctl = bctl; + pci_bus_check_compat_cfg_space(child); } cmax = pci_scan_child_bus(child); @@ -1084,6 +1114,9 @@ u32 status; u16 class; + if (dev->bus->bus_flags & PCI_BUS_FLAGS_COMPAT_CFG_SPACE) + return PCI_CFG_SPACE_SIZE; + class = dev->class >> 8; if (class == PCI_CLASS_BRIDGE_HOST) return pci_cfg_space_size_ext(dev);