[+cc Andi, linux-kernel, x86 for Intel Broadwell-EP non-BAR issue] On Tue, Feb 23, 2016 at 08:10:03PM +0530, Jayachandran Chandrashekaran Nair wrote: > On Thu, Feb 18, 2016 at 9:19 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Wed, Feb 17, 2016 at 10:36:55PM +0530, Jayachandran Chandrashekaran Nair wrote: > >> On Wed, Feb 17, 2016 at 2:33 AM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > >> > [+cc Arnd, Rob, DT host bridge description questions] > >> > > >> > On Tue, Feb 16, 2016 at 11:39:41PM +0530, Jayachandran Chandrashekaran Nair wrote: > >> >> On Tue, Feb 16, 2016 at 10:44 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > >> >> > On Tue, Feb 16, 2016 at 09:47:34PM +0530, Jayachandran Chandrashekaran Nair wrote: > >> >> >> On Tue, Feb 16, 2016 at 12:00 AM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > >> >> >> > On Mon, Feb 15, 2016 at 02:56:38PM +0530, Jayachandran C wrote: > >> >> >> >> Handle two quirks of the PCI controller on Broadcom's Vulcan processor. > >> >> >> >> - Mark internal bridges so that they are skipped during DMA alias > >> >> >> >> search. > >> >> >> >> - Skip BAR0 resource assignment for internal bridges. The BARs > >> >> >> >> of internal bridges should not be assigned from the mem resource > >> >> >> >> range. > >> >> >> >> ... > >> >> >> >> drivers/pci/quirks.c | 21 +++++++++++++++++++++ > >> >> >> >> 1 file changed, 21 insertions(+) > >> >> >> >> > >> >> >> >> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > >> >> >> >> index 0575a1e..afc186a 100644 > >> >> >> >> --- a/drivers/pci/quirks.c > >> >> >> >> +++ b/drivers/pci/quirks.c > >> >> >> >> @@ -3705,6 +3705,27 @@ DECLARE_PCI_FIXUP_HEADER(0x1283, 0x8892, quirk_use_pcie_bridge_dma_alias); > >> >> >> >> DECLARE_PCI_FIXUP_HEADER(0x8086, 0x244e, quirk_use_pcie_bridge_dma_alias); > >> >> >> >> > >> >> >> >> /* > >> >> >> >> + * Two levels of bridges in Broadcom Vulcan are not real PCI or PCIe bridges. > >> >> >> >> + * These are internal bridges and should not be used for dma alias > >> >> >> >> + * calculations. Additionally, the BAR0 of thes bridges should not be > >> >> >> >> + * assigned with a mem resource from linux > >> >> >> >> + */ > >> >> >> >> +static void quirk_bridge_brcm_vulcan_internal(struct pci_dev *pdev) > >> >> >> >> +{ > >> >> >> >> + struct resource *r = &pdev->resource[0]; > >> >> >> >> + > >> >> >> >> + /* skip from alias search */ > >> >> >> >> + pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS; > >> >> >> >> + > >> >> >> >> + /* clear BAR0, should not be used from Linux */ > >> >> >> >> + memset(r, 0, sizeof(*r)); > >> >> >> > > >> >> >> > This definitely needs some explanation. The whole point of the > >> >> >> > architected PCI config space header is so that generic OS code can > >> >> >> > manage the device without having to add device-specific code. > >> >> >> > > >> >> >> > Are you saying the register at 0x10 in config space is not actually a > >> >> >> > BAR at all? Or it is a BAR, but you don't think anybody should need > >> >> >> > to use it? > >> >> >> > >> >> >> These is a proper BAR (64 bit pref) in the bridge's Type 1 header, but: > >> >> >> 1. Assigning a memory resource from the pci memory window to this > >> >> >> BAR causes the PCIe system to fail (this is a bug). So we cannot > >> >> >> expose this BAR to standard Linux code without even more quirks > >> >> >> and hacks. > >> >> > > >> >> > Are you saying assigning space for this BAR exposes a Linux PCI bug? > >> >> > If so, I'd like to know more about that because we should fix it. > >> >> > > >> >> > Or does it expose a hardware bug in the Broadcom device? > >> >> > >> >> This is a hardware bug (or non-compliance). The BAR cannot be > >> >> assigned from the PCI MEM range. To use it we need to program > >> >> an address outside the areas assigned to PCI to the BAR. But like > >> >> I said it is better for us to ignore the BAR in linux and then the > >> >> device acts as a normal bridge. > >> > > >> > Your PCI host bridge has a window of address space that it forwards > >> > from the primary (CPU) side to the secondary (PCI) side. I assume > >> > that's what you mean by the "PCI MEM range." Apparently if the BAR is > >> > programmed inside that window, it causes some hardware error. That > >> > still seems strange; there's no driver for the device, and we know the > >> > range is in use so it won't be assigned to any other device, so Linux > >> > should never *access* the range. > >> > >> This is correct. The write to this bridge BAR with a address from > >> the host bridge window will cause a hardware issue. > >> > >> > Apparently if you program the BAR *outside* the window, the hardware > >> > error does not happen, and the private registers are accessible. But > >> > Linux currently doesn't know where this space is. > >> > > >> > If we ignore the BAR in Linux, apparently the hardware error does not > >> > happen. But the BAR still contains some value, so this is really the > >> > same as having the BAR outside the window, presumably because it came > >> > out of reset that way, or maybe firmware programmed it. > >> > >> Yes. > >> > >> > It sounds to me like you should do the following: > >> > > >> > a) Have firmware program this BAR where you want it, > >> > > >> > b) Describe these private registers as internal PCI bridge registers, > >> > using a DT "reg" property, > >> > >> Two ponts here: > >> - We have to support ACPI as well > > > > ACPI can do something similar. This register space would be described > > in a _CRS method. IIRC there is actually a bit in the _CRS descriptor > > that was intended to tell you whether the resource is (a) consumed > > directly by the device or (b) forwarded to downstream devices. But I > > think BIOSes didn't use that bit correctly, so it's useless, so the > > _CRS method might have to be on a different device. > > > >> - There is no need to use the private registers in linux, so the > >> whole thing will be pointless. > >> > >> > c) Describe the host bridge window (which doesn't include the above > >> > registers) using the normal "ranges" property, and > >> > >> This is standard code (and ACPI works as well).. > >> > >> > d) Arrange for config writes to BAR 0 to be dropped and for reads to > >> > return 0, maybe by checks in your config accessors. > >> > >> Here we are hiding the BAR from linux using checks in config read > >> write (if i understand correctly). This will need custom config accessors > >> for Vulcan both on device tree as well as in ACPI. > > > > Config accessors are usually not specific to DT or ACPI. > > > >> The patch (above) is trying to do it in a much much simpler way by > >> clearing the bar0 resource in DECLARE_PCI_FIXUP_HEADER quirk. > >> I am not able to see why this is not valid. > > > > I think there are two problems: > > > > 1) If the device responds to any address space, Linux needs to know > > about it. If we don't know about it, we might assign that space to > > something else. > > If it is really needed I can mark the address space as reserved. And > Linux generally should not use physical address space outside the > areas specified by firmware of device tree. That's true, at least in theory. Historically on x86, we only avoided areas marked as reserved. We *should* have only used areas marked as available for MMIO, but Linux didn't pay much attention to the _CRS methods that tell us that. We just assumed anything that wasn't RAM and wasn't marked as reserved in E820 was fair game for use by PCI. > > 2) The PCI core still reads and writes the BAR to size it, and we > > don't know whether this will trigger the hardware issue. > > The sizing read/write is done after disabling the BAR by writing > to CMD register - I don't see why you think this will be a problem. You're right; I forgot about that CMD enable bit. There are a few cases where we don't disable decoding (see mmio_always_on), the most important being for PCI_CLASS_BRIDGE_HOST devices. But I think your device is a PCI_CLASS_BRIDGE_PCI device, not a BRIDGE_HOST device, so we should disable decoding on your device. > > *You* might know that neither of these is an issue today, but special > > incidental knowledge like that is a maintenance problem because I > > can't tell what PCI core changes might make them an issue in the > > future. > > The worst case maintenance problem is that a future PCI change > might break our chip. This is not unexpected in Linux, and such > breakages can be easily fixed. It's certainly unexpected by *me* :) And the burden of fixing such breakages doesn't scale well -- it usually involves triage by somebody other than the chip folks, e.g., me. Andi is doing something similar for some Broadwell-EP devices that have BARs that aren't really BARs. I suggested custom config accessors, which is fairly clean from the core point of view, but it's definitely clunky on the device side. Andi's first proposals were to use a quirk to set a flag bit that told us not to touch the BAR. I'm starting to think that's a cleaner approach. I think it would be better to put the bit(s) in the pci_dev, so generic struct resource users wouldn't see them, set them via early quirks, then have __pci_read_base() check them and just leave the struct resource zeroed out. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html