Re: PCI hotplug problems: how to debug?

On Mon, Nov 16, 2009 at 10:51 AM, Ira W. Snyder <iws@xxxxxxxxxxxxxxxx> wrote:
> On Mon, Nov 16, 2009 at 11:32:43AM -0600, Bjorn Helgaas wrote:
>> On Monday 16 November 2009 09:48:04 am Ira W. Snyder wrote:
>> > On Fri, Nov 13, 2009 at 06:26:52PM -0600, Bjorn Helgaas wrote:
>>
>> > > If you post what you've got so far and the dmesg log, I can try to
>> > > help debug it.
>> > >
>> >
>> > Here it is, both patch and dmesg inlined. The dmesg log just keeps going
>> > forever after the "--- [ cut here ] ---" line.
>> >
>> > I do not have a board plugged in behind the bridge at the moment. The
>> > only devices are the onboard ethernet, vga, etc. That's why the second
>> > host bridge has no memory behind it.
>>
>> The preemption imbalance concerns me.  If this patch causes that,
>> I'm worried that directly updating bus->resource[n] is corrupting
>> something.
>>
>
> Yes, I wonder if that is the case. The pointers are not NULL, but are
> they valid? How would I know?
>
>> Why didn't we find an I/O port window?  The BIOS configured I/O ports
>> for devices, so there must be a window.  You found earlier that the
>> I/O port aperture seemed to be described at 0xd0, so I'm curious why
>> we didn't find it this time.
>>
>
> I'm not sure, I added some extra debugging code to the patch, and it
> seems that resources don't work for IO ports. See the relevant output
> below. Things still crashed in exactly the same way.
>
>> There are two host bridges here, but the second has no devices below it.
>> I suppose it's possible there's some sort of "transparent" mode where
>> the first bridge forwards anything it sees, so the window description
>> at 0xd0 doesn't really matter.  That would mean we couldn't support hot-
>> adding devices under the second bridge unless we know how to program
>> a real aperture in the first bridge and turn off that transparent mode.
>> But this is just speculation.
>>
>
> I unplugged all devices from my crate, which is why there is nothing
> behind the second bridge. I can get a test device if necessary. Until
> the kernel stops crashing, this seems unnecessary.
>
>> The two host bridges should have different buses below them.  Both of
>> your bridges want to update resources for the bus at 0xf7883800, so
>> that's not going to work.  If you arrange to have x86_pci_root_bus_res_quirks()
>> called, do you see it called twice, with a different pci_bus each time?
>>
>
> Should I add this call at the end of my quirk function? I'm very
> unfamiliar with the PCI subsystem internals.
>
> Here is the revised output from the revised patch below. Note that even
> though I fill in the IO resource with the IO ports, it gets printed as
> NULL. Why would that be?
>
> [    0.304015] pci 0000:00:00.0: calling cnb20le_res+0x0/0x371
> [    0.308014] pci 0000:00:00.0: CNB20LE: busses: 0 to 0
> [    0.312014] pci 0000:00:00.0: CNB20LE: noPF 0xfc20 0xfeaf
> [    0.316013] pci 0000:00:00.0: CNB20LE: noPF [mem 0xfc200000-0xfeafffff]
> [    0.320014] pci 0000:00:00.0: CNB20LE: PF 0xfc00 0xfc0f
> [    0.324012] pci 0000:00:00.0: CNB20LE: PF [mem 0xfc000000-0xfc0fffff pref]
> [    0.328014] pci 0000:00:00.0: CNB20LE: IO 0xd000 0xdffc
> [    0.332010] pci 0000:00:00.0: CNB20LE: IO (null)
> [    0.336011] pci 0000:00:00.0: CNB20LE: parent bus: f7885800 number 0 pri 0 sec 0
> [    0.340011] pci 0000:00:00.0: CNB20LE: parent res0: [mem 0xfc200000-0xfeafffff]
> [    0.344011] pci 0000:00:00.0: CNB20LE: parent res1: [mem 0xfc000000-0xfc0fffff pref]
> [    0.348010] pci 0000:00:00.0: CNB20LE: parent res2: (null)
> [    0.352010] pci 0000:00:00.0: CNB20LE: parent res3: (null)
> [    0.356009] pci 0000:00:00.0: CNB20LE: parent res4: (null)
> [    0.360009] pci 0000:00:00.0: CNB20LE: parent res5: (null)
> [    0.364074] pci 0000:00:00.0: calling quirk_resource_alignment+0x0/0x164
> [    0.368090] pci 0000:00:00.1: found [1166:0009] class 000600 header type 00
> [    0.372014] pci 0000:00:00.1: calling quirk_no_ata_d3+0x0/0x24
> [    0.376013] pci 0000:00:00.1: calling acpi_pm_check_graylist+0x0/0x2d
> [    0.380009] * The chipset may have PM-Timer Bug. Due to workarounds for a bug,
> [    0.380014] * this clock source is slow. If you are sure your timer does not have
> [    0.380018] * this bug, please use "acpi_pm_good" to disable the workaround
> [    0.384013] pci 0000:00:00.1: calling cnb20le_res+0x0/0x371
> [    0.388014] pci 0000:00:00.1: CNB20LE: busses: 1 to 1
> [    0.392014] pci 0000:00:00.1: CNB20LE: noPF 0xfeb0 0xfebf
> [    0.396010] pci 0000:00:00.1: CNB20LE: noPF (null)
> [    0.400014] pci 0000:00:00.1: CNB20LE: PF 0xfc10 0xfc1f
> [    0.404009] pci 0000:00:00.1: CNB20LE: PF (null)
> [    0.408014] pci 0000:00:00.1: CNB20LE: IO 0xe000 0xeffc
> [    0.412009] pci 0000:00:00.1: CNB20LE: IO (null)
> [    0.416011] pci 0000:00:00.1: CNB20LE: parent bus: f7885800 number 0 pri 0 sec 0
> [    0.420011] pci 0000:00:00.1: CNB20LE: parent res0: [mem 0xfc200000-0xfeafffff]
> [    0.424011] pci 0000:00:00.1: CNB20LE: parent res1: [mem 0xfc000000-0xfc0fffff pref]
> [    0.428010] pci 0000:00:00.1: CNB20LE: parent res2: (null)
> [    0.432010] pci 0000:00:00.1: CNB20LE: parent res3: (null)
> [    0.436009] pci 0000:00:00.1: CNB20LE: parent res4: (null)
> [    0.440009] pci 0000:00:00.1: CNB20LE: parent res5: (null)
>
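
For anyone following the numbers in the log above: the memory window lines can be reproduced from the raw register words by treating each 16-bit value as address bits 31:16. Below is a minimal userspace sketch of that arithmetic, assuming the layout implied by the debug output. struct window and the function names are made up for illustration and are not kernel APIs. One detail that may be worth double-checking in the patch: a u16 operand is promoted to int before the shift, so a value such as 0xfc20 shifts into the sign bit; the sketch casts to a 32-bit unsigned type first to avoid relying on that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct resource; illustration only. */
struct window {
	uint32_t start;
	uint32_t end;
};

/*
 * Memory windows: the register words are assumed to hold address bits 31:16
 * of the base and the limit, so the low 16 bits are filled with 0x0000 and
 * 0xffff respectively. The cast keeps the shift in unsigned arithmetic.
 */
struct window decode_mem_window(uint16_t base, uint16_t limit)
{
	struct window w;

	w.start = (uint32_t)base << 16;
	w.end = ((uint32_t)limit << 16) | 0xffffu;
	return w;
}

/* I/O windows: the patch uses the register words unchanged as port numbers. */
struct window decode_io_window(uint16_t base, uint16_t limit)
{
	struct window w;

	w.start = base;
	w.end = limit;
	return w;
}

/* The patch treats base == limit as "no window programmed". */
int window_present(uint16_t base, uint16_t limit)
{
	return base != limit;
}

/* Cross-check against the "noPF" lines printed for 0000:00:00.0. */
void check_against_log(void)
{
	struct window w = decode_mem_window(0xfc20, 0xfeaf);

	assert(w.start == 0xfc200000u);
	assert(w.end == 0xfeafffffu);
}
```

Calling check_against_log() from a small main() is enough to confirm that the decoded values match the "[mem 0xfc200000-0xfeafffff]" line.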
> And the revised patch, with debugging output added:
>
> From 8fbbf7f99e58a1c95580783a1ba7dd6ebdd45187 Mon Sep 17 00:00:00 2001
> From: Ira W. Snyder <iws@xxxxxxxxxxxxxxxx>
> Date: Mon, 16 Nov 2009 08:42:39 -0800
> Subject: [PATCH] PCI: read memory ranges out of Broadcom CNB20LE host bridge
>
> Read the memory ranges behind the Broadcom CNB20LE host bridge out of the
> hardware. This allows PCI hotplugging to work, since we know which memory
> range to allocate PCI BARs from.
>
> Signed-off-by: Ira W. Snyder <iws@xxxxxxxxxxxxxxxx>
> ---
>  arch/x86/pci/Makefile       |    1 +
>  arch/x86/pci/broadcom_bus.c |   75 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 76 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/pci/broadcom_bus.c
>
> diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
> index d8a0a62..f762c05 100644
> --- a/arch/x86/pci/Makefile
> +++ b/arch/x86/pci/Makefile
> @@ -15,6 +15,7 @@ obj-$(CONFIG_X86_NUMAQ)               += numaq_32.o
>
>  obj-y                          += common.o early.o
>  obj-y                          += amd_bus.o
> +obj-y                          += broadcom_bus.o
>  obj-$(CONFIG_X86_64)           += intel_bus.o
>
>  ifeq ($(CONFIG_PCI_DEBUG),y)
> diff --git a/arch/x86/pci/broadcom_bus.c b/arch/x86/pci/broadcom_bus.c
> new file mode 100644
> index 0000000..5d56e23
> --- /dev/null
> +++ b/arch/x86/pci/broadcom_bus.c
> @@ -0,0 +1,75 @@
> +/*
> + * Read address ranges from a Broadcom CNB20LE Host Bridge
> + *
> + * Copyright (c) 2009 Ira W. Snyder <iws@xxxxxxxxxxxxxxxx>
> + *
> + * This file is licensed under the terms of the GNU General Public License
> + * version 2. This program is licensed "as is" without any warranty of any
> + * kind, whether express or implied.
> + */
> +
> +#define DEBUG 1
> +
> +#include <linux/delay.h>
> +#include <linux/dmi.h>
> +#include <linux/pci.h>
> +#include <linux/init.h>
> +#include <asm/pci_x86.h>
> +
> +#include "bus_numa.h"
> +
> +static int res_num = 0;

why?

YH
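
To spell out the concern with a file-scope counter: res_num is never reset, so it carries over from the 00:00.0 call into the 00:00.1 call. Here is a minimal userspace sketch of that behaviour, using the register values from the dmesg output above. assign_slots() and the bridge0/bridge1 tables are hypothetical names for illustration only, and the reading of the log is my interpretation rather than something confirmed in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_WINDOWS 3	/* noPF, PF and IO, in the order the patch reads them */

/* Models the patch's file-scope "static int res_num": it is never reset. */
static int shared_index;

/*
 * For each (base, limit) pair, record which bus->resource[] slot the patch
 * would write, or -1 when base == limit and the window is skipped.
 */
void assign_slots(const uint16_t win[NUM_WINDOWS][2], int *next,
		  int out[NUM_WINDOWS])
{
	int i;

	for (i = 0; i < NUM_WINDOWS; i++)
		out[i] = (win[i][0] != win[i][1]) ? (*next)++ : -1;
}

/* Register words copied from the debug output for 00:00.0 and 00:00.1. */
static const uint16_t bridge0[NUM_WINDOWS][2] = {
	{ 0xfc20, 0xfeaf }, { 0xfc00, 0xfc0f }, { 0xd000, 0xdffc },
};
static const uint16_t bridge1[NUM_WINDOWS][2] = {
	{ 0xfeb0, 0xfebf }, { 0xfc10, 0xfc1f }, { 0xe000, 0xeffc },
};

void demo_shared_counter(void)
{
	int s0[NUM_WINDOWS], s1[NUM_WINDOWS];

	assign_slots(bridge0, &shared_index, s0);
	assign_slots(bridge1, &shared_index, s1);

	/* First bridge uses slots 0..2, second bridge continues at 3..5. */
	assert(s0[0] == 0 && s0[1] == 1 && s0[2] == 2);
	assert(s1[0] == 3 && s1[1] == 4 && s1[2] == 5);
}
```

With the shared counter, the second bridge lands on slots 3 to 5 of the same parent bus. That would be consistent with all three of its windows printing as (null) while res0 and res1 still show the first bridge's ranges. A counter local to the function would restart at 0 on each call, but as long as both bridges report the same parent bus that would simply overwrite the first bridge's entries, so it does not address Bjorn's point about each host bridge needing its own bus.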

> +
> +static void __devinit cnb20le_res(struct pci_dev *dev)
> +{
> +       struct pci_bus *bus = dev->bus;
> +       u16 word1, word2;
> +       u8 fbus, lbus;
> +
> +       pci_read_config_byte(dev, 0x44, &fbus);
> +       pci_read_config_byte(dev, 0x45, &lbus);
> +       dev_dbg(&dev->dev, "CNB20LE: busses: %d to %d\n", fbus, lbus);
> +
> +       pci_read_config_word(dev, 0xc0, &word1);
> +       pci_read_config_word(dev, 0xc2, &word2);
> +       dev_dbg(&dev->dev, "CNB20LE: noPF 0x%.4x 0x%.4x\n", word1, word2);
> +       if (word1 != word2) {
> +               bus->resource[res_num]->start = (word1 << 16) | 0x0000;
> +               bus->resource[res_num]->end = (word2 << 16) | 0xffff;
> +               bus->resource[res_num]->flags = IORESOURCE_MEM;
> +               dev_dbg(&dev->dev, "CNB20LE: noPF %pR\n", bus->resource[res_num]);
> +               res_num++;
> +       }
> +
> +       pci_read_config_word(dev, 0xc4, &word1);
> +       pci_read_config_word(dev, 0xc6, &word2);
> +       dev_dbg(&dev->dev, "CNB20LE: PF 0x%.4x 0x%.4x\n", word1, word2);
> +       if (word1 != word2) {
> +               bus->resource[res_num]->start = (word1 << 16) | 0x0000;
> +               bus->resource[res_num]->end = (word2 << 16) | 0xffff;
> +               bus->resource[res_num]->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH;
> +               dev_dbg(&dev->dev, "CNB20LE: PF %pR\n", bus->resource[res_num]);
> +               res_num++;
> +       }
> +
> +       pci_read_config_word(dev, 0xd0, &word1);
> +       pci_read_config_word(dev, 0xd2, &word2);
> +       dev_dbg(&dev->dev, "CNB20LE: IO 0x%.4x 0x%.4x\n", word1, word2);
> +       if (word1 != word2) {
> +               bus->resource[res_num]->start = word1;
> +               bus->resource[res_num]->end = word2;
> +               bus->resource[res_num]->flags = IORESOURCE_IO;
> +               dev_dbg(&dev->dev, "CNB20LE: IO %pR\n", bus->resource[res_num]);
> +               res_num++;
> +       }
> +
> +       dev_dbg(&dev->dev, "CNB20LE: parent bus: %p number %d pri %d sec %d\n", bus, bus->number, bus->primary, bus->secondary);
> +       dev_dbg(&dev->dev, "CNB20LE: parent res0: %pR\n", dev->bus->resource[0]);
> +       dev_dbg(&dev->dev, "CNB20LE: parent res1: %pR\n", dev->bus->resource[1]);
> +       dev_dbg(&dev->dev, "CNB20LE: parent res2: %pR\n", dev->bus->resource[2]);
> +       dev_dbg(&dev->dev, "CNB20LE: parent res3: %pR\n", dev->bus->resource[3]);
> +       dev_dbg(&dev->dev, "CNB20LE: parent res4: %pR\n", dev->bus->resource[4]);
> +       dev_dbg(&dev->dev, "CNB20LE: parent res5: %pR\n", dev->bus->resource[5]);
> +}
> +
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_LE, cnb20le_res);
> --
> 1.5.4.3
>
> Thanks, Ira
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>