Re: PCI hotplug problems: how to debug?

On Mon, Nov 16, 2009 at 11:20 AM, Ira W. Snyder <iws@xxxxxxxxxxxxxxxx> wrote:
> On Mon, Nov 16, 2009 at 11:03:17AM -0800, Yinghai Lu wrote:
>> On Mon, Nov 16, 2009 at 10:51 AM, Ira W. Snyder <iws@xxxxxxxxxxxxxxxx> wrote:
>> > On Mon, Nov 16, 2009 at 11:32:43AM -0600, Bjorn Helgaas wrote:
>> >> On Monday 16 November 2009 09:48:04 am Ira W. Snyder wrote:
>> >> > On Fri, Nov 13, 2009 at 06:26:52PM -0600, Bjorn Helgaas wrote:
>> >>
>> >> > > If you post what you've got so far and the dmesg log, I can try to
>> >> > > help debug it.
>> >> > >
>> >> >
>> >> > Here it is, both patch and dmesg inlined. The dmesg log just keeps going
>> >> > forever after the "--- [ cut here ] ---" line.
>> >> >
>> >> > I do not have a board plugged in behind the bridge at the moment. The
>> >> > only devices are the onboard ethernet, vga, etc. That's why the second
>> >> > host bridge has no memory behind it.
>> >>
>> >> The preemption imbalance concerns me.  If this patch causes that,
>> >> I'm worried that directly updating bus->resource[n] is corrupting
>> >> something.
>> >>
>> >
>> > Yes, I wonder if that is the case. The pointers are not NULL, but are
>> > they valid? How would I know?
>> >
>> >> Why didn't we find an I/O port window?  The BIOS configured I/O ports
>> >> for devices, so there must be a window.  You found earlier that the
>> >> I/O port aperture seemed to be described at 0xd0, so I'm curious why
>> >> we didn't find it this time.
>> >>
>> >
>> > I'm not sure, I added some extra debugging code to the patch, and it
>> > seems that resources don't work for IO ports. See the relevant output
>> > below. Things still crashed in exactly the same way.
>> >
>> >> There are two host bridges here, but the second has no devices below it.
>> >> I suppose it's possible there's some sort of "transparent" mode where
>> >> the first bridge forwards anything it sees, so the window description
>> >> at 0xd0 doesn't really matter.  That would mean we couldn't support hot-
>> >> adding devices under the second bridge unless we know how to program
>> >> a real aperture in the first bridge and turn off that transparent mode.
>> >> But this is just speculation.
>> >>
>> >
>> > I unplugged all devices from my crate, which is why there is nothing
>> > behind the second bridge. I can get a test device if necessary. Until
>> > the kernel stops crashing, this seems unnecessary.
>> >
>> >> The two host bridges should have different buses below them.  Both of
>> >> your bridges want to update resources for the bus at 0xf7883800, so
>> >> that's not going to work.  If you arrange to have x86_pci_root_bus_res_quirks()
>> >> called, do you see it called twice, with a different pci_bus each time?
>> >>
>> >
>> > Should I add this call at the end of my quirk function? I'm very
>> > unfamiliar with the PCI subsystem internals.
>> >
>> > Here is the revised output from the revised patch below. Note that even
>> > though I fill in the IO resource with the IO ports, it gets printed as
>> > NULL. Why would that be?
>> >
>> > [    0.304015] pci 0000:00:00.0: calling cnb20le_res+0x0/0x371
>> > [    0.308014] pci 0000:00:00.0: CNB20LE: busses: 0 to 0
>> > [    0.312014] pci 0000:00:00.0: CNB20LE: noPF 0xfc20 0xfeaf
>> > [    0.316013] pci 0000:00:00.0: CNB20LE: noPF [mem 0xfc200000-0xfeafffff]
>> > [    0.320014] pci 0000:00:00.0: CNB20LE: PF 0xfc00 0xfc0f
>> > [    0.324012] pci 0000:00:00.0: CNB20LE: PF [mem 0xfc000000-0xfc0fffff pref]
>> > [    0.328014] pci 0000:00:00.0: CNB20LE: IO 0xd000 0xdffc
>> > [    0.332010] pci 0000:00:00.0: CNB20LE: IO (null)
>> > [    0.336011] pci 0000:00:00.0: CNB20LE: parent bus: f7885800 number 0 pri 0 sec 0
>> > [    0.340011] pci 0000:00:00.0: CNB20LE: parent res0: [mem 0xfc200000-0xfeafffff]
>> > [    0.344011] pci 0000:00:00.0: CNB20LE: parent res1: [mem 0xfc000000-0xfc0fffff pref]
>> > [    0.348010] pci 0000:00:00.0: CNB20LE: parent res2: (null)
>> > [    0.352010] pci 0000:00:00.0: CNB20LE: parent res3: (null)
>> > [    0.356009] pci 0000:00:00.0: CNB20LE: parent res4: (null)
>> > [    0.360009] pci 0000:00:00.0: CNB20LE: parent res5: (null)
>> > [    0.364074] pci 0000:00:00.0: calling quirk_resource_alignment+0x0/0x164
>> > [    0.368090] pci 0000:00:00.1: found [1166:0009] class 000600 header type 00
>> > [    0.372014] pci 0000:00:00.1: calling quirk_no_ata_d3+0x0/0x24
>> > [    0.376013] pci 0000:00:00.1: calling acpi_pm_check_graylist+0x0/0x2d
>> > [    0.380009] * The chipset may have PM-Timer Bug. Due to workarounds for a bug,
>> > [    0.380014] * this clock source is slow. If you are sure your timer does not have
>> > [    0.380018] * this bug, please use "acpi_pm_good" to disable the workaround
>> > [    0.384013] pci 0000:00:00.1: calling cnb20le_res+0x0/0x371
>> > [    0.388014] pci 0000:00:00.1: CNB20LE: busses: 1 to 1
>> > [    0.392014] pci 0000:00:00.1: CNB20LE: noPF 0xfeb0 0xfebf
>> > [    0.396010] pci 0000:00:00.1: CNB20LE: noPF (null)
>> > [    0.400014] pci 0000:00:00.1: CNB20LE: PF 0xfc10 0xfc1f
>> > [    0.404009] pci 0000:00:00.1: CNB20LE: PF (null)
>> > [    0.408014] pci 0000:00:00.1: CNB20LE: IO 0xe000 0xeffc
>> > [    0.412009] pci 0000:00:00.1: CNB20LE: IO (null)
>> > [    0.416011] pci 0000:00:00.1: CNB20LE: parent bus: f7885800 number 0 pri 0 sec 0
>> > [    0.420011] pci 0000:00:00.1: CNB20LE: parent res0: [mem 0xfc200000-0xfeafffff]
>> > [    0.424011] pci 0000:00:00.1: CNB20LE: parent res1: [mem 0xfc000000-0xfc0fffff pref]
>> > [    0.428010] pci 0000:00:00.1: CNB20LE: parent res2: (null)
>> > [    0.432010] pci 0000:00:00.1: CNB20LE: parent res3: (null)
>> > [    0.436009] pci 0000:00:00.1: CNB20LE: parent res4: (null)
>> > [    0.440009] pci 0000:00:00.1: CNB20LE: parent res5: (null)
>> >
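For reference, the two memory windows that do get decoded in the log above (noPF 0xfc20/0xfeaf and PF 0xfc00/0xfc0f on 00:00.0) are consistent with each 16-bit register holding address bits 31:16 of the window. That layout is an assumption inferred from the printed ranges, not taken from a datasheet, and the struct and function names below are made up for illustration. A minimal userspace sketch of the arithmetic:

```c
#include <stdint.h>

/*
 * Hypothetical names. Only the 16-bit register values and the decoded
 * ranges come from the dmesg output quoted above.
 */
struct mem_window {
	uint32_t start;
	uint32_t end;
};

/*
 * Assumption: each register holds address bits 31:16, so windows have
 * 64 KiB granularity and the limit register names the last 64 KiB block
 * that is still inside the window.
 */
static struct mem_window decode_mem_window(uint16_t base_reg, uint16_t limit_reg)
{
	struct mem_window w;

	w.start = (uint32_t)base_reg << 16;
	w.end = ((uint32_t)limit_reg << 16) | 0xffffu;
	return w;
}
```

Under the same assumption, the second bridge's noPF registers (0xfeb0/0xfebf) would decode to 0xfeb00000-0xfebfffff, yet the log prints (null) for that resource, so the problem there looks like it is in how the resource is stored or printed rather than in the decode itself.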
>> > And the revised patch, with debugging output added:
>> >
>> > From 8fbbf7f99e58a1c95580783a1ba7dd6ebdd45187 Mon Sep 17 00:00:00 2001
>> > From: Ira W. Snyder <iws@xxxxxxxxxxxxxxxx>
>> > Date: Mon, 16 Nov 2009 08:42:39 -0800
>> > Subject: [PATCH] PCI: read memory ranges out of Broadcom CNB20LE host bridge
>> >
>> > Read the memory ranges behind the Broadcom CNB20LE host bridge out of the
>> > hardware. This allows PCI hotplugging to work, since we know which memory
>> > range to allocate PCI BARs from.
>> >
>> > Signed-off-by: Ira W. Snyder <iws@xxxxxxxxxxxxxxxx>
>> > ---
>> >  arch/x86/pci/Makefile       |    1 +
>> >  arch/x86/pci/broadcom_bus.c |   75 +++++++++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 76 insertions(+), 0 deletions(-)
>> >  create mode 100644 arch/x86/pci/broadcom_bus.c
>> >
>> > diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
>> > index d8a0a62..f762c05 100644
>> > --- a/arch/x86/pci/Makefile
>> > +++ b/arch/x86/pci/Makefile
>> > @@ -15,6 +15,7 @@ obj-$(CONFIG_X86_NUMAQ)               += numaq_32.o
>> >
>> >  obj-y                          += common.o early.o
>> >  obj-y                          += amd_bus.o
>> > +obj-y                          += broadcom_bus.o
>> >  obj-$(CONFIG_X86_64)           += intel_bus.o
>> >
>> >  ifeq ($(CONFIG_PCI_DEBUG),y)
>> > diff --git a/arch/x86/pci/broadcom_bus.c b/arch/x86/pci/broadcom_bus.c
>> > new file mode 100644
>> > index 0000000..5d56e23
>> > --- /dev/null
>> > +++ b/arch/x86/pci/broadcom_bus.c
>> > @@ -0,0 +1,75 @@
>> > +/*
>> > + * Read address ranges from a Broadcom CNB20LE Host Bridge
>> > + *
>> > + * Copyright (c) 2009 Ira W. Snyder <iws@xxxxxxxxxxxxxxxx>
>> > + *
>> > + * This file is licensed under the terms of the GNU General Public License
>> > + * version 2. This program is licensed "as is" without any warranty of any
>> > + * kind, whether express or implied.
>> > + */
>> > +
>> > +#define DEBUG 1
>> > +
>> > +#include <linux/delay.h>
>> > +#include <linux/dmi.h>
>> > +#include <linux/pci.h>
>> > +#include <linux/init.h>
>> > +#include <asm/pci_x86.h>
>> > +
>> > +#include "bus_numa.h"
>> > +
>> > +static int res_num = 0;
>>
>> why?
>>
>
> So I know how many resources I've consumed so far. The same dev->bus
> structure is given to me for both host bridges. I wanted to keep track
> of which dev->bus->resource[i] entries I'd filled in so far.
>
> Should I try and figure out which ones are not used so far?

Please check how intel_bus.c handles this.
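On the question of finding which entries are not used yet: a file-scope counter shared by both bridges will drift as soon as anything else touches the array. A rough sketch of scanning for the first empty slot instead is below. It uses mock userspace types, not the kernel's struct pci_bus or struct resource, every name in it is hypothetical, and it is not a description of what intel_bus.c does. The slot count of 6 only mirrors res0..res5 in the quoted log.

```c
#include <stddef.h>

/* Hypothetical: the quoted log prints res0..res5, so six slots here. */
#define MOCK_BUS_NUM_RESOURCES 6

/* Mock stand-ins for illustration only, not the kernel structures. */
struct mock_resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

struct mock_bus {
	struct mock_resource *resource[MOCK_BUS_NUM_RESOURCES];
};

/* Return the index of the first unused (NULL) slot, or -1 if all are taken. */
static int mock_bus_first_free_slot(const struct mock_bus *bus)
{
	int i;

	for (i = 0; i < MOCK_BUS_NUM_RESOURCES; i++)
		if (bus->resource[i] == NULL)
			return i;
	return -1;
}

/* Store r in the first free slot; return that slot index, or -1 when full. */
static int mock_bus_add_resource(struct mock_bus *bus, struct mock_resource *r)
{
	int slot = mock_bus_first_free_slot(bus);

	if (slot >= 0)
		bus->resource[slot] = r;
	return slot;
}
```

The state then lives with the bus rather than in a static variable, so two bridges handed the same bus pointer cannot overwrite each other's entries by accident.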

YH