Re: [RFC] PCI: Fix kernel panic of root-port-less PCIe enum due to ASPM

On Thu, Oct 06, 2016 at 08:13:58AM -0500, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> Hi Serge,
> 
> On Thu, Oct 06, 2016 at 12:34:15PM +0300, Serge Semin wrote:
> > Hello linux folks,
> > 
> >     Some time ago I discovered a kernel panic that pops up when the PCI
> > subsystem tries to enumerate a PCI Express bus with the ASPM service enabled.
> > Here it is:
> > 
> > [    5.089667] CPU 0 Unable to handle kernel paging request at virtual
> > address 00000060, epc == 80317004, ra == 80316ac8
> > [    5.120952] Oops[#1]:
> >           ...
> > [    5.528438] Call Trace:
> > [    5.535640] [<80317004>] pcie_aspm_init_link_state+0x6c0/0x814
> > [    5.552843] [<80300c44>] pci_scan_slot+0x140/0x148
> > [    5.566957] [<80301dcc>] pci_scan_child_bus+0x50/0x1b0
> > [    5.582096] [<80301944>] pci_scan_bridge+0x25c/0x694
> > [    5.596724] [<80301e78>] pci_scan_child_bus+0xfc/0x1b0
> > [    5.611862] [<80301944>] pci_scan_bridge+0x25c/0x694
> > [    5.626488] [<80301e78>] pci_scan_child_bus+0xfc/0x1b0
> > [    5.641628] [<8030215c>] pci_scan_root_bus+0x64/0x124
> > [    5.656528] [<804ca298>] pcibios_scanbus+0xa8/0x188
> > 
> >     I'm more than sure you are familiar with the issue, since I've found the
> > mailing list discussion: "PCI: avoid NULL deref in alloc_pcie_link_state"
> > https://patchwork.kernel.org/patch/2751651/
> > https://bugzilla.kernel.org/show_bug.cgi?id=60111
> > 
> >     You closed the bugzilla ticket with the following statement:
> > "I'm closing this as invalid because the simulated machine where the problem
> > occurs has an invalid PCIe topology (an Upstream Port with no Downstream Port
> > or Root Port above it).  As far as I know, there is no valid topology, e.g.,
> > a real hardware machine in the field, that would cause this failure."
> > 
> >     I strongly disagree with it, since I have at least two hardware platforms
> > with the PCIe bus hierarchy described in the mailing list. One of them is based
> > on the Cavium Octeon III CN7020. Here is an ASCII diagram of its PCIe bus:
> 
> Thanks for this information.  I reopened that bugzilla; can you attach
> complete dmesg logs and "lspci -vv" output for your systems?  As I
> mentioned in comment #4, I'm completely open to fixing this.  My
> objections at the time were (1) there was no known hardware that could
> trigger the problem, and (2) the proposed fix was ugly and prone to
> future breakage.  Since we now have real systems that trip over this,
> we need to revisit it.
> 
> Bjorn
> 

Done. Welcome back to the bugzilla thread.

-Serge

> > -+-[0000:01]---00.0-[02-06]--+-02.0-[03-05]--+-00.0-[04-05]----00.0-[05]--
> >  |                           |               \-00.1  Device [111d:808f]
> >  |                           \-04.0-[06]----00.0  Device [126f:0750]
> >  \-[0000:00]-
> > 
> > where 01:00.0 is the Upstream Port of an IDT PCIe switch.
> > / # /usr/local/sbin/lspci -v -s 01:00.0
> > 01:00.0 Class 0604: Device 111d:8061
> >         Flags: bus master, fast devsel, latency 0
> >         Memory at <unassigned> (32-bit, non-prefetchable) [size=2]
> >         Memory at <unassigned> (32-bit, non-prefetchable) [size=2]
> >         Bus: primary=01, secondary=02, subordinate=06, sec-latency=0
> >         Memory behind bridge: 08000000-0dffffff
> >         Expansion ROM at <unassigned> [disabled] [size=2]
> >         Capabilities: [40] Express Upstream Port, MSI 00
> >         Capabilities: [c0] Power Management version 3
> >         Capabilities: [100] Advanced Error Reporting
> >         Capabilities: [200] Virtual Channel
> >         Kernel driver in use: pcieport
> > 
> > As you can see, the PCI bus hierarchy has no Root Port, and the very first
> > Upstream Port is connected directly to the Host-PCIe bridge of the MCU, which
> > of course is not listed by the lspci utility.
> > 
> > Unlike Radim Krčmář, who suggested a fix that would de facto just turn
> > ASPM off, I found a quick solution that disables ASPM only on the first
> > link (Host-PCIe => Upstream Port) of the PCIe bus in such a hierarchy.
> > ASPM for other PCIe bus topologies will work the way it did before.
> > 
> > I hope the fix will be helpful.
> > Thanks,
> > 
> > =============================
> > Serge V. Semin
> > Leading Programmer
> > Embedded SW development group
> > T-platforms
> > =============================
> > 
> > Signed-off-by: Serge Semin <fancer.lancer@xxxxxxxxx>
> > 
> > ---
> >  drivers/pci/pcie/aspm.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 0ec649d..a9295f29 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -522,7 +522,8 @@ static struct pcie_link_state *alloc_pcie_link_state(struct pci_dev *pdev)
> >  	INIT_LIST_HEAD(&link->children);
> >  	INIT_LIST_HEAD(&link->link);
> >  	link->pdev = pdev;
> > -	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) {
> > +	if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> > +	    (!pci_is_root_bus(pdev->bus->parent))) {
> >  		struct pcie_link_state *parent;
> >  		parent = pdev->bus->parent->self->link_state;
> >  		if (!parent) {
> > -- 
> > 2.6.6
> > 
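For readers following the thread, the condition added by the patch can be
illustrated outside the kernel with a minimal user-space sketch. The struct
and function names below (struct bus, struct dev, wants_parent_link, and the
two topology helpers) are hypothetical stand-ins, not kernel code; only the
shape of the check is taken from the diff above. It assumes the crash in the
trace comes from dereferencing the NULL bridge device of a root bus, which is
what the topology in the report suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields the check touches are modeled). */
enum port_type { ROOT_PORT, UPSTREAM_PORT, DOWNSTREAM_PORT };

struct dev;

struct bus {
    struct bus *parent;  /* NULL for a root bus */
    struct dev *self;    /* bridge leading to this bus; NULL for a root bus */
};

struct dev {
    struct bus *bus;     /* bus the device sits on */
    enum port_type type;
};

/* Same idea as the kernel's pci_is_root_bus(): a root bus has no parent. */
static bool is_root_bus(const struct bus *b)
{
    return b->parent == NULL;
}

/* Shape of the patched condition: look up a parent link only when the bus
 * above the device's bus is not a root bus. Assumes d->bus->parent is
 * non-NULL, i.e. the device does not itself sit on a root bus. */
static bool wants_parent_link(const struct dev *d)
{
    return d->type != ROOT_PORT && !is_root_bus(d->bus->parent);
}

/* Topology from the report: Upstream Port 01:00.0 sits directly on root
 * bus 01, so bus01.self is NULL and must not be dereferenced. */
static bool rootless_downstream_wants_parent(void)
{
    struct bus bus01 = { .parent = NULL, .self = NULL };
    struct dev up = { .bus = &bus01, .type = UPSTREAM_PORT };
    struct bus bus02 = { .parent = &bus01, .self = &up };
    struct dev down = { .bus = &bus02, .type = DOWNSTREAM_PORT };
    return wants_parent_link(&down);
}

/* Conventional topology: Root Port -> Upstream Port -> Downstream Port. */
static bool conventional_downstream_wants_parent(void)
{
    struct bus bus00 = { .parent = NULL, .self = NULL };
    struct dev rp = { .bus = &bus00, .type = ROOT_PORT };
    struct bus bus01 = { .parent = &bus00, .self = &rp };
    struct dev up = { .bus = &bus01, .type = UPSTREAM_PORT };
    struct bus bus02 = { .parent = &bus01, .self = &up };
    struct dev down = { .bus = &bus02, .type = DOWNSTREAM_PORT };
    return wants_parent_link(&down);
}
```

In this model the Downstream Port of the root-port-less hierarchy skips the
parent lookup, while the same port behind a Root Port still performs it, which
matches the intent stated in the patch description.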
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html