Hi Jan, On Wed, Jan 04, 2017 at 12:41:36PM +0100, Jan Rüth wrote: > On 03/01/2017 21:53, Bjorn Helgaas wrote: > > On Tue, Jan 03, 2017 at 09:04:36AM +0100, Jan Rüth wrote: > >> This patch fixes a null pointer dereference during PCI bus enumeration > >> when ASPM is off. On an IBM x3850 8664 this bug causes the kernel to > >> halt, this behavior did not appear in 3.10, so this is a regression. > >> > >> pcie_aspm_sanity_check should only be called if ASPM is on. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=187731 > > > >> Signed-off-by: Jan Rueth <rueth@xxxxxxxxxxxxxxxxxxxxx> > >> --- > >> drivers/pci/pcie/aspm.c | +++-- > >> 1 file changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > >> index f981129..e758b56 100644 > >> --- a/drivers/pci/pcie/aspm.c > >> +++ b/drivers/pci/pcie/aspm.c > >> @@ -552,11 +552,12 @@ static struct pcie_link_state > >> *alloc_pcie_link_state(struct pci_dev *pdev) > >> void pcie_aspm_init_link_state(struct pci_dev *pdev) > >> { > >> struct pcie_link_state *link; > >> - int blacklist = !!pcie_aspm_sanity_check(pdev); > >> - > >> + int blacklist; > >> if (!aspm_support_enabled) > >> return; > >> > >> + blacklist = !!pcie_aspm_sanity_check(pdev); > >> + > >> if (pdev->link_state) > >> return; > > > > For some reason this doesn't apply for me (complains about a malformed > > patch), but I can apply it by hand easily enough. > > > > However, this path is a little obtuse, and I'd like to understand > > what's going on better. We currently have: > > > > pci_scan_slot > > if (bus->self && nr) > > pcie_aspm_init_link_state(bus->self) > > pcie_aspm_sanity_check > > list_for_each_entry(child, &pdev->subordinate->devices, ...) > > > > I assume the null pointer is pdev->subordinate, so maybe we're calling > > pcie_aspm_init_link_state() for a bridge where we weren't able to > > create a child bus ("pdev->subordinate")? > > At the time when I debugged it could trace it back to the loop but I > can't remember which part actually broke. But I can add some debug > output and crash it again. I'm pretty sure the null pointer would be pdev->subordinate, and your patch certainly avoids dereferencing that as long as you boot with "pcie_aspm=off". But I think if you boot with your patch but without "pcie_aspm=off", we'll probably dereference that null pointer again. The only way I can see to have a null pdev->subordinate pointer is via SR-IOV VF devices, and you should have some of those. But I think we should see the messages from pci_setup_device() for them, and I don't see them in the dmesg log in the bugzilla, so I'm still confused about this. Why are you using "pcie_aspm=off"? If it's due to Linux ASPM bugs, I want to fix those, too. If it's due to hardware problems on your platform, maybe we can add some sort of quirk to do this automatically. Can you please apply the patches below (you should be able to feed this whole email to "patch" at once), collect the complete dmesg logs (both with and without "pcie_aspm=off"), and attach them to the bugzilla? Bjorn commit 50deebe0077acdfafe08501b7da220d256743e36 Author: Jan Rüth <rueth@xxxxxxxxxxxxxxxxxxxxx> Date: Tue Jan 3 14:09:50 2017 -0600 PCI/ASPM: Avoid ASPM sanity checking when ASPM is off Previously we called pcie_aspm_sanity_check() even if ASPM was disabled, which is unnecessary. Delay the pcie_aspm_sanity_check() until we know that we should do some ASPM configuration. As a side effect, this avoids a null pointer dereference in pcie_aspm_sanity_check(), which we saw on IBM x3850 8664 (see bugzilla below). [bhelgaas: changelog, delay call a little more] Link: https://bugzilla.kernel.org/show_bug.cgi?id=187731 Signed-off-by: Jan Rüth <rueth@xxxxxxxxxxxxxxxxxxxxx> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> CC: stable@xxxxxxxxxxxxxxx diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index 17ac1dce3286..99a15b2e4228 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -565,7 +565,7 @@ static struct pcie_link_state *alloc_pcie_link_state(struct pci_dev *pdev) void pcie_aspm_init_link_state(struct pci_dev *pdev) { struct pcie_link_state *link; - int blacklist = !!pcie_aspm_sanity_check(pdev); + int blacklist; if (!aspm_support_enabled) return; @@ -586,6 +586,8 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev) pdev->bus->self) return; + blacklist = !!pcie_aspm_sanity_check(pdev); + down_read(&pci_bus_sem); if (list_empty(&pdev->subordinate->devices)) goto out; @@ -594,6 +596,7 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev) link = alloc_pcie_link_state(pdev); if (!link) goto unlock; + /* * Setup initial ASPM state. Note that we need to configure * upstream links also because capable state of them can be commit 8cd2325c4360c6aa680dd32d804ae8dcc8f8cc5b Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> Date: Thu Jan 19 17:21:10 2017 -0600 aspm-debug diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index 99a15b2e4228..c8695965b1d0 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -567,6 +567,13 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev) struct pcie_link_state *link; int blacklist; + dev_info(&pdev->dev, "support %d PCIe %d link_state %d secondary %d subordinate %d\n", + aspm_support_enabled ? 1 : 0, + pci_is_pcie(pdev) ? 1 : 0, + pdev->link_state ? 1 : 0, + pdev->has_secondary_link ? 1 : 0, + pdev->subordinate ? 1 : 0); + if (!aspm_support_enabled) return; commit b89fa357a592615c00c1b98d8c06a3df1d185078 Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> Date: Thu Jan 19 17:32:38 2017 -0600 sriov-debug diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 47227820406d..98be6d6b119f 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -129,6 +129,8 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset) if (!bus) goto failed; + dev_info(&dev->dev, "add VF %d on %s\n", id, dev_name(&bus->dev)); + virtfn = pci_alloc_dev(bus); if (!virtfn) goto failed0; @@ -145,6 +147,9 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset) virtfn->is_virtfn = 1; virtfn->multifunction = 0; + dev_info(&virtfn->dev, "new VF %d, subordinate %d\n", id, + virtfn->subordinate ? 1 : 0); + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = &dev->resource[i + PCI_IOV_RESOURCES]; if (!res->parent) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html