Re: [PATCH] PCI: fix kernel oops on bridge rmoval

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Kenji Kaneshige wrote:
Alex Chiang wrote:
* Kenji Kaneshige <kaneshige.kenji@xxxxxxxxxxxxxx>:
Thank you very much for testing.

We still have similar kernel oops (see below) with ACPI pci slot
detection driver. I guess the same problem would also occur with
acpiphp though I've not tried yet. I don't look at Trent's bus
notifier approach yet, but I think we need something like this to
fix this problem.

Here are steps to reproduce and kernel oops message.

* Steps to reproduce

(1) Load ACPI pci slot detection driver
(2) Remove the parent bridge of the slot
(3) Unload ACPI pci slot detection driver

Thanks for the report.

I believe this patch will fix the case for bridges. I haven't
tested what happens if we remove an endpoint yet though.

Can you try this?

Thanks.

/ac

---
commit 557ce38e78cf06ce16aefcf273051ea0bac3d35c
Author: Alex Chiang <achiang@xxxxxx>
Date:   Thu Mar 26 15:36:34 2009 -0600

PCI: pci_create_slot / pci_destroy_slot need to grab reference to parent bus If a logical hotunplug (remove) is performed on a PCI bridge claimed by a hotplug or slot detection driver, and then the hotplug/detection module
    is unloaded, we will encounter an oops:
        Call Trace:
     [<a000000100039bc0>] die+0x1c0/0x2c0
sp=e0000005062ff9e0 bsp=e0000005062f13b0
     [<a000000100039d00>] die_if_kernel+0x40/0x60
sp=e0000005062ff9e0 bsp=e0000005062f1380
     [<a00000010003b590>] ia64_fault+0x1230/0x1280
sp=e0000005062ff9e0 bsp=e0000005062f1300
     [<a00000010000c700>] ia64_native_leave_kernel+0x0/0x270
sp=e0000005062ffbf0 bsp=e0000005062f1300
     [<a0000001003988f0>] pci_slot_release+0x70/0x1c0
sp=e0000005062ffdc0 bsp=e0000005062f12b0
     [<a0000001003694d0>] kobject_release+0x4f0/0x5e0
sp=e0000005062ffdc0 bsp=e0000005062f1270
     [<a00000010036b490>] kref_put+0xd0/0x100
sp=e0000005062ffdc0 bsp=e0000005062f1248
     [<a000000100368650>] kobject_put+0x90/0xc0
sp=e0000005062ffdc0 bsp=e0000005062f1220
     [<a000000100399260>] pci_destroy_slot+0xa0/0xe0
sp=e0000005062ffdc0 bsp=e0000005062f11f0
     [<a0000002044d2c70>] pci_hp_deregister+0x510/0x560 [pci_hotplug]
sp=e0000005062ffdc0 bsp=e0000005062f11a8 [<a000000205d71aa0>] acpiphp_unregister_hotplug_slot+0x80/0x100 [acpiphp] sp=e0000005062ffdc0 bsp=e0000005062f1180
     [<a000000205d73d40>] cleanup_bridge+0x3a0/0x4c0 [acpiphp]
sp=e0000005062ffdc0 bsp=e0000005062f1128
     [<a000000205d73ee0>] cleanup_p2p_bridge+0x80/0xc0 [acpiphp]
sp=e0000005062ffdc0 bsp=e0000005062f1108
     [<a0000001003ed200>] acpi_ns_walk_namespace+0x160/0x2e0
sp=e0000005062ffdc0 bsp=e0000005062f1098
     [<a0000001003e8850>] acpi_walk_namespace+0x90/0xe0
sp=e0000005062ffdc0 bsp=e0000005062f1048
     [<a000000205d73f70>] remove_bridge+0x50/0xe0 [acpiphp]
sp=e0000005062ffdc0 bsp=e0000005062f1028
     [<a000000100411590>] acpi_pci_unregister_driver+0x1f0/0x2a0
sp=e0000005062ffdc0 bsp=e0000005062f0fe8
     [<a000000205d759d0>] acpiphp_glue_exit+0x30/0x60 [acpiphp]
sp=e0000005062ffdc0 bsp=e0000005062f0fd0
     [<a000000205d77380>] acpiphp_exit+0x20/0x40 [acpiphp]
sp=e0000005062ffdc0 bsp=e0000005062f0fb8
     [<a0000001000e9310>] sys_delete_module+0x410/0x520
sp=e0000005062ffdc0 bsp=e0000005062f0f38
        This is because pci_slot_release will access the parent PCI bus,
    which has already been released by the user's prior hot unplug.
        The solution is for pci_create_slot to grab a reference on the
    parent PCI bus (and pci_destroy_slot to put the reference). This
    will prevent the parent from release while the hotplug or slot
    detection driver is loaded.
        Reported-by: Kenji Kaneshige <kaneshige.kenji@xxxxxxxxxxxxxx>
    Signed-off-by: Alex Chiang <achiang@xxxxxx>

diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 2118944..459d6a2 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -248,6 +248,7 @@ placeholder:
         if (PCI_SLOT(dev->devfn) == slot_nr)
             dev->slot = slot;
+ get_device(&parent->dev);
     dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n",
         slot_nr, pci_slot_name(slot));
@@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot)
         slot->number, atomic_read(&slot->kobj.kref.refcount) - 1);
down_write(&pci_bus_sem);
+    put_device(&slot->bus->dev);
     kobject_put(&slot->kobj);
     up_write(&pci_bus_sem);
 }


I've not tried your patch yet, but I don't think it works because
pci_create_slot() can be executed by some hotplug drivers (pciehp,
shpchp, ...) before parent->dev is initialized.

Anyway, I'll try it and report the result as soon as possible.

Thanks,
Kenji Kaneshige



I tried your patch. The result is that the patch doesn't fix anything
and introduces new kernel oops (see below). I can reproduce the new
kernel oops by the following steps.

(1) Load pciehp
(2) Remove parent bridge of the slot
(3) Add parent bridge of the slot

I think the reason why your patch doesn't fix the problem is the place
of put_device() is wrong. It needs to be placed after kobject_put(),
doesn't it?

And the reason why your patch introduce the new kernel oops is
get_device(&parent->dev) is executed before parent->dev is initialized,
as I mentioned in the previous e-mail.

[  144.362098] ------------[ cut here ]------------
[  144.362239] WARNING: at lib/kref.c:43 kref_get+0x2f/0x40()
[  144.362382] Hardware name: PRIMERGY
[  144.362524] Modules linked in: pciehp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc pci_slot battery ac parport_pc lp parport mptspi mptscsih mptbase scsi_transport_spi e1000e sg sr_mod cdrom button serio_raw i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
[  144.365747] Pid: 68, comm: work_on_cpu/0 Not tainted 2.6.29-rc8-kk #3
[  144.365892] Call Trace:
[  144.366039]  [<ffffffff8024507c>] warn_slowpath+0xdc/0x110
[  144.366185]  [<ffffffff8033ef9b>] ? sysfs_add_file_mode+0x6b/0xa0
[  144.366331]  [<ffffffff8033efdc>] ? sysfs_add_file+0xc/0x10
[  144.366475]  [<ffffffff8033f00b>] ? sysfs_create_file+0x2b/0x40
[  144.366622]  [<ffffffff803a7bf1>] ? kobject_add_internal+0x141/0x250
[  144.366769]  [<ffffffff803a7e0d>] ? kobject_add_varg+0x5d/0x70
[  144.366917]  [<ffffffff80567466>] ? _spin_unlock+0x26/0x30
[  144.367070]  [<ffffffff803a8353>] ? kset_find_obj+0x23/0x90
[  144.367213]  [<ffffffff803a8395>] ? kset_find_obj+0x65/0x90
[  144.367355]  [<ffffffff803a8d5f>] kref_get+0x2f/0x40
[  144.367497]  [<ffffffff803a7a9a>] kobject_get+0x1a/0x30
[  144.367641]  [<ffffffff8043df27>] get_device+0x17/0x20
[  144.367784]  [<ffffffff803bef5f>] pci_create_slot+0x1af/0x280
[  144.367929]  [<ffffffff803c75cf>] pci_hp_register+0x8f/0x320
[  144.368087]  [<ffffffffa031aa70>] pciehp_probe+0xb0/0x4d0 [pciehp]
[  144.368234]  [<ffffffff803c5354>] pcie_port_probe_service+0x54/0x90
[  144.368381]  [<ffffffff804421de>] driver_probe_device+0xbe/0x2e0
[  144.368526]  [<ffffffff804424a0>] ? __device_attach+0x0/0x10
[  144.368670]  [<ffffffff804424a9>] __device_attach+0x9/0x10
[  144.368813]  [<ffffffff80440ee8>] bus_for_each_drv+0x68/0x90
[  144.368957]  [<ffffffff80442578>] device_attach+0x88/0x90
[  144.369110]  [<ffffffff80440e5b>] bus_attach_device+0x5b/0x80
[  144.369255]  [<ffffffff8043ea0a>] device_add+0x40a/0x690
[  144.369399]  [<ffffffff8043eca9>] device_register+0x19/0x20
[  144.369544]  [<ffffffff803c51ec>] pcie_port_device_register+0x3fc/0x510
[  144.369694]  [<ffffffff80257410>] ? do_work_for_cpu+0x0/0x20
[  144.369839]  [<ffffffff80257410>] ? do_work_for_cpu+0x0/0x20
[  144.369985]  [<ffffffff80554fc5>] pcie_portdrv_probe+0x35/0xa0
[  144.370140]  [<ffffffff803bf232>] local_pci_probe+0x12/0x20
[  144.370283]  [<ffffffff80257423>] do_work_for_cpu+0x13/0x20
[  144.371678]  [<ffffffff80257559>] ? run_workqueue+0x19/0x230
[  144.371822]  [<ffffffff8025769a>] run_workqueue+0x15a/0x230
[  144.371965]  [<ffffffff80257648>] ? run_workqueue+0x108/0x230
[  144.372114]  [<ffffffff8025846f>] worker_thread+0x9f/0x100
[  144.372258]  [<ffffffff8025bce0>] ? autoremove_wake_function+0x0/0x40
[  144.372404]  [<ffffffff802583d0>] ? worker_thread+0x0/0x100
[  144.372547]  [<ffffffff8025b89d>] kthread+0x4d/0x80
[  144.372689]  [<ffffffff8020d4ba>] child_rip+0xa/0x20
[  144.372831]  [<ffffffff8020cebc>] ? restore_args+0x0/0x30
[  144.372975]  [<ffffffff8025b850>] ? kthread+0x0/0x80
[  144.373126]  [<ffffffff8020d4b0>] ? child_rip+0x0/0x20
[  144.373269] ---[ end trace 43bc571356db47af ]---

Thanks,
Kenji Kaneshige

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux