Kenji Kaneshige wrote:
Alex Chiang wrote:* Kenji Kaneshige <kaneshige.kenji@xxxxxxxxxxxxxx>:Thank you very much for testing. We still have similar kernel oops (see below) with ACPI pci slot detection driver. I guess the same problem would also occur with acpiphp though I've not tried yet. I don't look at Trent's bus notifier approach yet, but I think we need something like this to fix this problem. Here are steps to reproduce and kernel oops message. * Steps to reproduce (1) Load ACPI pci slot detection driver (2) Remove the parent bridge of the slot (3) Unload ACPI pci slot detection driverThanks for the report. I believe this patch will fix the case for bridges. I haven't tested what happens if we remove an endpoint yet though. Can you try this? Thanks. /ac --- commit 557ce38e78cf06ce16aefcf273051ea0bac3d35c Author: Alex Chiang <achiang@xxxxxx> Date: Thu Mar 26 15:36:34 2009 -0600PCI: pci_create_slot / pci_destroy_slot need to grab reference to parent bus If a logical hotunplug (remove) is performed on a PCI bridge claimed by a hotplug or slot detection driver, and then the hotplug/detection moduleis unloaded, we will encounter an oops: Call Trace: [<a000000100039bc0>] die+0x1c0/0x2c0sp=e0000005062ff9e0 bsp=e0000005062f13b0[<a000000100039d00>] die_if_kernel+0x40/0x60sp=e0000005062ff9e0 bsp=e0000005062f1380[<a00000010003b590>] ia64_fault+0x1230/0x1280sp=e0000005062ff9e0 bsp=e0000005062f1300[<a00000010000c700>] ia64_native_leave_kernel+0x0/0x270sp=e0000005062ffbf0 bsp=e0000005062f1300[<a0000001003988f0>] pci_slot_release+0x70/0x1c0sp=e0000005062ffdc0 bsp=e0000005062f12b0[<a0000001003694d0>] kobject_release+0x4f0/0x5e0sp=e0000005062ffdc0 bsp=e0000005062f1270[<a00000010036b490>] kref_put+0xd0/0x100sp=e0000005062ffdc0 bsp=e0000005062f1248[<a000000100368650>] kobject_put+0x90/0xc0sp=e0000005062ffdc0 bsp=e0000005062f1220[<a000000100399260>] pci_destroy_slot+0xa0/0xe0sp=e0000005062ffdc0 bsp=e0000005062f11f0[<a0000002044d2c70>] pci_hp_deregister+0x510/0x560 [pci_hotplug]sp=e0000005062ffdc0 bsp=e0000005062f11a8 [<a000000205d71aa0>] acpiphp_unregister_hotplug_slot+0x80/0x100 [acpiphp] sp=e0000005062ffdc0 bsp=e0000005062f1180[<a000000205d73d40>] cleanup_bridge+0x3a0/0x4c0 [acpiphp]sp=e0000005062ffdc0 bsp=e0000005062f1128[<a000000205d73ee0>] cleanup_p2p_bridge+0x80/0xc0 [acpiphp]sp=e0000005062ffdc0 bsp=e0000005062f1108[<a0000001003ed200>] acpi_ns_walk_namespace+0x160/0x2e0sp=e0000005062ffdc0 bsp=e0000005062f1098[<a0000001003e8850>] acpi_walk_namespace+0x90/0xe0sp=e0000005062ffdc0 bsp=e0000005062f1048[<a000000205d73f70>] remove_bridge+0x50/0xe0 [acpiphp]sp=e0000005062ffdc0 bsp=e0000005062f1028[<a000000100411590>] acpi_pci_unregister_driver+0x1f0/0x2a0sp=e0000005062ffdc0 bsp=e0000005062f0fe8[<a000000205d759d0>] acpiphp_glue_exit+0x30/0x60 [acpiphp]sp=e0000005062ffdc0 bsp=e0000005062f0fd0[<a000000205d77380>] acpiphp_exit+0x20/0x40 [acpiphp]sp=e0000005062ffdc0 bsp=e0000005062f0fb8[<a0000001000e9310>] sys_delete_module+0x410/0x520sp=e0000005062ffdc0 bsp=e0000005062f0f38This is because pci_slot_release will access the parent PCI bus, which has already been released by the user's prior hot unplug. The solution is for pci_create_slot to grab a reference on the parent PCI bus (and pci_destroy_slot to put the reference). This will prevent the parent from release while the hotplug or slot detection driver is loaded. Reported-by: Kenji Kaneshige <kaneshige.kenji@xxxxxxxxxxxxxx> Signed-off-by: Alex Chiang <achiang@xxxxxx> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c index 2118944..459d6a2 100644 --- a/drivers/pci/slot.c +++ b/drivers/pci/slot.c @@ -248,6 +248,7 @@ placeholder: if (PCI_SLOT(dev->devfn) == slot_nr) dev->slot = slot;+ get_device(&parent->dev);dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n", slot_nr, pci_slot_name(slot));@@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot)slot->number, atomic_read(&slot->kobj.kref.refcount) - 1);down_write(&pci_bus_sem);+ put_device(&slot->bus->dev); kobject_put(&slot->kobj); up_write(&pci_bus_sem); }I've not tried your patch yet, but I don't think it works because pci_create_slot() can be executed by some hotplug drivers (pciehp, shpchp, ...) before parent->dev is initialized. Anyway, I'll try it and report the result as soon as possible. Thanks, Kenji Kaneshige
I tried your patch. The result is that the patch doesn't fix anything and introduces new kernel oops (see below). I can reproduce the new kernel oops by the following steps. (1) Load pciehp (2) Remove parent bridge of the slot (3) Add parent bridge of the slot I think the reason why your patch doesn't fix the problem is the place of put_device() is wrong. It needs to be placed after kobject_put(), doesn't it? And the reason why your patch introduce the new kernel oops is get_device(&parent->dev) is executed before parent->dev is initialized, as I mentioned in the previous e-mail. [ 144.362098] ------------[ cut here ]------------ [ 144.362239] WARNING: at lib/kref.c:43 kref_get+0x2f/0x40() [ 144.362382] Hardware name: PRIMERGY [ 144.362524] Modules linked in: pciehp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc pci_slot battery ac parport_pc lp parport mptspi mptscsih mptbase scsi_transport_spi e1000e sg sr_mod cdrom button serio_raw i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] [ 144.365747] Pid: 68, comm: work_on_cpu/0 Not tainted 2.6.29-rc8-kk #3 [ 144.365892] Call Trace: [ 144.366039] [<ffffffff8024507c>] warn_slowpath+0xdc/0x110 [ 144.366185] [<ffffffff8033ef9b>] ? sysfs_add_file_mode+0x6b/0xa0 [ 144.366331] [<ffffffff8033efdc>] ? sysfs_add_file+0xc/0x10 [ 144.366475] [<ffffffff8033f00b>] ? sysfs_create_file+0x2b/0x40 [ 144.366622] [<ffffffff803a7bf1>] ? kobject_add_internal+0x141/0x250 [ 144.366769] [<ffffffff803a7e0d>] ? kobject_add_varg+0x5d/0x70 [ 144.366917] [<ffffffff80567466>] ? _spin_unlock+0x26/0x30 [ 144.367070] [<ffffffff803a8353>] ? kset_find_obj+0x23/0x90 [ 144.367213] [<ffffffff803a8395>] ? kset_find_obj+0x65/0x90 [ 144.367355] [<ffffffff803a8d5f>] kref_get+0x2f/0x40 [ 144.367497] [<ffffffff803a7a9a>] kobject_get+0x1a/0x30 [ 144.367641] [<ffffffff8043df27>] get_device+0x17/0x20 [ 144.367784] [<ffffffff803bef5f>] pci_create_slot+0x1af/0x280 [ 144.367929] [<ffffffff803c75cf>] pci_hp_register+0x8f/0x320 [ 144.368087] [<ffffffffa031aa70>] pciehp_probe+0xb0/0x4d0 [pciehp] [ 144.368234] [<ffffffff803c5354>] pcie_port_probe_service+0x54/0x90 [ 144.368381] [<ffffffff804421de>] driver_probe_device+0xbe/0x2e0 [ 144.368526] [<ffffffff804424a0>] ? __device_attach+0x0/0x10 [ 144.368670] [<ffffffff804424a9>] __device_attach+0x9/0x10 [ 144.368813] [<ffffffff80440ee8>] bus_for_each_drv+0x68/0x90 [ 144.368957] [<ffffffff80442578>] device_attach+0x88/0x90 [ 144.369110] [<ffffffff80440e5b>] bus_attach_device+0x5b/0x80 [ 144.369255] [<ffffffff8043ea0a>] device_add+0x40a/0x690 [ 144.369399] [<ffffffff8043eca9>] device_register+0x19/0x20 [ 144.369544] [<ffffffff803c51ec>] pcie_port_device_register+0x3fc/0x510 [ 144.369694] [<ffffffff80257410>] ? do_work_for_cpu+0x0/0x20 [ 144.369839] [<ffffffff80257410>] ? do_work_for_cpu+0x0/0x20 [ 144.369985] [<ffffffff80554fc5>] pcie_portdrv_probe+0x35/0xa0 [ 144.370140] [<ffffffff803bf232>] local_pci_probe+0x12/0x20 [ 144.370283] [<ffffffff80257423>] do_work_for_cpu+0x13/0x20 [ 144.371678] [<ffffffff80257559>] ? run_workqueue+0x19/0x230 [ 144.371822] [<ffffffff8025769a>] run_workqueue+0x15a/0x230 [ 144.371965] [<ffffffff80257648>] ? run_workqueue+0x108/0x230 [ 144.372114] [<ffffffff8025846f>] worker_thread+0x9f/0x100 [ 144.372258] [<ffffffff8025bce0>] ? autoremove_wake_function+0x0/0x40 [ 144.372404] [<ffffffff802583d0>] ? worker_thread+0x0/0x100 [ 144.372547] [<ffffffff8025b89d>] kthread+0x4d/0x80 [ 144.372689] [<ffffffff8020d4ba>] child_rip+0xa/0x20 [ 144.372831] [<ffffffff8020cebc>] ? restore_args+0x0/0x30 [ 144.372975] [<ffffffff8025b850>] ? kthread+0x0/0x80 [ 144.373126] [<ffffffff8020d4b0>] ? child_rip+0x0/0x20 [ 144.373269] ---[ end trace 43bc571356db47af ]--- Thanks, Kenji Kaneshige -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html