Re: [PATCH] PCI: fix kernel oops on bridge removal

Alex Chiang wrote:
> * Kenji Kaneshige <kaneshige.kenji@xxxxxxxxxxxxxx>:
>> Hi,
>>
>> I encountered the kernel oops when I tried bridge removal using
>> Alex's logical hotplug interface on Jesse's linux-next. I'm
>> attaching the patch to solve this problem. See the description
>> of the attached patch for details.
>>
>> This patch is against Jesse's linux-next.
>>
>> Thanks,
>> Kenji Kaneshige
>>
>>
>> Fix the following kernel oops problem that happens when removing PCI
>> bridge with pciehp loaded. It should also occur with other hotplug
>> driver that is implemented as a bridge's driver.
>>
>> [  459.997257] pciehp 0000:2f:04.0:pcie24: unloading service driver pciehp
>> [  459.997495] general protection fault: 0000 [#1] SMP
>> [  459.997737] last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/0000:2e:00.0/0000:2f:04.0/remove
>> [  459.997964] CPU 4
>> [  459.998129] Modules linked in: pciehp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc battery ac parport_pc lp parport mptspi mptscsih mptbase scsi_transport_spi e1000e sg sr_mod cdrom button serio_raw i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>> [  459.998129] Pid: 56, comm: events/4 Not tainted 2.6.29-rc8-kk #1 PRIMERGY
>> [  459.998129] RIP: 0010:[<ffffffff803bf047>]  [<ffffffff803bf047>] pci_slot_release+0x37/0x100
>> [  459.998129] RSP: 0018:ffff88083b3bf9e0  EFLAGS: 00010246
>> [  459.998129] RAX: ffff88083adc5158 RBX: ffff880836c1bc80 RCX: 6b6b6b6b6b6b6b6b
>> [  459.998129] RDX: 0000000000000000 RSI: ffffffff803a77f0 RDI: ffff880836c1bc48
>> [  459.998129] RBP: ffff88083b3bfa00 R08: 0000000000000002 R09: 0000000000000000
>> [  459.998129] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880836c1bc48
>> [  459.998129] R13: ffff880836c1bc20 R14: ffff880836c1bc48 R15: ffff880836d1ec38
>> [  459.998129] FS:  0000000000000000(0000) GS:ffff88083ccc3770(0000) knlGS:0000000000000000
>> [  459.998129] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> [  459.998129] CR2: 00007f1562f1d558 CR3: 0000000838090000 CR4: 00000000000006e0
>> [  459.998129] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  459.998129] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [  459.998129] Process events/4 (pid: 56, threadinfo ffff88083b3be000, task ffff88083b3b3e40)
>> [  459.998129] Stack:
>> [  459.998129]  ffff880836c1bc80 ffff880836c1bc48 ffffffff80793320 ffff88083b0d0960
>> [  459.998129]  ffff88083b3bfa30 ffffffff803a788a ffff880836c1bc80 ffffffff803a77f0
>> [  459.998129]  ffff880836c1bc20 ffff880836d1ec38 ffff88083b3bfa50 ffffffff803a8ce7
>> [  459.998129] Call Trace:
>> [  459.998129]  [<ffffffff803a788a>] kobject_release+0x9a/0x290
>> [  459.998129]  [<ffffffff803a77f0>] ? kobject_release+0x0/0x290
>> [  459.998129]  [<ffffffff803a8ce7>] kref_put+0x37/0x80
>> [  459.998129]  [<ffffffff803a76f7>] kobject_put+0x27/0x60
>> [  459.998129]  [<ffffffff803bebcc>] ? pci_destroy_slot+0x3c/0xc0
>> [  459.998129]  [<ffffffff803bebd5>] pci_destroy_slot+0x45/0xc0
>> [  459.998129]  [<ffffffff803c797d>] pci_hp_deregister+0x13d/0x210
>> [  459.998129]  [<ffffffffa031141d>] cleanup_slots+0x2d/0x80 [pciehp]
>> [  459.998129]  [<ffffffffa0311735>] pciehp_remove+0x15/0x30 [pciehp]
>> [  459.998129]  [<ffffffff803c4c99>] pcie_port_remove_service+0x69/0x90
>> [  459.998129]  [<ffffffff80441da9>] __device_release_driver+0x59/0x90
>> [  459.998129]  [<ffffffff80441edb>] device_release_driver+0x2b/0x40
>> [  459.998129]  [<ffffffff804419d6>] bus_remove_device+0xa6/0x120
>> [  459.998129]  [<ffffffff8043e46b>] device_del+0x12b/0x190
>> [  459.998129]  [<ffffffff803c4d90>] ? remove_iter+0x0/0x40
>> [  459.998129]  [<ffffffff8043e4f6>] device_unregister+0x26/0x70
>> [  459.998129]  [<ffffffff803c4dbf>] remove_iter+0x2f/0x40
>> [  459.998129]  [<ffffffff8043ddf3>] device_for_each_child+0x33/0x60
>> [  459.998129]  [<ffffffff8033ee30>] ? sysfs_schedule_callback_work+0x0/0x50
>> [  459.998129]  [<ffffffff803c4d30>] pcie_port_device_remove+0x30/0x80
>> [  459.998129]  [<ffffffff803c55a1>] pcie_portdrv_remove+0x11/0x20
>> [  459.998129]  [<ffffffff803bfeb2>] pci_device_remove+0x32/0x70
>> [  459.998129]  [<ffffffff80441da9>] __device_release_driver+0x59/0x90
>> [  459.998129]  [<ffffffff80441edb>] device_release_driver+0x2b/0x40
>> [  459.998129]  [<ffffffff804419d6>] bus_remove_device+0xa6/0x120
>> [  459.998129]  [<ffffffff8043e46b>] device_del+0x12b/0x190
>> [  459.998129]  [<ffffffff8043e4f6>] device_unregister+0x26/0x70
>> [  459.998129]  [<ffffffff803ba969>] pci_stop_dev+0x49/0x60
>> [  459.998129]  [<ffffffff803baab0>] pci_remove_bus_device+0x40/0xc0
>> [  459.998129]  [<ffffffff803c10d9>] remove_callback+0x29/0x40
>> [  459.998129]  [<ffffffff8033ee4f>] sysfs_schedule_callback_work+0x1f/0x50
>> [  459.998129]  [<ffffffff8025769a>] run_workqueue+0x15a/0x230
>> [  459.998129]  [<ffffffff80257648>] ? run_workqueue+0x108/0x230
>> [  459.998129]  [<ffffffff8025846f>] worker_thread+0x9f/0x100
>> [  459.998129]  [<ffffffff8025bce0>] ? autoremove_wake_function+0x0/0x40
>> [  459.998129]  [<ffffffff802583d0>] ? worker_thread+0x0/0x100
>> [  459.998129]  [<ffffffff8025b89d>] kthread+0x4d/0x80
>> [  459.998129]  [<ffffffff8020d4ba>] child_rip+0xa/0x20
>> [  459.998129]  [<ffffffff8020cebc>] ? restore_args+0x0/0x30
>> [  459.998129]  [<ffffffff8025b850>] ? kthread+0x0/0x80
>> [  459.998129]  [<ffffffff8020d4b0>] ? child_rip+0x0/0x20
>> [  459.998129] Code: 56 49 89 fe 41 55 4c 8d 6f d8 41 54 53 74 09 f6 05 b8 05 c7 00 08 75 72 49 8b 45 00 48 8b 48 28 eb 05 66 90 48 89 f1 49 8b 45 00 <48> 8b 31 48 83 c0 28 0f 18 0e 48 39 c1 74 1c 8b 41 38 41 0f b6
>> [  459.998129] RIP  [<ffffffff803bf047>] pci_slot_release+0x37/0x100
>> [  459.998129]  RSP <ffff88083b3bf9e0>
>> [  460.018595] ---[ end trace 5a08d2095374aedc ]---
>>
>> The pci_remove_bus_device() removes all buses and devices under the
>> bridge, and then remove the bridge. So the remove() callback of the
>                    removes
>> hotplug drivers implemented as a bridge's driver is executed after the
>> struct pci_bus of the bridge's secondary bus is removed. The remove()
>> callback of those driver deregister the slot using pci_destroy_slot(),
>                            unregisters
>> and slot's release callback refers the struct pci_bus that was already
>                               refers to the
>> freed. This is the cause of the kernel oops.
>>
>> This patch solves the problem by stop all the driver before removing
>> the bridge and its childe bus and devices.
>                      child
> 
> Good catch, thank you Kenji-san. I didn't see this because I
> didn't have hotplug drivers loaded during my testing. :-/
> 
> I was thinking originally of making the hotplug drivers register
> a bus notifier, similar to what Trent did with his new legacy
> fakephp which is probably still necessary, but this change is a
> good start.
> 
> I tested this patch on my machines and it works fine in the "no
> hotplug drivers" loaded case.
> 

Thank you very much for testing.

We still have a similar kernel oops (see below) with the ACPI PCI slot
detection driver. I guess the same problem would also occur with
acpiphp, though I have not tried it yet. I have not looked at Trent's
bus notifier approach yet, but I think we need something like it to
fix this problem.

Here are the steps to reproduce and the kernel oops message.

* Steps to reproduce

(1) Load the ACPI PCI slot detection driver (pci_slot)
(2) Remove the parent bridge of the slot
(3) Unload the ACPI PCI slot detection driver

* Kernel oops message

[24462.585196] general protection fault: 0000 [#1] SMP
[24462.585306] last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/0000:2e:00.0/0000:2f:04.0/remove
[24462.585314] CPU 10
[24462.585314] Modules linked in: pci_slot(-) ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc battery ac parport_pc lp parport mptspi mptscsih mptbase scsi_transport_spi e1000e sg sr_mod cdrom button serio_raw i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: pci_slot]
[24462.585314] Pid: 864, comm: rmmod Not tainted 2.6.29-rc8-kk #2 PRIMERGY
[24462.585314] RIP: 0010:[<ffffffff803bf047>]  [<ffffffff803bf047>] pci_slot_release+0x37/0x100
[24462.585314] RSP: 0018:ffff88081fdfddc8  EFLAGS: 00010246
[24462.585314] RAX: ffff880838d72688 RBX: ffff880824428380 RCX: 6b6b6b6b6b6b6b6b
[24462.585314] RDX: 0000000000000000 RSI: ffffffff803a77f0 RDI: ffff880824428348
[24462.585314] RBP: ffff88081fdfdde8 R08: 0000000000000002 R09: 0000000000000000
[24462.585314] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880824428348
[24462.585314] R13: ffff880824428320 R14: ffff880824428348 R15: 0000000000000880
[24462.585314] FS:  00007f0414b7c6e0(0000) GS:ffff88083b16caf0(0000) knlGS:0000000000000000
[24462.585314] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[24462.585314] CR2: 0000003f15474bd0 CR3: 000000081899b000 CR4: 00000000000006e0
[24462.585314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[24462.585314] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[24462.585314] Process rmmod (pid: 864, threadinfo ffff88081fdfc000, task ffff8808245e8000)
[24462.585314] Stack:
[24462.585314]  ffff880824428380 ffff880824428348 ffffffff80793320 ffff88083650c460
[24462.585314]  ffff88081fdfde18 ffffffff803a788a ffff880824428380 ffffffff803a77f0
[24462.585314]  ffff880824428320 ffff88081fdfdef8 ffff88081fdfde38 ffffffff803a8ce7
[24462.585314] Call Trace:
[24462.585314]  [<ffffffff803a788a>] kobject_release+0x9a/0x290
[24462.585314]  [<ffffffff803a77f0>] ? kobject_release+0x0/0x290
[24462.585314]  [<ffffffff803a8ce7>] kref_put+0x37/0x80
[24462.585314]  [<ffffffff803a76f7>] kobject_put+0x27/0x60
[24462.585314]  [<ffffffff803bebcc>] ? pci_destroy_slot+0x3c/0xc0
[24462.585314]  [<ffffffff803bebd5>] pci_destroy_slot+0x45/0xc0
[24462.585314]  [<ffffffffa000f05c>] acpi_pci_slot_remove+0x5c/0x91 [pci_slot]
[24462.585314]  [<ffffffff8040064b>] acpi_pci_unregister_driver+0x4b/0x62
[24462.585314]  [<ffffffffa000f5c8>] acpi_pci_slot_exit+0x10/0x12 [pci_slot]
[24462.585314]  [<ffffffff80276ce1>] sys_delete_module+0x161/0x250
[24462.585314]  [<ffffffff80567100>] ? trace_hardirqs_off_thunk+0x30/0x3c
[24462.585314]  [<ffffffff8029151a>] ? audit_syscall_entry+0x14a/0x1b0
[24462.585314]  [<ffffffff8020c3db>] system_call_fastpath+0x16/0x1b
[24462.585314] Code: 56 49 89 fe 41 55 4c 8d 6f d8 41 54 53 74 09 f6 05 b8 05 c7 00 08 75 72 49 8b 45 00 48 8b 48 28 eb 05 66 90 48 89 f1 49 8b 45 00 <48> 8b 31 48 83 c0 28 0f 18 0e 48 39 c1 74 1c 8b 41 38 41 0f b6
[24462.585314] RIP  [<ffffffff803bf047>] pci_slot_release+0x37/0x100
[24462.585314]  RSP <ffff88081fdfddc8>
[24462.592478] ---[ end trace e97c8f1f187fa2b0 ]---
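
As a reading aid for the trace above: RCX holds 6b6b6b6b6b6b6b6b, which
is the byte pattern (0x6b) that slab debugging writes into freed
objects, and the faulting instruction in the Code: line (48 8b 31)
loads through RCX. That is consistent with pci_slot_release() following
a pointer read from an already freed struct pci_bus. The following is
only a minimal user-space sketch of that ordering problem, not kernel
code. All names (model_bus, model_slot, and the helper functions) are
hypothetical, and "freeing" is modeled by poisoning storage that stays
valid so the sketch itself has no undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same byte value the kernel's slab debugging writes into freed objects. */
#define MODEL_POISON_FREE 0x6b
#define MODEL_POISON_WORD 0x6b6b6b6b6b6b6b6bULL

/* Hypothetical stand-in for struct pci_bus: one list pointer is enough. */
struct model_bus {
    uint64_t devices_next; /* models the list pointer the release path reads */
};

/* Hypothetical stand-in for struct pci_slot. */
struct model_slot {
    struct model_bus *bus; /* back pointer used by the release callback */
    bool released;
};

static void model_bus_init(struct model_bus *bus)
{
    bus->devices_next = 0;
}

/* Model of freeing the bus: the storage stays valid, the contents are poisoned. */
static void model_bus_free(struct model_bus *bus)
{
    memset(bus, MODEL_POISON_FREE, sizeof(*bus));
}

/*
 * Model of the slot release callback. Returns true if the value read from
 * the bus looks sane, false if it is poison (the point at which the real
 * kernel would fault when dereferencing it).
 */
static bool model_slot_release(struct model_slot *slot)
{
    uint64_t next;

    assert(slot->bus != NULL);
    next = slot->bus->devices_next;
    slot->released = true;
    return next != MODEL_POISON_WORD;
}

/* Ordering seen in the oops: the bus is gone before the slot is released. */
static bool model_remove_bus_then_slot(void)
{
    struct model_bus bus;
    struct model_slot slot = { .bus = &bus, .released = false };

    model_bus_init(&bus);
    model_bus_free(&bus);
    return model_slot_release(&slot);
}

/* Safe ordering: release the slot while the bus is still alive. */
static bool model_remove_slot_then_bus(void)
{
    struct model_bus bus;
    struct model_slot slot = { .bus = &bus, .released = false };
    bool ok;

    model_bus_init(&bus);
    ok = model_slot_release(&slot);
    model_bus_free(&bus);
    return ok;
}
```

In this model, model_remove_bus_then_slot() reports that the release
path read poison, while model_remove_slot_then_bus() does not. Any real
fix has to guarantee the second ordering for every slot owner,
including drivers such as pci_slot that are not bound to the bridge
itself.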

Thanks,
Kenji Kaneshige


