Re: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/19/2013 03:43 AM, Yinghai Lu wrote:

> On Thu, Apr 18, 2013 at 1:53 AM, Gu Zheng <guz.fnst@xxxxxxxxxxxxxx> wrote:
>> From 0c45a7fca2276123d0b926a22ea69158dad8ab9c Mon Sep 17 00:00:00 2001
>> From: Gu Zheng <guz.fnst@xxxxxxxxxxxxxx>
>> Date: Thu, 18 Apr 2013 17:43:53 +0900
>> Subject: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
>>
>> This patch is used to fix the panic issue of parallel device removal on different pci hierarchy,
>> refer to https://bugzilla.kernel.org/show_bug.cgi?id=54411.
>>
>> [  418.775140]  ioatdma i7core_edac edac_core sg e1000e igb dca ptp pps_core
>> sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas
>> scsi_mod
>> [  418.946462] CPU 4
>> [  418.968377] Pid: 512, comm: kworker/u:2 Tainted: G        W    3.8.0 #2
>> FUJITSU-SV PRIMEQUEST 1800E/SB
>> [  419.081763] RIP: 0010:[<ffffffff8137972e>]  [<ffffffff8137972e>]
>> pci_bus_read_config_word+0x5e/0x90
>> [  419.189965] RSP: 0018:ffff8807b0a37c08  EFLAGS: 00010046
>> [  419.253409] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8807bb4a1290 RCX:
>> 0000000000000002
>> [  419.338658] RDX: 00000000000000c4 RSI: 0000000000000008 RDI:
>> ffff8807bb4a1290
>> [  419.423925] RBP: ffff8807b0a37c48 R08: ffff8807b0a37c24 R09:
>> 6db5c22da55960d0
>> [  419.509175] R10: 0000000000000000 R11: 000000000003ecd0 R12:
>> ffff8807b0a37c66
>> [  419.594425] R13: 0000000000000282 R14: ffffffff82126d40 R15:
>> 0000000000000000
>> [  419.679675] FS:  0000000000000000(0000) GS:ffff8807c2200000(0000)
>> knlGS:0000000000000000
>> [  419.776343] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [  419.844981] CR2: 00007ffa898a54f8 CR3: 0000000001c0c000 CR4:
>> 00000000000007e0
>> [  419.930236] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [  420.015484] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [  420.100736] Process kworker/u:2 (pid: 512, threadinfo ffff8807b0a36000, task
>> ffff8807b30bcd00)
>> [  420.203632] Stack:
>> [  420.227623]  ffff8807000000c4 ffffffff00000008 ffffffff813851ef
>> 0000000000992000
>> [  420.316421]  ffff8807b0a37c98 ffff8807bb49b3d8 0000000000000000
>> 0000000000000000
>> [  420.405233]  ffff8807b0a37c88 ffffffff8138044b ffff8807b0a37c88
>> 0000000000000246
>> [  420.494137] Call Trace:
>> [  420.523326]  [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
>> [  420.591984]  [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
>> [  420.658545]  [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
>> [  420.729259]  [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
>> [  420.811392]  [<ffffffff813851fb>] remove_callback+0x2b/0x40
>> [  420.877955]  [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
>> [  420.958017]  [<ffffffff810919ae>] process_one_work+0x20e/0x5c0
>> [  421.027691]  [<ffffffff8109193f>] ? process_one_work+0x19f/0x5c0
>> [  421.099441]  [<ffffffff81257a30>] ? sysfs_schedule_callback+0x210/0x210
>> [  421.178461]  [<ffffffff81093a4e>] worker_thread+0x12e/0x370
>> [  421.245020]  [<ffffffff81093920>] ? manage_workers+0x180/0x180
>> [  421.314697]  [<ffffffff81099b8e>] kthread+0xee/0x100
>> [  421.373992]  [<ffffffff810e0f09>] ? __lock_release+0x129/0x190
>> [  421.443671]  [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
>> [  421.518544]  [<ffffffff816b2dac>] ret_from_fork+0x7c/0xb0
>> [  421.583031]  [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
>> [  421.657894] Code: 89 75 c8 c7 45 dc 00 00 00 00 e8 4e ef 32 00 49 89 c5 48
>> 8b 83 b8 00 00 00 4c 8d 45 dc b9 02 00 00 00 8b 55 c0 8b 75 c8 48 89 df <ff> 10
>> 8b 55 dc 4c 89 ee 48 c7 c7 c0 67 cb 81 89 45 c8 66 41 89
>> [  421.890306] RIP  [<ffffffff8137972e>] pci_bus_read_config_word+0x5e/0x90
>> [  421.970475]  RSP <ffff8807b0a37c08>
>> [  422.012121] ---[ end trace 403f76cf31f1bcb1 ]---
>>
>> It is easy to reproduce with the following script:
>> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
>> /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>
>> The 1a:01.0 device is downstream from the 10:00.0 bridge.
>>
>> The sysfs interface remove_store() uses device_schedule_callback() to schedule
>> the remove for later. What's happening is that we schedule
>> remove_callback() for both devices before 10:00.0 has been removed,
>> like this:
>>
>>     # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
>>     remove_store  # for 10:00.0
>>       device_schedule_callback(10:00.0, remove_callback)
>>         sysfs_schedule_callback
>>           kobject_get
>>           queue_work
>>     # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>     remove_store  # for 1a:01.0
>>       device_schedule_callback(1a:01.0, remove_callback)
>>         sysfs_schedule_callback
>>           kobject_get
>>           queue_work
>>
>> Later, we run the callbacks, starting with 10:00.0.  This calls
>> remove_callback() to perform the remove:
>>
>>     remove_callback(10:00.0)
>>       mutex_lock(&pci_remove_rescan_mutex)
>>       pci_stop_and_remove_bus_device(pdev)
>>       mutex_unlock(&pci_remove_rescan_mutex)
>>
>> This will stop and remove the subtree below 10:00.0, but it does not
>> actually free the pci_dev for 1a:01.0 because we increased its ref
>> count in sysfs_schedule_callback.  So after completing
>> remove_callback(10:00.0), we run the second callback for 1a:01.0.
>>
>> But the PCI core did this removal wrong. It deallocated the struct pci_bus
>> for bus 0000:1a too soon.
>>
>> So we take a reference on the bus object when capturing the struct pci_bus pointer,
>> in order to keep it valid before its downstream devices' removal routines complete.
>>
>> Gu Zheng (3):
>>   PCI: take a reference on the bus object when we capture the struct
>>     pci_bus pointer
>>   PCI: rename alloc_pci_dev() to pci_alloc_dev()
>>   PCI: Move the acquiring the reference of pci bus inside
>>     pci_alloc_bus()
>>
>>  arch/powerpc/kernel/pci_of_scan.c |    3 +--
>>  drivers/char/agp/alpha-agp.c      |    2 +-
>>  drivers/char/agp/parisc-agp.c     |    2 +-
>>  drivers/pci/bus.c                 |   14 ++++++++++++++
>>  drivers/pci/iov.c                 |    6 ++++--
>>  drivers/pci/probe.c               |   10 ++++++----
>>  drivers/scsi/megaraid.c           |    2 +-
>>  include/linux/pci.h               |    5 ++++-
>>  8 files changed, 32 insertions(+), 12 deletions(-)
> 
> No. that is wrong.
> 
> I can not find where that reference count get reduced!
> in the my test patches that I sent, it does have have that in remove path.
> 
> What is reason for you to remove that line?
> 
> After put the line back, your test still can pass?

Hi Yinghai,
	The remove test script still can pass if we reduce the reference of pci_bus when destroying
the pci_dev, so does your patch, but a list_del corruption warning occurs in the bottom half routine:
------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Hardware name: PRIMEQUEST 1800E
list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
Modules linked in: shpchp ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core i7core_edac edac_core ioatdma e1000e igb dca i2c_algo_bit i2c_core ptp pps_core ext4(F) mbcache(F) jbd2(F) sd_mod(F) crc_t10dif(F) megaraid_sas(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F)
Pid: 6, comm: kworker/u:0 Tainted: GF       W    3.9.0-rc7-pci-remove-test+ #59
Call Trace:
 [<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff81280b13>] __list_del_entry+0x63/0xd0
 [<ffffffff81280b91>] list_del+0x11/0x40
 [<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
 [<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
 [<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
 [<ffffffff8129fc89>] remove_callback+0x29/0x40
 [<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
 [<ffffffff81073d85>] process_one_work+0x185/0x3f0
 [<ffffffff810763e9>] worker_thread+0x119/0x380
 [<ffffffff810762d0>] ? manage_workers+0x180/0x180
 [<ffffffff8107b6ae>] kthread+0xce/0xe0
 [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff815724ac>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
---[ end trace 9c05e382f933a515 ]---

Thanks,
Gu

> 
> Yinghai


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux