Re: [RFC] How to fix an async scan - 'rmmod --wait' race?

Bart Van Assche <bvanassche@xxxxxxx> · Thu, 17 May 2012 08:55:07 +0000

On 05/17/12 08:42, James Bottomley wrote:

> On Wed, 2012-04-18 at 18:48 +0200, Tomas Henzl wrote:
>> On 04/12/2012 02:48 PM, Tomas Henzl wrote:
>>> Hi,
>>>
>>> the first patch waited in scsi_remove_host was until the scan thread ends, some flags were set
>>> so the thread could be aborted prematurely. 
>>>
>>> This patch uses a try_module_get to lock the module and prevents a rmmod while it's locked.
>>> The disadvantage is that in the protection we use again the same host template (shost->hostt->module;),
>>> and when this were not valid I think and oops in try_module_get will follow. It seems to me
>>> that the LLDs use scsi_scan_host only in the initial attach function so this is safe now, but ..
>> Actually I did some testing and it is not so stable as it should be - after 'modprobe mpt2sas && rmmod --wait mpt2sas'
>> I sometimes see:
>>
>> [  215.126197] mpt2sas version 12.100.00.00 loaded                                                                                                                                         
>> [  215.127509] scsi8 : Fusion MPT SAS Host                                                                                                                                                 
>> [  215.128058] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (6096496 kB)                                                                                                   
>> [  215.128167] mpt2sas 0000:03:00.0: irq 69 for MSI/MSI-X
>> [  215.128190] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 69
>> [  215.128192] mpt2sas0: iomem(0x00000000f7040000), mapped(0xffffc90006010000), size(16384)
>> [  215.128194] mpt2sas0: ioport(0x000000000000d000), size(256)
>> [  215.241249] mpt2sas0: Allocated physical memory: size(3988 kB)
>> [  215.241253] mpt2sas0: Current Controller Queue Depth(1753), Max Controller Queue Depth(2000)
>> [  215.241256] mpt2sas0: Scatter Gather Elements per IO(128)
>> [  215.300281] mpt2sas0: LSISAS2008: FWVersion(11.00.00.00), ChipRevision(0x01), BiosVersion(07.23.01.00)
>> [  215.300284] mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
>> [  215.300602] mpt2sas0: sending port enable !!
>> [  216.854720] mpt2sas0: host_add: handle(0x0001), sas_addr(0x500605b0006b8530), phys(8)
>> [  222.978469] mpt2sas0: port enable: SUCCESS
>> [  222.980104] scsi 8:0:0:0: Direct-Access     SEAGATE  ST9146852SS      0005 PQ: 0 ANSI: 5
>> [  222.980113] scsi 8:0:0:0: SSP: handle(0x0009), sas_addr(0x5000c5001ac10319), phy(2), device_name(0x5000c5001ac10318)
>> [  222.980117] scsi 8:0:0:0: SSP: enclosure_logical_id(0x500605b0006b8530), slot(2)
>> [  222.980121] scsi 8:0:0:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
>> [  222.981149] mpt2sas version 12.100.00.00 unloading
>> [  222.981467] BUG: unable to handle kernel NULL pointer dereference at 0000000000000079
>> [  222.981483] IP: [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
>> [  222.981499] PGD 0 
>> [  222.981507] Oops: 0000 [#1] SMP 
>> [  222.981518] CPU 0 
>> [  222.981522] Modules linked in: mpt2sas(-) lockd bnep bluetooth be2iscsi iscsi_boot_sysfs bnx2i cnic uio ip6t_REJECT cxgb4i cxgb4 cxgb3i
>> [  222.981571] mpt2sas0: removing handle(0x0009), sas_addr(0x5000c5001ac10319)
>> [  222.981583]  libcxgbi cxgb3 nf_conntrack_ipv6 nf_defrag_ipv6 mdio nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6table_filter ip6_tables snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep i7core_edac edac_core snd_seq snd_seq_device hp_wmi iTCO_wdt iTCO_vendor_support snd_pcm snd_timer snd microcode soundcore snd_page_alloc sparse_keymap serio_raw tg3 rfkill uinput sunrpc firewire_ohci firewire_core crc_itu_t scsi_transport_sas raid_class nouveau ttm drm_kms_helper drm i2c_core mxm_wmi video wmi [last unloaded: mpt2sas]
>> [  222.981790] 
>> [  222.981796] Pid: 1311, comm: scsi_scan_8 Not tainted 3.3.0 #21 Hewlett-Packard HP Z400 Workstation/0B4Ch
>> [  222.981815] RIP: 0010:[<ffffffff811f0c45>]  [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
>> [  222.981829] RSP: 0018:ffff8801929a7ce0  EFLAGS: 00010246
>> [  222.981837] RAX: ffff880190c65810 RBX: ffff880190d5a438 RCX: 0000000000000000
>> [  222.981846] RDX: ffff880199034c60 RSI: ffff880192a81018 RDI: ffff880190d5a438
>> [  222.981854] RBP: ffff8801929a7d10 R08: ffff880199034c60 R09: ffff880195516d80
>> [  222.981863] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
>> [  222.981872] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880190d5a400
>> [  222.981881] FS:  0000000000000000(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
>> [  222.981891] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [  222.981899] CR2: 0000000000000079 CR3: 0000000001c05000 CR4: 00000000000006f0
>> [  222.981913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  222.981922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [  222.981932] Process scsi_scan_8 (pid: 1311, threadinfo ffff8801929a6000, task ffff88018b6e8000)
>> [  222.981941] Stack:
>> [  222.981946]  ffff8801929a7d30 ffffffff810583a5 ffff8801929a7d10 ffff880190d5a438
>> [  222.981965] mpt2sas0: sending diag reset !!
>> [  222.983072]  ffff880190c65810 0000000000000000 ffff8801929a7d60 ffffffff812c2e6e
>> [  222.983644]  ffff8801929a7d66 0044b82fa09b5a53 ffff8801929a7dd0 ffff880190d5a438
>> [  222.984215] Call Trace:
>> [  222.984776]  [<ffffffff810583a5>] ? console_unlock+0x1f5/0x270
>> [  222.985334]  [<ffffffff812c2e6e>] kobject_add_internal+0xae/0x250
>> [  222.985888]  [<ffffffff812c3417>] kobject_add+0x67/0xc0
>> [  222.986461]  [<ffffffff8139eb32>] device_add+0x102/0x6c0
>> [  222.987050]  [<ffffffff813c8288>] scsi_sysfs_add_sdev+0x198/0x340
>> [  222.987643]  [<ffffffff813c6b24>] do_scan_async+0x84/0x170
>> [  222.988241]  [<ffffffff813c6aa0>] ? do_scsi_scan_host+0xa0/0xa0
>> [  222.988845]  [<ffffffff81079c63>] kthread+0x93/0xa0
>> [  222.989443]  [<ffffffff815fd1e4>] kernel_thread_helper+0x4/0x10
>> [  222.990046]  [<ffffffff81079bd0>] ? kthread_freezable_should_stop+0x70/0x70
>> [  222.990655]  [<ffffffff815fd1e0>] ? gs_change+0x13/0x13
>> [  222.991256] Code: 83 ec 18 66 66 66 66 90 48 85 ff 48 89 fb 0f 84 98 00 00 00 48 8b 47 18 49 c7 c4 20 0e c5 81 48 85 c0 74 04 4c 8b 60 30 45 31 ed <41> 80 7c 24 79 00 75 5b 48 89 df e8 8b 2c 0d 00 48 85 c0 74 66 
>> [  222.992012] RIP  [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
>> [  222.992626]  RSP <ffff8801929a7ce0>
>> [  222.993265] CR2: 0000000000000079
>> [  222.993943] ---[ end trace 024b7be18310b205 ]---
>> [  224.049467] mpt2sas0: diag reset: SUCCESS
>>
>> This means that the module protection with try_module_get doesn't work, the driver removal starts again
>> in the middle of a scan.
> 
> Not necessarily; it means that the remove proceeded too far before it
> got to the sync point, which is the scan mutex acquisition in
> scsi_remove_host().  The fix is to make sure outstanding async work is
> completed before beginning the removal.  That can't really be done in
> SCSI, but it looks like this might be the trick.
> 
> James
> 
> ---
> diff --git a/drivers/base/driver.c b/drivers/base/driver.c
> index 3ec3896..246e652 100644
> --- a/drivers/base/driver.c
> +++ b/drivers/base/driver.c
> @@ -10,6 +10,7 @@
>   *
>   */
>  
> +#include <linux/async.h>
>  #include <linux/device.h>
>  #include <linux/module.h>
>  #include <linux/errno.h>
> @@ -199,6 +200,13 @@ EXPORT_SYMBOL_GPL(driver_register);
>   */
>  void driver_unregister(struct device_driver *drv)
>  {
> +	/*
> +	 * make sure all async bits that may be running have completed the way
> +	 * this works is that try_module_get() now fails, so new async work
> +	 * should take a reference to the module which now fails, so it should
> +	 * be safe to remove the driver.
> +	 */
> +	async_synchronize_full();
>  	if (!drv || !drv->p) {
>  		WARN(1, "Unexpected driver unregister!\n");
>  		return;

I'm hoping there exists another way to fix this. A patch like the above
will cause driver_unregister() to take forever if the rate at which new
devices are added is high enough. As far as I understand
async_synchronize_full() the time during which this function waits is
extended every time scanning of a newly added device starts.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html