Re: [RFC] How to fix an async scan - 'rmmod --wait' race?

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Thu, 17 May 2012 10:01:18 +0100

On Thu, 2012-05-17 at 08:55 +0000, Bart Van Assche wrote:
> On 05/17/12 08:42, James Bottomley wrote:
> 
> > On Wed, 2012-04-18 at 18:48 +0200, Tomas Henzl wrote:
> >> On 04/12/2012 02:48 PM, Tomas Henzl wrote:
> >>> Hi,
> >>>
> >>> the first patch waited in scsi_remove_host was until the scan thread ends, some flags were set
> >>> so the thread could be aborted prematurely. 
> >>>
> >>> This patch uses a try_module_get to lock the module and prevents a rmmod while it's locked.
> >>> The disadvantage is that in the protection we use again the same host template (shost->hostt->module;),
> >>> and when this were not valid I think and oops in try_module_get will follow. It seems to me
> >>> that the LLDs use scsi_scan_host only in the initial attach function so this is safe now, but ..
> >> Actually I did some testing and it is not so stable as it should be - after 'modprobe mpt2sas && rmmod --wait mpt2sas'
> >> I sometimes see:
> >>
> >> [  215.126197] mpt2sas version 12.100.00.00 loaded                                                                                                                                         
> >> [  215.127509] scsi8 : Fusion MPT SAS Host                                                                                                                                                 
> >> [  215.128058] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (6096496 kB)                                                                                                   
> >> [  215.128167] mpt2sas 0000:03:00.0: irq 69 for MSI/MSI-X
> >> [  215.128190] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 69
> >> [  215.128192] mpt2sas0: iomem(0x00000000f7040000), mapped(0xffffc90006010000), size(16384)
> >> [  215.128194] mpt2sas0: ioport(0x000000000000d000), size(256)
> >> [  215.241249] mpt2sas0: Allocated physical memory: size(3988 kB)
> >> [  215.241253] mpt2sas0: Current Controller Queue Depth(1753), Max Controller Queue Depth(2000)
> >> [  215.241256] mpt2sas0: Scatter Gather Elements per IO(128)
> >> [  215.300281] mpt2sas0: LSISAS2008: FWVersion(11.00.00.00), ChipRevision(0x01), BiosVersion(07.23.01.00)
> >> [  215.300284] mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
> >> [  215.300602] mpt2sas0: sending port enable !!
> >> [  216.854720] mpt2sas0: host_add: handle(0x0001), sas_addr(0x500605b0006b8530), phys(8)
> >> [  222.978469] mpt2sas0: port enable: SUCCESS
> >> [  222.980104] scsi 8:0:0:0: Direct-Access     SEAGATE  ST9146852SS      0005 PQ: 0 ANSI: 5
> >> [  222.980113] scsi 8:0:0:0: SSP: handle(0x0009), sas_addr(0x5000c5001ac10319), phy(2), device_name(0x5000c5001ac10318)
> >> [  222.980117] scsi 8:0:0:0: SSP: enclosure_logical_id(0x500605b0006b8530), slot(2)
> >> [  222.980121] scsi 8:0:0:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
> >> [  222.981149] mpt2sas version 12.100.00.00 unloading
> >> [  222.981467] BUG: unable to handle kernel NULL pointer dereference at 0000000000000079
> >> [  222.981483] IP: [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
> >> [  222.981499] PGD 0 
> >> [  222.981507] Oops: 0000 [#1] SMP 
> >> [  222.981518] CPU 0 
> >> [  222.981522] Modules linked in: mpt2sas(-) lockd bnep bluetooth be2iscsi iscsi_boot_sysfs bnx2i cnic uio ip6t_REJECT cxgb4i cxgb4 cxgb3i
> >> [  222.981571] mpt2sas0: removing handle(0x0009), sas_addr(0x5000c5001ac10319)
> >> [  222.981583]  libcxgbi cxgb3 nf_conntrack_ipv6 nf_defrag_ipv6 mdio nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6table_filter ip6_tables snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep i7core_edac edac_core snd_seq snd_seq_device hp_wmi iTCO_wdt iTCO_vendor_support snd_pcm snd_timer snd microcode soundcore snd_page_alloc sparse_keymap serio_raw tg3 rfkill uinput sunrpc firewire_ohci firewire_core crc_itu_t scsi_transport_sas raid_class nouveau ttm drm_kms_helper drm i2c_core mxm_wmi video wmi [last unloaded: mpt2sas]
> >> [  222.981790] 
> >> [  222.981796] Pid: 1311, comm: scsi_scan_8 Not tainted 3.3.0 #21 Hewlett-Packard HP Z400 Workstation/0B4Ch
> >> [  222.981815] RIP: 0010:[<ffffffff811f0c45>]  [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
> >> [  222.981829] RSP: 0018:ffff8801929a7ce0  EFLAGS: 00010246
> >> [  222.981837] RAX: ffff880190c65810 RBX: ffff880190d5a438 RCX: 0000000000000000
> >> [  222.981846] RDX: ffff880199034c60 RSI: ffff880192a81018 RDI: ffff880190d5a438
> >> [  222.981854] RBP: ffff8801929a7d10 R08: ffff880199034c60 R09: ffff880195516d80
> >> [  222.981863] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> >> [  222.981872] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880190d5a400
> >> [  222.981881] FS:  0000000000000000(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
> >> [  222.981891] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> [  222.981899] CR2: 0000000000000079 CR3: 0000000001c05000 CR4: 00000000000006f0
> >> [  222.981913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> [  222.981922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> [  222.981932] Process scsi_scan_8 (pid: 1311, threadinfo ffff8801929a6000, task ffff88018b6e8000)
> >> [  222.981941] Stack:
> >> [  222.981946]  ffff8801929a7d30 ffffffff810583a5 ffff8801929a7d10 ffff880190d5a438
> >> [  222.981965] mpt2sas0: sending diag reset !!
> >> [  222.983072]  ffff880190c65810 0000000000000000 ffff8801929a7d60 ffffffff812c2e6e
> >> [  222.983644]  ffff8801929a7d66 0044b82fa09b5a53 ffff8801929a7dd0 ffff880190d5a438
> >> [  222.984215] Call Trace:
> >> [  222.984776]  [<ffffffff810583a5>] ? console_unlock+0x1f5/0x270
> >> [  222.985334]  [<ffffffff812c2e6e>] kobject_add_internal+0xae/0x250
> >> [  222.985888]  [<ffffffff812c3417>] kobject_add+0x67/0xc0
> >> [  222.986461]  [<ffffffff8139eb32>] device_add+0x102/0x6c0
> >> [  222.987050]  [<ffffffff813c8288>] scsi_sysfs_add_sdev+0x198/0x340
> >> [  222.987643]  [<ffffffff813c6b24>] do_scan_async+0x84/0x170
> >> [  222.988241]  [<ffffffff813c6aa0>] ? do_scsi_scan_host+0xa0/0xa0
> >> [  222.988845]  [<ffffffff81079c63>] kthread+0x93/0xa0
> >> [  222.989443]  [<ffffffff815fd1e4>] kernel_thread_helper+0x4/0x10
> >> [  222.990046]  [<ffffffff81079bd0>] ? kthread_freezable_should_stop+0x70/0x70
> >> [  222.990655]  [<ffffffff815fd1e0>] ? gs_change+0x13/0x13
> >> [  222.991256] Code: 83 ec 18 66 66 66 66 90 48 85 ff 48 89 fb 0f 84 98 00 00 00 48 8b 47 18 49 c7 c4 20 0e c5 81 48 85 c0 74 04 4c 8b 60 30 45 31 ed <41> 80 7c 24 79 00 75 5b 48 89 df e8 8b 2c 0d 00 48 85 c0 74 66 
> >> [  222.992012] RIP  [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
> >> [  222.992626]  RSP <ffff8801929a7ce0>
> >> [  222.993265] CR2: 0000000000000079
> >> [  222.993943] ---[ end trace 024b7be18310b205 ]---
> >> [  224.049467] mpt2sas0: diag reset: SUCCESS
> >>
> >> This means that the module protection with try_module_get doesn't work, the driver removal starts again
> >> in the middle of a scan.
> > 
> > Not necessarily; it means that the remove proceeded too far before it
> > got to the sync point, which is the scan mutex acquisition in
> > scsi_remove_host().  The fix is to make sure outstanding async work is
> > completed before beginning the removal.  That can't really be done in
> > SCSI, but it looks like this might be the trick.
> > 
> > James
> > 
> > ---
> > diff --git a/drivers/base/driver.c b/drivers/base/driver.c
> > index 3ec3896..246e652 100644
> > --- a/drivers/base/driver.c
> > +++ b/drivers/base/driver.c
> > @@ -10,6 +10,7 @@
> >   *
> >   */
> >  
> > +#include <linux/async.h>
> >  #include <linux/device.h>
> >  #include <linux/module.h>
> >  #include <linux/errno.h>
> > @@ -199,6 +200,13 @@ EXPORT_SYMBOL_GPL(driver_register);
> >   */
> >  void driver_unregister(struct device_driver *drv)
> >  {
> > +	/*
> > +	 * make sure all async bits that may be running have completed the way
> > +	 * this works is that try_module_get() now fails, so new async work
> > +	 * should take a reference to the module which now fails, so it should
> > +	 * be safe to remove the driver.
> > +	 */
> > +	async_synchronize_full();
> >  	if (!drv || !drv->p) {
> >  		WARN(1, "Unexpected driver unregister!\n");
> >  		return;
> 
> 
> I'm hoping there exists another way to fix this. A patch like the above
> will cause driver_unregister() to take forever if the rate at which new
> devices are added is high enough. As far as I understand
> async_synchronize_full() the time during which this function waits is
> extended every time scanning of a newly added device starts.

module removal isn't really supposed to be a common occurrence.  Rusty
even argued it's impossible to fix the races, which is why there's a
kernel option to disallow it.  I don't think inserting a checkpoint in
the async domain for module removal is unreasonable.

If it shows up as unreasonable in practise, we can label all the async
work by THIS_MODULE and implement async_synchronize_module(THIS_MODULE)
which would limit it, but I don't really think there'll be a huge
problem.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html