Re: [RFC] How to fix an async scan - 'rmmod --wait' race?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2012-04-18 at 18:48 +0200, Tomas Henzl wrote:
> On 04/12/2012 02:48 PM, Tomas Henzl wrote:
> > Hi,
> >
> > the first patch waited in scsi_remove_host was until the scan thread ends, some flags were set
> > so the thread could be aborted prematurely. 
> >
> > This patch uses a try_module_get to lock the module and prevents a rmmod while it's locked.
> > The disadvantage is that in the protection we use again the same host template (shost->hostt->module;),
> > and when this were not valid I think and oops in try_module_get will follow. It seems to me
> > that the LLDs use scsi_scan_host only in the initial attach function so this is safe now, but ..
> Actually I did some testing and it is not so stable as it should be - after 'modprobe mpt2sas && rmmod --wait mpt2sas'
> I sometimes see:
> 
> [  215.126197] mpt2sas version 12.100.00.00 loaded                                                                                                                                         
> [  215.127509] scsi8 : Fusion MPT SAS Host                                                                                                                                                 
> [  215.128058] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (6096496 kB)                                                                                                   
> [  215.128167] mpt2sas 0000:03:00.0: irq 69 for MSI/MSI-X
> [  215.128190] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 69
> [  215.128192] mpt2sas0: iomem(0x00000000f7040000), mapped(0xffffc90006010000), size(16384)
> [  215.128194] mpt2sas0: ioport(0x000000000000d000), size(256)
> [  215.241249] mpt2sas0: Allocated physical memory: size(3988 kB)
> [  215.241253] mpt2sas0: Current Controller Queue Depth(1753), Max Controller Queue Depth(2000)
> [  215.241256] mpt2sas0: Scatter Gather Elements per IO(128)
> [  215.300281] mpt2sas0: LSISAS2008: FWVersion(11.00.00.00), ChipRevision(0x01), BiosVersion(07.23.01.00)
> [  215.300284] mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
> [  215.300602] mpt2sas0: sending port enable !!
> [  216.854720] mpt2sas0: host_add: handle(0x0001), sas_addr(0x500605b0006b8530), phys(8)
> [  222.978469] mpt2sas0: port enable: SUCCESS
> [  222.980104] scsi 8:0:0:0: Direct-Access     SEAGATE  ST9146852SS      0005 PQ: 0 ANSI: 5
> [  222.980113] scsi 8:0:0:0: SSP: handle(0x0009), sas_addr(0x5000c5001ac10319), phy(2), device_name(0x5000c5001ac10318)
> [  222.980117] scsi 8:0:0:0: SSP: enclosure_logical_id(0x500605b0006b8530), slot(2)
> [  222.980121] scsi 8:0:0:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
> [  222.981149] mpt2sas version 12.100.00.00 unloading
> [  222.981467] BUG: unable to handle kernel NULL pointer dereference at 0000000000000079
> [  222.981483] IP: [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
> [  222.981499] PGD 0 
> [  222.981507] Oops: 0000 [#1] SMP 
> [  222.981518] CPU 0 
> [  222.981522] Modules linked in: mpt2sas(-) lockd bnep bluetooth be2iscsi iscsi_boot_sysfs bnx2i cnic uio ip6t_REJECT cxgb4i cxgb4 cxgb3i
> [  222.981571] mpt2sas0: removing handle(0x0009), sas_addr(0x5000c5001ac10319)
> [  222.981583]  libcxgbi cxgb3 nf_conntrack_ipv6 nf_defrag_ipv6 mdio nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6table_filter ip6_tables snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep i7core_edac edac_core snd_seq snd_seq_device hp_wmi iTCO_wdt iTCO_vendor_support snd_pcm snd_timer snd microcode soundcore snd_page_alloc sparse_keymap serio_raw tg3 rfkill uinput sunrpc firewire_ohci firewire_core crc_itu_t scsi_transport_sas raid_class nouveau ttm drm_kms_helper drm i2c_core mxm_wmi video wmi [last unloaded: mpt2sas]
> [  222.981790] 
> [  222.981796] Pid: 1311, comm: scsi_scan_8 Not tainted 3.3.0 #21 Hewlett-Packard HP Z400 Workstation/0B4Ch
> [  222.981815] RIP: 0010:[<ffffffff811f0c45>]  [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
> [  222.981829] RSP: 0018:ffff8801929a7ce0  EFLAGS: 00010246
> [  222.981837] RAX: ffff880190c65810 RBX: ffff880190d5a438 RCX: 0000000000000000
> [  222.981846] RDX: ffff880199034c60 RSI: ffff880192a81018 RDI: ffff880190d5a438
> [  222.981854] RBP: ffff8801929a7d10 R08: ffff880199034c60 R09: ffff880195516d80
> [  222.981863] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [  222.981872] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880190d5a400
> [  222.981881] FS:  0000000000000000(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
> [  222.981891] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  222.981899] CR2: 0000000000000079 CR3: 0000000001c05000 CR4: 00000000000006f0
> [  222.981913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  222.981922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  222.981932] Process scsi_scan_8 (pid: 1311, threadinfo ffff8801929a6000, task ffff88018b6e8000)
> [  222.981941] Stack:
> [  222.981946]  ffff8801929a7d30 ffffffff810583a5 ffff8801929a7d10 ffff880190d5a438
> [  222.981965] mpt2sas0: sending diag reset !!
> [  222.983072]  ffff880190c65810 0000000000000000 ffff8801929a7d60 ffffffff812c2e6e
> [  222.983644]  ffff8801929a7d66 0044b82fa09b5a53 ffff8801929a7dd0 ffff880190d5a438
> [  222.984215] Call Trace:
> [  222.984776]  [<ffffffff810583a5>] ? console_unlock+0x1f5/0x270
> [  222.985334]  [<ffffffff812c2e6e>] kobject_add_internal+0xae/0x250
> [  222.985888]  [<ffffffff812c3417>] kobject_add+0x67/0xc0
> [  222.986461]  [<ffffffff8139eb32>] device_add+0x102/0x6c0
> [  222.987050]  [<ffffffff813c8288>] scsi_sysfs_add_sdev+0x198/0x340
> [  222.987643]  [<ffffffff813c6b24>] do_scan_async+0x84/0x170
> [  222.988241]  [<ffffffff813c6aa0>] ? do_scsi_scan_host+0xa0/0xa0
> [  222.988845]  [<ffffffff81079c63>] kthread+0x93/0xa0
> [  222.989443]  [<ffffffff815fd1e4>] kernel_thread_helper+0x4/0x10
> [  222.990046]  [<ffffffff81079bd0>] ? kthread_freezable_should_stop+0x70/0x70
> [  222.990655]  [<ffffffff815fd1e0>] ? gs_change+0x13/0x13
> [  222.991256] Code: 83 ec 18 66 66 66 66 90 48 85 ff 48 89 fb 0f 84 98 00 00 00 48 8b 47 18 49 c7 c4 20 0e c5 81 48 85 c0 74 04 4c 8b 60 30 45 31 ed <41> 80 7c 24 79 00 75 5b 48 89 df e8 8b 2c 0d 00 48 85 c0 74 66 
> [  222.992012] RIP  [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0
> [  222.992626]  RSP <ffff8801929a7ce0>
> [  222.993265] CR2: 0000000000000079
> [  222.993943] ---[ end trace 024b7be18310b205 ]---
> [  224.049467] mpt2sas0: diag reset: SUCCESS
> 
> This means that the module protection with try_module_get doesn't work, the driver removal starts again
> in the middle of a scan.

Not necessarily; it means that the remove proceeded too far before it
got to the sync point, which is the scan mutex acquisition in
scsi_remove_host().  The fix is to make sure outstanding async work is
completed before beginning the removal.  That can't really be done in
SCSI, but it looks like this might be the trick.

James

---
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 3ec3896..246e652 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -10,6 +10,7 @@
  *
  */
 
+#include <linux/async.h>
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/errno.h>
@@ -199,6 +200,13 @@ EXPORT_SYMBOL_GPL(driver_register);
  */
 void driver_unregister(struct device_driver *drv)
 {
+	/*
+	 * make sure all async bits that may be running have completed the way
+	 * this works is that try_module_get() now fails, so new async work
+	 * should take a reference to the module which now fails, so it should
+	 * be safe to remove the driver.
+	 */
+	async_synchronize_full();
 	if (!drv || !drv->p) {
 		WARN(1, "Unexpected driver unregister!\n");
 		return;


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux