On 05/17/2012 11:01 AM, James Bottomley wrote: > On Thu, 2012-05-17 at 08:55 +0000, Bart Van Assche wrote: >> On 05/17/12 08:42, James Bottomley wrote: >> >>> On Wed, 2012-04-18 at 18:48 +0200, Tomas Henzl wrote: >>>> On 04/12/2012 02:48 PM, Tomas Henzl wrote: >>>>> Hi, >>>>> >>>>> the first patch waited in scsi_remove_host was until the scan thread ends, some flags were set >>>>> so the thread could be aborted prematurely. >>>>> >>>>> This patch uses a try_module_get to lock the module and prevents a rmmod while it's locked. >>>>> The disadvantage is that in the protection we use again the same host template (shost->hostt->module;), >>>>> and when this were not valid I think and oops in try_module_get will follow. It seems to me >>>>> that the LLDs use scsi_scan_host only in the initial attach function so this is safe now, but .. >>>> Actually I did some testing and it is not so stable as it should be - after 'modprobe mpt2sas && rmmod --wait mpt2sas' >>>> I sometimes see: >>>> >>>> [ 215.126197] mpt2sas version 12.100.00.00 loaded >>>> [ 215.127509] scsi8 : Fusion MPT SAS Host >>>> [ 215.128058] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (6096496 kB) >>>> [ 215.128167] mpt2sas 0000:03:00.0: irq 69 for MSI/MSI-X >>>> [ 215.128190] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 69 >>>> [ 215.128192] mpt2sas0: iomem(0x00000000f7040000), mapped(0xffffc90006010000), size(16384) >>>> [ 215.128194] mpt2sas0: ioport(0x000000000000d000), size(256) >>>> [ 215.241249] mpt2sas0: Allocated physical memory: size(3988 kB) >>>> [ 215.241253] mpt2sas0: Current Controller Queue Depth(1753), Max Controller Queue Depth(2000) >>>> [ 215.241256] mpt2sas0: Scatter Gather Elements per IO(128) >>>> [ 215.300281] mpt2sas0: LSISAS2008: FWVersion(11.00.00.00), ChipRevision(0x01), BiosVersion(07.23.01.00) >>>> [ 215.300284] mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ) >>>> [ 215.300602] mpt2sas0: sending port enable !! >>>> [ 216.854720] mpt2sas0: host_add: handle(0x0001), sas_addr(0x500605b0006b8530), phys(8) >>>> [ 222.978469] mpt2sas0: port enable: SUCCESS >>>> [ 222.980104] scsi 8:0:0:0: Direct-Access SEAGATE ST9146852SS 0005 PQ: 0 ANSI: 5 >>>> [ 222.980113] scsi 8:0:0:0: SSP: handle(0x0009), sas_addr(0x5000c5001ac10319), phy(2), device_name(0x5000c5001ac10318) >>>> [ 222.980117] scsi 8:0:0:0: SSP: enclosure_logical_id(0x500605b0006b8530), slot(2) >>>> [ 222.980121] scsi 8:0:0:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) >>>> [ 222.981149] mpt2sas version 12.100.00.00 unloading >>>> [ 222.981467] BUG: unable to handle kernel NULL pointer dereference at 0000000000000079 >>>> [ 222.981483] IP: [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0 >>>> [ 222.981499] PGD 0 >>>> [ 222.981507] Oops: 0000 [#1] SMP >>>> [ 222.981518] CPU 0 >>>> [ 222.981522] Modules linked in: mpt2sas(-) lockd bnep bluetooth be2iscsi iscsi_boot_sysfs bnx2i cnic uio ip6t_REJECT cxgb4i cxgb4 cxgb3i >>>> [ 222.981571] mpt2sas0: removing handle(0x0009), sas_addr(0x5000c5001ac10319) >>>> [ 222.981583] libcxgbi cxgb3 nf_conntrack_ipv6 nf_defrag_ipv6 mdio nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6table_filter ip6_tables snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep i7core_edac edac_core snd_seq snd_seq_device hp_wmi iTCO_wdt iTCO_vendor_support snd_pcm snd_timer snd microcode soundcore snd_page_alloc sparse_keymap serio_raw tg3 rfkill uinput sunrpc firewire_ohci firewire_core crc_itu_t scsi_transport_sas raid_class nouveau ttm drm_kms_helper drm i2c_core mxm_wmi video wmi [last unloaded: mpt2sas] >>>> [ 222.981790] >>>> [ 222.981796] Pid: 1311, comm: scsi_scan_8 Not tainted 3.3.0 #21 Hewlett-Packard HP Z400 Workstation/0B4Ch >>>> [ 222.981815] RIP: 0010:[<ffffffff811f0c45>] [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0 >>>> [ 222.981829] RSP: 0018:ffff8801929a7ce0 EFLAGS: 00010246 >>>> [ 222.981837] RAX: ffff880190c65810 RBX: ffff880190d5a438 RCX: 0000000000000000 >>>> [ 222.981846] RDX: ffff880199034c60 RSI: ffff880192a81018 RDI: ffff880190d5a438 >>>> [ 222.981854] RBP: ffff8801929a7d10 R08: ffff880199034c60 R09: ffff880195516d80 >>>> [ 222.981863] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 >>>> [ 222.981872] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880190d5a400 >>>> [ 222.981881] FS: 0000000000000000(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000 >>>> [ 222.981891] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>> [ 222.981899] CR2: 0000000000000079 CR3: 0000000001c05000 CR4: 00000000000006f0 >>>> [ 222.981913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> [ 222.981922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>> [ 222.981932] Process scsi_scan_8 (pid: 1311, threadinfo ffff8801929a6000, task ffff88018b6e8000) >>>> [ 222.981941] Stack: >>>> [ 222.981946] ffff8801929a7d30 ffffffff810583a5 ffff8801929a7d10 ffff880190d5a438 >>>> [ 222.981965] mpt2sas0: sending diag reset !! >>>> [ 222.983072] ffff880190c65810 0000000000000000 ffff8801929a7d60 ffffffff812c2e6e >>>> [ 222.983644] ffff8801929a7d66 0044b82fa09b5a53 ffff8801929a7dd0 ffff880190d5a438 >>>> [ 222.984215] Call Trace: >>>> [ 222.984776] [<ffffffff810583a5>] ? console_unlock+0x1f5/0x270 >>>> [ 222.985334] [<ffffffff812c2e6e>] kobject_add_internal+0xae/0x250 >>>> [ 222.985888] [<ffffffff812c3417>] kobject_add+0x67/0xc0 >>>> [ 222.986461] [<ffffffff8139eb32>] device_add+0x102/0x6c0 >>>> [ 222.987050] [<ffffffff813c8288>] scsi_sysfs_add_sdev+0x198/0x340 >>>> [ 222.987643] [<ffffffff813c6b24>] do_scan_async+0x84/0x170 >>>> [ 222.988241] [<ffffffff813c6aa0>] ? do_scsi_scan_host+0xa0/0xa0 >>>> [ 222.988845] [<ffffffff81079c63>] kthread+0x93/0xa0 >>>> [ 222.989443] [<ffffffff815fd1e4>] kernel_thread_helper+0x4/0x10 >>>> [ 222.990046] [<ffffffff81079bd0>] ? kthread_freezable_should_stop+0x70/0x70 >>>> [ 222.990655] [<ffffffff815fd1e0>] ? gs_change+0x13/0x13 >>>> [ 222.991256] Code: 83 ec 18 66 66 66 66 90 48 85 ff 48 89 fb 0f 84 98 00 00 00 48 8b 47 18 49 c7 c4 20 0e c5 81 48 85 c0 74 04 4c 8b 60 30 45 31 ed <41> 80 7c 24 79 00 75 5b 48 89 df e8 8b 2c 0d 00 48 85 c0 74 66 >>>> [ 222.992012] RIP [<ffffffff811f0c45>] sysfs_create_dir+0x35/0xc0 >>>> [ 222.992626] RSP <ffff8801929a7ce0> >>>> [ 222.993265] CR2: 0000000000000079 >>>> [ 222.993943] ---[ end trace 024b7be18310b205 ]--- >>>> [ 224.049467] mpt2sas0: diag reset: SUCCESS >>>> >>>> This means that the module protection with try_module_get doesn't work, the driver removal starts again >>>> in the middle of a scan. >>> Not necessarily; it means that the remove proceeded too far before it >>> got to the sync point, which is the scan mutex acquisition in >>> scsi_remove_host(). The fix is to make sure outstanding async work is >>> completed before beginning the removal. That can't really be done in >>> SCSI, but it looks like this might be the trick. >>> >>> James >>> >>> --- >>> diff --git a/drivers/base/driver.c b/drivers/base/driver.c >>> index 3ec3896..246e652 100644 >>> --- a/drivers/base/driver.c >>> +++ b/drivers/base/driver.c >>> @@ -10,6 +10,7 @@ >>> * >>> */ >>> >>> +#include <linux/async.h> >>> #include <linux/device.h> >>> #include <linux/module.h> >>> #include <linux/errno.h> >>> @@ -199,6 +200,13 @@ EXPORT_SYMBOL_GPL(driver_register); >>> */ >>> void driver_unregister(struct device_driver *drv) >>> { >>> + /* >>> + * make sure all async bits that may be running have completed the way >>> + * this works is that try_module_get() now fails, so new async work >>> + * should take a reference to the module which now fails, so it should >>> + * be safe to remove the driver. >>> + */ >>> + async_synchronize_full(); >>> if (!drv || !drv->p) { >>> WARN(1, "Unexpected driver unregister!\n"); >>> return; Is this safe? I mean, is it possible that a new scan starts after the async_synchronize_full() ends? I wish this were fixed by fixing the real root of this problem which lies in scssi_device_get: int scsi_device_get(struct scsi_device *sdev) { if (sdev->sdev_state == SDEV_DEL) return -ENXIO; if (!get_device(&sdev->sdev_gendev)) return -ENXIO; /* We can fail this if we're doing SCSI operations * from module exit (like cache flush) */ try_module_get(sdev->host->hostt->module); here ^ by ignoring the return value. This is why rmmod --wait causes the problem and likely every other process which calls scsi_device_get would show the same bug. >> >> I'm hoping there exists another way to fix this. A patch like the above >> will cause driver_unregister() to take forever if the rate at which new >> devices are added is high enough. As far as I understand >> async_synchronize_full() the time during which this function waits is >> extended every time scanning of a newly added device starts. > module removal isn't really supposed to be a common occurrence. Rusty > even argued it's impossible to fix the races, which is why there's a > kernel option to disallow it. I don't think inserting a checkpoint in > the async domain for module removal is unreasonable. > > If it shows up as unreasonable in practise, we can label all the async > work by THIS_MODULE and implement async_synchronize_module(THIS_MODULE) > which would limit it, but I don't really think there'll be a huge > problem. > > James > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html