On Mon, Apr 27, 2015 at 12:12 AM, NeilBrown <neilb@xxxxxxx> wrote: > > Because of the peculiar way that md devices are created (automatically > when the device node is opened), a new device can be created and > registered immediately after the > blk_unregister_region(disk_devt(disk), disk->minors); > call in del_gendisk(). > > Therefore it is important that all visible artifacts of the previous > device are removed before this call. In particular, the 'bdi'. > > Since: > commit c4db59d31e39ea067c32163ac961e9c80198fd37 > Author: Christoph Hellwig <hch@xxxxxx> > fs: don't reassign dirty inodes to default_backing_dev_info > > moved the > device_unregister(bdi->dev); > call from bdi_unregister() to bdi_destroy() it has been quite easy to > lose a race and have a new (e.g.) "md127" be created after the > blk_unregister_region() call and before bdi_destroy() is ultimately > called by the final 'put_disk', which must come after del_gendisk(). > > The new device finds that the bdi name is already registered in sysfs > and complains > >> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70() >> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127' > > We can fix this by moving the bdi_destroy() call out of > blk_release_queue() (which can happen very late when a refcount > reaches zero) and into blk_cleanup_queue() - which happens exactly when the md > device driver calls it. > > Then it is only necessary for md to call blk_cleanup_queue() before > del_gendisk(). As loop.c devices are also created on demand by > opening the device node, we make the same change there. > > Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37 > Reported-by: Azat Khuzhin <a3at.mail@xxxxxxxxx> > Cc: Christoph Hellwig <hch@xxxxxx> > Cc: stable@xxxxxxxxxxxxxxx (v4.0) > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > -- > Hi Jens, > if you could check this and forward on to Linus I'd really appreciate it. > > Thanks, > NeilBrown > > > diff --git a/block/blk-core.c b/block/blk-core.c > index fd154b94447a..7871603f0a29 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q) > q->queue_lock = &q->__queue_lock; > spin_unlock_irq(lock); > > + bdi_destroy(&q->backing_dev_info); > + > /* @q is and will stay empty, shutdown and put */ > blk_put_queue(q); > } > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index faaf36ade7eb..2b8fd302f677 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj) > > blk_trace_shutdown(q); > > - bdi_destroy(&q->backing_dev_info); > - > ida_simple_remove(&blk_queue_ida, q->id); > call_rcu(&q->rcu_head, blk_free_queue_rcu); > } > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index ae3fcb4199e9..d7173cb1ea76 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -1620,8 +1620,8 @@ out: > > static void loop_remove(struct loop_device *lo) > { > - del_gendisk(lo->lo_disk); > blk_cleanup_queue(lo->lo_queue); > + del_gendisk(lo->lo_disk); > blk_mq_free_tag_set(&lo->tag_set); > put_disk(lo->lo_disk); > kfree(lo); > diff --git a/drivers/md/md.c b/drivers/md/md.c > index d4f31e195e26..593a02476c78 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko) > if (mddev->sysfs_state) > sysfs_put(mddev->sysfs_state); > > + if (mddev->queue) > + blk_cleanup_queue(mddev->queue); > if (mddev->gendisk) { > del_gendisk(mddev->gendisk); > put_disk(mddev->gendisk); > } > - if (mddev->queue) > - blk_cleanup_queue(mddev->queue); > > kfree(mddev); > } I've taken this patch into consideration relative to DM, please see: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip&id=5dac86fb79697e93297edb6a316b429236d00137 Point is in my private snitzer/wip branch DM now calls blk_cleanup_queue() before del_gendisk(). With your patch: 1) blk_cleanup_queue -> bdi_destroy -> bdi->dev = NULL; 2) del_gendisk -> bdi_unregister -> WARN_ON_ONCE(!bdi->dev) So, in testing with DM I'm hitting bdi_unregister's WARN_ON(), are you seeing this WARN_ON? [ 385.134474] WARNING: CPU: 3 PID: 11231 at mm/backing-dev.c:372 bdi_unregister+0x36/0x40() [ 385.143593] Modules linked in: dm_service_time dm_multipath target_core_iblock tcm_loop target_core_mod sg iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi core temp kvm_intel kvm ixgbe igb crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mdio aesni_intel ptp pps_core glue_helper ipmi_si i7core_edac iTCO_wdt lrw dca iTCO_vendor_support edac_core i2c_i801 pcspkr gf128mul ipmi_msghandler acpi_power_meter shpchp lpc_ich ablk_helper cryptd mfd_core acpi_cpufreq xfs libcrc32c sr_mo d cdrom mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi sd_mod ata_piix drm iomemory_vsl(POE) libata megaraid_sas i2c_c ore skd dm_mirror dm_region_hash dm_log dm_mod [ 385.213736] CPU: 3 PID: 11231 Comm: dmsetup Tainted: P IOE 4.1.0-rc1.snitm+ #55 [ 385.222958] Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011 [ 385.237602] 0000000000000000 000000006cf7a5f2 ffff88031bfcfb68 ffffffff8167b7e6 [ 385.245897] 0000000000000000 0000000000000000 ffff88031bfcfba8 ffffffff8107bd5a [ 385.254193] 0000000000000000 ffff88032c4b6c00 0000000000000000 0000000000000000 [ 385.262488] Call Trace: [ 385.265218] [<ffffffff8167b7e6>] dump_stack+0x45/0x57 [ 385.270955] [<ffffffff8107bd5a>] warn_slowpath_common+0x8a/0xc0 [ 385.277660] [<ffffffff8107be8a>] warn_slowpath_null+0x1a/0x20 [ 385.284171] [<ffffffff8119e216>] bdi_unregister+0x36/0x40 [ 385.290295] [<ffffffff812fb8c1>] del_gendisk+0x131/0x2b0 [ 385.296326] [<ffffffffa0000eba>] cleanup_mapped_device+0xda/0x130 [dm_mod] [ 385.304101] [<ffffffffa000283b>] __dm_destroy+0x19b/0x260 [dm_mod] [ 385.311099] [<ffffffffa00040c3>] dm_destroy+0x13/0x20 [dm_mod] [ 385.317709] [<ffffffffa0009d0e>] dev_remove+0x11e/0x180 [dm_mod] [ 385.324516] [<ffffffffa0009bf0>] ? dev_suspend+0x250/0x250 [dm_mod] [ 385.331614] [<ffffffffa000a3e5>] ctl_ioctl+0x255/0x500 [dm_mod] [ 385.338319] [<ffffffff8128c8d8>] ? SYSC_semtimedop+0x298/0xea0 [ 385.344931] [<ffffffffa000a6a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod] [ 385.351733] [<ffffffff8120d038>] do_vfs_ioctl+0x2f8/0x4f0 [ 385.357857] [<ffffffff811207e4>] ? __audit_syscall_entry+0xb4/0x110 [ 385.364950] [<ffffffff8102367c>] ? do_audit_syscall_entry+0x6c/0x70 [ 385.372041] [<ffffffff8120d2b1>] SyS_ioctl+0x81/0xa0 [ 385.377679] [<ffffffff81025046>] ? syscall_trace_leave+0xc6/0x120 [ 385.384578] [<ffffffff81682f2e>] system_call_fastpath+0x12/0x71 [ 385.391282] ---[ end trace af60d8ac7157d319 ]--- -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html