Hello, This is great. So much simpler & better. Some nits below. > diff --git a/block/partitions/core.c b/block/partitions/core.c > index a02e224115943d..0ba0bf44b88af3 100644 > --- a/block/partitions/core.c > +++ b/block/partitions/core.c > @@ -340,12 +340,11 @@ void delete_partition(struct hd_struct *part) > device_del(part_to_dev(part)); > > /* > - * Remove gendisk pointer from idr so that it cannot be looked up > - * while RCU period before freeing gendisk is running to prevent > - * use-after-free issues. Note that the device number stays > - * "in-use" until we really free the gendisk. > + * Remove the block device from the inode hash, so that it cannot be > + * looked up while waiting for the RCU grace period. > */ > - blk_invalidate_devt(part_devt(part)); > + remove_inode_hash(part->bdev->bd_inode); I don't think this is necessary now that the bdev and inode lifetimes are one. Before, punching out the association early was necessary because we could be in a situation where we can successfully look up a part from idr and then try to pin the associated disk which may already be freed. With the new code, the lookup is through the inode whose lifetime is one and the same with gendisk, so use-after-free isn't possible and __blkdev_get() will reliably reject such open attempts. ... > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 4c4d6c30382c06..e94633dc6ad93b 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -870,34 +870,50 @@ void __init bdev_cache_init(void) > blockdev_superblock = bd_mnt->mnt_sb; /* For writeback */ > } > > -static struct block_device *bdget(dev_t dev) > +struct block_device *bdev_alloc(struct gendisk *disk, u8 partno) > { > struct block_device *bdev; > struct inode *inode; > > - inode = iget_locked(blockdev_superblock, dev); > + inode = new_inode(blockdev_superblock); > if (!inode) > return NULL; > > - bdev = &BDEV_I(inode)->bdev; > + bdev = I_BDEV(inode); > + spin_lock_init(&bdev->bd_size_lock); > + bdev->bd_disk = disk; > + bdev->bd_partno = partno; > + bdev->bd_contains = NULL; > + bdev->bd_super = NULL; > + bdev->bd_inode = inode; > + bdev->bd_part_count = 0; > + > + inode->i_mode = S_IFBLK; > + inode->i_rdev = 0; > + inode->i_bdev = bdev; > + inode->i_data.a_ops = &def_blk_aops; Missing the call to mapping_set_gfp_mask(). > > - if (inode->i_state & I_NEW) { > - spin_lock_init(&bdev->bd_size_lock); > - bdev->bd_contains = NULL; > - bdev->bd_super = NULL; > - bdev->bd_inode = inode; > - bdev->bd_part_count = 0; > - bdev->bd_dev = dev; > - inode->i_mode = S_IFBLK; > - inode->i_rdev = dev; > - inode->i_bdev = bdev; > - inode->i_data.a_ops = &def_blk_aops; > - mapping_set_gfp_mask(&inode->i_data, GFP_USER); > - unlock_new_inode(inode); > - } > return bdev; > } ... > /** > * bdgrab -- Grab a reference to an already referenced block device > * @bdev: Block device to grab a reference to. > @@ -957,6 +973,10 @@ static struct block_device *bd_acquire(struct inode *inode) > bd_forget(inode); > > bdev = bdget(inode->i_rdev); > + if (!bdev) { > + blk_request_module(inode->i_rdev); > + bdev = bdget(inode->i_rdev); > + } One side effect here is that, before, a device which uses the traditional consecutive devt range would reserve all minors for the partitions whether they exist or not and fail open requests without requesting the matching module. After the change, trying to open an non-existent partition would trigger module probe. I don't think the behavior change is consequential in any sane not-crazily-arcane setup but it might be worth mentioning in the commit log. Thank you. -- tejun -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel