On 05/06/2021 15:09, Christoph Hellwig wrote: > Commit c76f48eb5c08 ("block: take bd_mutex around delete_partitions in > del_gendisk") adds disk->part0->bd_mutex in del_gendisk(), this way > causes the following AB/BA deadlock between removing loop and opening > loop: > > 1) loop_control_ioctl(LOOP_CTL_REMOVE) > -> mutex_lock(&loop_ctl_mutex) > -> del_gendisk > -> mutex_lock(&disk->part0->bd_mutex) > > 2) blkdev_get_by_dev > -> mutex_lock(&disk->part0->bd_mutex) > -> lo_open > -> mutex_lock(&loop_ctl_mutex) > > Add a new Lo_deleting state to remove the need for clearing > ->private_data and thus holding loop_ctl_mutex in the ioctl > LOOP_CTL_REMOVE path. > > Based on an analysis and earlier patch from > Ming Lei <ming.lei@xxxxxxxxxx>. > > Reported-by: Colin Ian King <colin.king@xxxxxxxxxxxxx> > Fixes: c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk") > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > --- > drivers/block/loop.c | 25 +++++++------------------ > drivers/block/loop.h | 1 + > 2 files changed, 8 insertions(+), 18 deletions(-) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index d58d68f3c7cd..76e12f3482a9 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -1879,29 +1879,18 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode, > > static int lo_open(struct block_device *bdev, fmode_t mode) > { > - struct loop_device *lo; > + struct loop_device *lo = bdev->bd_disk->private_data; > int err; > > - /* > - * take loop_ctl_mutex to protect lo pointer from race with > - * loop_control_ioctl(LOOP_CTL_REMOVE), however, to reduce contention > - * release it prior to updating lo->lo_refcnt. > - */ > - err = mutex_lock_killable(&loop_ctl_mutex); > - if (err) > - return err; > - lo = bdev->bd_disk->private_data; > - if (!lo) { > - mutex_unlock(&loop_ctl_mutex); > - return -ENXIO; > - } > err = mutex_lock_killable(&lo->lo_mutex); > - mutex_unlock(&loop_ctl_mutex); > if (err) > return err; > - atomic_inc(&lo->lo_refcnt); > + if (lo->lo_state == Lo_deleting) > + err = -ENXIO; > + else > + atomic_inc(&lo->lo_refcnt); > mutex_unlock(&lo->lo_mutex); > - return 0; > + return err; > } > > static void lo_release(struct gendisk *disk, fmode_t mode) > @@ -2285,7 +2274,7 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd, > mutex_unlock(&lo->lo_mutex); > break; > } > - lo->lo_disk->private_data = NULL; > + lo->lo_state = Lo_deleting; > mutex_unlock(&lo->lo_mutex); > idr_remove(&loop_index_idr, lo->lo_number); > loop_remove(lo); > diff --git a/drivers/block/loop.h b/drivers/block/loop.h > index a3c04f310672..5beb959b94d3 100644 > --- a/drivers/block/loop.h > +++ b/drivers/block/loop.h > @@ -22,6 +22,7 @@ enum { > Lo_unbound, > Lo_bound, > Lo_rundown, > + Lo_deleting, > }; > > struct loop_func_table; > Tested and no longer hangs with the stress-ng loop test. Thank you for the speedy turnaround on the fix. Tested-by: Colin Ian King <colin.king@xxxxxxxxxxxxx> Colin