On 2022/01/14 0:23, Jan Kara wrote:
> Well, we cannot guarantee what will be state of the loop device when you
> open it but I think we should guarantee that once you have loop device
> open, it will not be torn down under your hands. And now that I have
> realized there are those lo_state checks, I think everything is fine in
> that regard. I wanted to make sure that sequence such as:

Well, we could abort __loop_clr_fd() if somebody called "open()",
something like below. But

----------------------------------------
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b1b05c45c07c..960db2c484ab 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,7 +1082,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	return error;
 }
 
-static void __loop_clr_fd(struct loop_device *lo)
+static bool __loop_clr_fd(struct loop_device *lo, int expected_refcnt)
 {
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
@@ -1104,9 +1104,19 @@ static void __loop_clr_fd(struct loop_device *lo)
 	 * Since this function is called upon "ioctl(LOOP_CLR_FD)" xor "close()
 	 * after ioctl(LOOP_CLR_FD)", it is a sign of something going wrong if
	 * lo->lo_state has changed while waiting for lo->lo_mutex.
+	 *
+	 * However, if somebody called "open()" after lo->lo_state became
+	 * Lo_rundown, we should abort rundown in order to avoid unexpected
+	 * I/O error.
	 */
	mutex_lock(&lo->lo_mutex);
	BUG_ON(lo->lo_state != Lo_rundown);
+	if (atomic_read(&lo->lo_refcnt) != expected_refcnt) {
+		lo->lo_state = Lo_bound;
+		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
+		mutex_unlock(&lo->lo_mutex);
+		return false;
+	}
 	mutex_unlock(&lo->lo_mutex);
 
 	if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
@@ -1165,6 +1175,7 @@ static void __loop_clr_fd(struct loop_device *lo)
 	lo->lo_disk->flags |= GENHD_FL_NO_PART;
 
 	fput(filp);
+	return true;
 }
 
 static void loop_rundown_completed(struct loop_device *lo)
@@ -1181,11 +1192,12 @@ static void loop_rundown_workfn(struct work_struct *work)
 						rundown_work);
 	struct block_device *bdev = lo->lo_device;
 	struct gendisk *disk = lo->lo_disk;
+	const bool cleared = __loop_clr_fd(lo, 0);
 
-	__loop_clr_fd(lo);
 	kobject_put(&bdev->bd_device.kobj);
 	module_put(disk->fops->owner);
-	loop_rundown_completed(lo);
+	if (cleared)
+		loop_rundown_completed(lo);
 }
 
 static void loop_schedule_rundown(struct loop_device *lo)
@@ -1228,8 +1240,8 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_state = Lo_rundown;
 	mutex_unlock(&lo->lo_mutex);
 
-	__loop_clr_fd(lo);
-	loop_rundown_completed(lo);
+	if (__loop_clr_fd(lo, 1))
+		loop_rundown_completed(lo);
 	return 0;
 }
----------------------------------------

> Currently, we may hold both. With your async patch we hold only lo_mutex.
> Now that I better understand the nature of the deadlock, I agree that
> holding either still creates the deadlock possibility because both are
> acquired on loop device open. But now that I reminded myself the lo_state
> handling, I think the following should be safe in __loop_clr_fd:
>
>	/* Just a safety check... */
>	if (WARN_ON_ONCE(data_race(lo->lo_state) != Lo_rundown))
>		return -ENXIO;
>

This is still racy, for somebody can reach lo_open() right after this check.
Anyway, since the problem with umount() immediately after close() (reported by the kernel test robot) remains, we need to make sure that __loop_clr_fd() completes before close() returns to user mode.