On 2022/01/14 0:23, Jan Kara wrote:
> Well, we cannot guarantee what will be state of the loop device when you
> open it but I think we should guarantee that once you have loop device
> open, it will not be torn down under your hands. And now that I have
> realized there are those lo_state checks, I think everything is fine in
> that regard. I wanted to make sure that sequence such as:

Well, we could abort __loop_clr_fd() if somebody called "open()",
something like below. But

----------------------------------------
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b1b05c45c07c..960db2c484ab 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,7 +1082,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	return error;
 }
 
-static void __loop_clr_fd(struct loop_device *lo)
+static bool __loop_clr_fd(struct loop_device *lo, int expected_refcnt)
 {
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
@@ -1104,9 +1104,19 @@ static void __loop_clr_fd(struct loop_device *lo)
 	 * Since this function is called upon "ioctl(LOOP_CLR_FD)" xor "close()
 	 * after ioctl(LOOP_CLR_FD)", it is a sign of something going wrong if
	 * lo->lo_state has changed while waiting for lo->lo_mutex.
+	 *
+	 * However, if somebody called "open()" after lo->lo_state became
+	 * Lo_rundown, we should abort rundown in order to avoid unexpected
+	 * I/O error.
	 */
	mutex_lock(&lo->lo_mutex);
	BUG_ON(lo->lo_state != Lo_rundown);
+	if (atomic_read(&lo->lo_refcnt) != expected_refcnt) {
+		lo->lo_state = Lo_bound;
+		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
+		mutex_unlock(&lo->lo_mutex);
+		return false;
+	}
 	mutex_unlock(&lo->lo_mutex);
 
 	if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
@@ -1165,6 +1175,7 @@ static void __loop_clr_fd(struct loop_device *lo)
 	lo->lo_disk->flags |= GENHD_FL_NO_PART;
 
 	fput(filp);
+	return true;
 }
 
 static void loop_rundown_completed(struct loop_device *lo)
@@ -1181,11 +1192,12 @@ static void loop_rundown_workfn(struct work_struct *work)
 						rundown_work);
 	struct block_device *bdev = lo->lo_device;
 	struct gendisk *disk = lo->lo_disk;
+	const bool cleared = __loop_clr_fd(lo, 0);
 
-	__loop_clr_fd(lo);
 	kobject_put(&bdev->bd_device.kobj);
 	module_put(disk->fops->owner);
-	loop_rundown_completed(lo);
+	if (cleared)
+		loop_rundown_completed(lo);
 }
 
 static void loop_schedule_rundown(struct loop_device *lo)
@@ -1228,8 +1240,8 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_state = Lo_rundown;
 	mutex_unlock(&lo->lo_mutex);
 
-	__loop_clr_fd(lo);
-	loop_rundown_completed(lo);
+	if (__loop_clr_fd(lo, 1))
+		loop_rundown_completed(lo);
 	return 0;
 }
----------------------------------------

> Currently, we may hold both. With your async patch we hold only lo_mutex.
> Now that I better understand the nature of the deadlock, I agree that
> holding either still creates the deadlock possibility because both are
> acquired on loop device open. But now that I reminded myself the lo_state
> handling, I think the following should be safe in __loop_clr_fd:
>
>	/* Just a safety check... */
>	if (WARN_ON_ONCE(data_race(lo->lo_state) != Lo_rundown))
>		return -ENXIO;
>

This is still racy, for somebody can reach lo_open() right after this check.
Anyway, since the problem with umount() immediately after close() (reported by the kernel test robot) remains, we need to make sure that __loop_clr_fd() completes before close() returns to user mode.