Re: Possible deadlock detected in Linux 6.2.0 in dm_get_inactive_table (dm-ioctl.c)

Mike Snitzer <snitzer@xxxxxxxxxx> · Mon, 17 Apr 2023 12:20:25 -0400

On Mon, Apr 17 2023 at  1:08P -0400,
Zheng Zhang <zheng.zhang@xxxxxxxxxxxxx> wrote:

> Alasdair, Mike, and to whom it may concern:
> 
> Hello! We have found a bug in the Linux kernel version 6.2.0 by syzkaller
> with our own templates. The bug causes a possible recursive locking
> scenario, resulting in a deadlock.
> The key trace is as follows (the complete trace is in the attached report
> file):
> 
>  down_read+0x9d/0x450 kernel/locking/rwsem.c:1509
> 
>  dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773
> 
>  __dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844
>  table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537
> 
> In table_clear, it acquires a *write lock*
> https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520
> down_write(&_hash_lock);
> 
> Then before the lock is released at L1539, there is a path shown above:
> table_clear -> __dev_status -> dm_get_inactive_table ->  down_read
> https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773
> down_read(&_hash_lock);
> It tries to acquire *the same read lock* again, resulting in the deadlock
> problem.
> 
> Attached is the report, log, and reproducers generated by syzkaller
> Please let me know if there is any additional information that I can
> provide to help debug this issue.
> Thanks!

Thanks for the report, I've staged this fix:

From: Mike Snitzer <snitzer@xxxxxxxxxx>
Subject: [PATCH] dm ioctl: fix nested locking in table_clear() to remove
 deadlock concern

syzkaller found the following problematic rwsem locking (with write
lock already held):

 down_read+0x9d/0x450 kernel/locking/rwsem.c:1509
 dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773
 __dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844
 table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537

table_clear() first acquires the write lock on _hash_lock:
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520
down_write(&_hash_lock);

Then, before the lock is released at L1539, it takes the path shown above:
table_clear -> __dev_status -> dm_get_inactive_table -> down_read
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773
down_read(&_hash_lock);

This tries to take _hash_lock for read while the same task still holds it
for write, so the down_read() can never succeed and the task deadlocks.

Fix this by moving table_clear()'s __dev_status() call to after its
up_write(&_hash_lock), passing the md pointer saved while the lock was
still held.

Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Zheng Zhang <zheng.zhang@xxxxxxxxxxxxx>
Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
---
 drivers/md/dm-ioctl.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 50a1259294d1..7d5c9c582ed2 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1556,11 +1556,12 @@ static int table_clear(struct file *filp, struct dm_ioctl *param, size_t param_s
 		has_new_map = true;
 	}
 
-	param->flags &= ~DM_INACTIVE_PRESENT_FLAG;
-
-	__dev_status(hc->md, param);
 	md = hc->md;
 	up_write(&_hash_lock);
+
+	param->flags &= ~DM_INACTIVE_PRESENT_FLAG;
+	__dev_status(md, param);
+
 	if (old_map) {
 		dm_sync_table(md);
 		dm_table_destroy(old_map);
-- 
2.40.0
