When loading a device-mapper table for a request-based mapped device,
and the allocation/initialization of the blk-mq tag set for the device
fails, a following device remove will cause a double free.

E.g. (dmesg):
  device-mapper: core: Cannot initialize queue for request-based dm-mq mapped device
  device-mapper: ioctl: unable to set up device queue for new table.
  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0305e098835de000 TEID: 0305e098835de803
  Fault in home space mode while using kernel ASCE.
  AS:000000025efe0007 R3:0000000000000024
  Oops: 0038 ilc:3 [#1] SMP
  Modules linked in: ... lots of modules ...
  Supported: Yes, External
  CPU: 0 PID: 7348 Comm: multipathd Kdump: loaded Tainted: G W X 5.3.18-53-default #1 SLE15-SP3
  Hardware name: IBM 8561 T01 7I2 (LPAR)
  Krnl PSW : 0704e00180000000 000000025e368eca (kfree+0x42/0x330)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
  Krnl GPRS: 000000000000004a 000000025efe5230 c1773200d779968d 0000000000000000
             000000025e520270 000000025e8d1b40 0000000000000003 00000007aae10000
             000000025e5202a2 0000000000000001 c1773200d779968d 0305e098835de640
             00000007a8170000 000003ff80138650 000000025e5202a2 000003e00396faa8
  Krnl Code: 000000025e368eb8: c4180041e100 lgrl %r1,25eba50b8
             000000025e368ebe: ecba06b93a55 risbg %r11,%r10,6,185,58
            #000000025e368ec4: e3b010000008 ag %r11,0(%r1)
            >000000025e368eca: e310b0080004 lg %r1,8(%r11)
             000000025e368ed0: a7110001 tmll %r1,1
             000000025e368ed4: a7740129 brc 7,25e369126
             000000025e368ed8: e320b0080004 lg %r2,8(%r11)
             000000025e368ede: b904001b lgr %r1,%r11
  Call Trace:
   [<000000025e368eca>] kfree+0x42/0x330
   [<000000025e5202a2>] blk_mq_free_tag_set+0x72/0xb8
   [<000003ff801316a8>] dm_mq_cleanup_mapped_device+0x38/0x50 [dm_mod]
   [<000003ff80120082>] free_dev+0x52/0xd0 [dm_mod]
   [<000003ff801233f0>] __dm_destroy+0x150/0x1d0 [dm_mod]
   [<000003ff8012bb9a>] dev_remove+0x162/0x1c0 [dm_mod]
   [<000003ff8012a988>] ctl_ioctl+0x198/0x478 [dm_mod]
   [<000003ff8012ac8a>] dm_ctl_ioctl+0x22/0x38 [dm_mod]
   [<000000025e3b11ee>] ksys_ioctl+0xbe/0xe0
   [<000000025e3b127a>] __s390x_sys_ioctl+0x2a/0x40
   [<000000025e8c15ac>] system_call+0xd8/0x2c8
  Last Breaking-Event-Address:
   [<000000025e52029c>] blk_mq_free_tag_set+0x6c/0xb8
  Kernel panic - not syncing: Fatal exception: panic_on_oops

When allocation/initialization of the blk-mq tag set fails in
`dm_mq_init_request_queue()`, it is uninitialized/freed, but the pointer
is not reset to NULL; so when `dev_remove()` later gets into
`dm_mq_cleanup_mapped_device()` it sees the pointer and tries to
uninitialize and free it again.

Fix this by also setting the pointer to NULL in
`dm_mq_init_request_queue()` after error-handling.

Cc: <stable@xxxxxxxxxxxxxxx> # 4.6+
Fixes: 1c357a1e86a4 ("dm: allocate blk_mq_tag_set rather than embed in mapped_device")
Signed-off-by: Benjamin Block <bblock@xxxxxxxxxxxxx>
---
 drivers/md/dm-rq.c | 1 +
 1 file changed, 1 insertion(+)

Hey,

I got this report from internal testing for a distribution kernel (you
can see the version information in the dmesg output), but I checked the
code for 5.12, and it looks like nothing has really changed; the same
crash would happen there as well under the same circumstances.

I'm not sure why exactly the tag set allocation/initialization failed.
As you can see in the dmesg output, the stack of `table_load()` had
already unwound and the IOCTL had returned, so I can't check the error
code or the tag set state. But the crash seems obvious enough to me.

I wrote a small inject to test this with 5.12, because I didn't know how
to trigger this otherwise. It just jumps to `out_tag_set` in
`dm_mq_init_request_queue()` after the tag set is allocated, but before
`blk_mq_init_allocated_queue()` could run.

Here is the kernel log with inject and patch applied:

  ...
  [ 52.291458] sd 0:0:0:1075265560: alua: port group 00 state A preferred supports tolusnA
  [ 52.291815] sd 0:0:0:1075265560: alua: port group 00 state A preferred supports tolusnA
  [ 158.242153] device-mapper: core: Cannot initialize queue for request-based dm mapped device
  [ 158.242233] device-mapper: ioctl: unable to set up device queue for new table.
  [ 158.761525] sd 0:0:0:1075134488: alua: port group 00 state A preferred supports tolusnA
  [ 158.761839] sd 0:0:0:1075134488: alua: port group 00 state A preferred supports tolusnA
  ...

From the perspective of multipath (the userspace tool), it looks like this:

  Apr 29 23:13:03 | ds8k31_err_40184014_npiv: addmap [0 20971520 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 2 1 8:0 1 8:64 1]
  Apr 29 23:13:03 | libdevmapper: ioctl/libdm-iface.c(1923): device-mapper: reload ioctl on ds8k31_err_40184014_npiv failed: Cannot allocate memory
  Apr 29 23:13:03 | dm_addmap: libdm task=0 error: Success
  Apr 29 23:13:03 | ds8k31_err_40184014_npiv: ignoring map
  Apr 29 23:13:03 | ds8k31_err_40184015_npiv: addmap [0 20971520 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 2 1 8:16 1 8:80 1]
  create: ds8k31_err_40184015_npiv (36005076309ffd4300000000000001815) undef IBM,2107900
  size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
  `-+- policy='service-time 0' prio=50 status=undef
    |- 0:0:0:1075134488 sdb 8:16 undef ready running
    `- 1:0:1:1075134488 sdf 8:80 undef ready running
  ...

Recreating the exact same crash as in the report without the patch (but
with inject) is actually not all that easy on s390x; the double free
doesn't necessarily end up touching unmapped memory, and the crashes I
got were all over the place. So it's actually quite lucky I got this
clear report.

I don't really know the device-mapper code by heart, so that's why I
marked it as RFC.
- Benjamin

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 13b4385f4d5a..4583c5d6885f 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -569,6 +569,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
 	blk_mq_free_tag_set(md->tag_set);
 out_kfree_tag_set:
 	kfree(md->tag_set);
+	md->tag_set = NULL;
 
 	return err;
 }
-- 
2.30.2

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel
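
For readers who want to see the pattern outside the kernel, the following is a minimal userspace sketch of the failure mode described in the commit message. It is not the dm code: `struct tag_set`, `struct mapped_dev`, `release_tag_set()`, `init_queue_with_injected_failure()`, `cleanup_mapped_dev()` and `run_demo()` are hypothetical names invented for this illustration, and the cleanup function is assumed to release the tag set only when the pointer is non-NULL, as the description of `dm_mq_cleanup_mapped_device()` above implies.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; only the pointer matters here. */
struct tag_set { int nr_hw_queues; };
struct mapped_dev { struct tag_set *tag_set; };

/* Counts how often a tag set is released, so a second release would be visible. */
static int release_count;

static void release_tag_set(struct tag_set *ts)
{
	free(ts);
	release_count++;
}

/*
 * Models the failing path of dm_mq_init_request_queue(): the tag set is
 * allocated, a later step fails, and the error path frees it again.
 */
static int init_queue_with_injected_failure(struct mapped_dev *md)
{
	md->tag_set = calloc(1, sizeof(*md->tag_set));
	if (!md->tag_set)
		return -ENOMEM;

	/* Injected failure: pretend queue initialization failed at this point. */
	release_tag_set(md->tag_set);
	md->tag_set = NULL; /* the fix: leave no dangling pointer behind */

	return -ENOMEM;
}

/* Models the cleanup on device remove: assumed to act only on a non-NULL pointer. */
static void cleanup_mapped_dev(struct mapped_dev *md)
{
	if (md->tag_set) {
		release_tag_set(md->tag_set);
		md->tag_set = NULL;
	}
}

/* Runs "table load fails, then device remove" and returns the number of releases. */
static int run_demo(void)
{
	struct mapped_dev md = { .tag_set = NULL };
	int err;

	release_count = 0;
	err = init_queue_with_injected_failure(&md);
	assert(err == -ENOMEM);

	cleanup_mapped_dev(&md); /* no-op, because the pointer was reset */

	return release_count;
}
```

Called from a `main()`, `run_demo()` reports a single release. Dropping the `md->tag_set = NULL;` line in the error path would make `cleanup_mapped_dev()` pass the stale pointer to `free()` a second time, which is undefined behaviour in userspace and corresponds to the double `kfree()` in the report.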