On 09/06/2021 11:16, Ming Lei wrote:
On Wed, Jun 09, 2021 at 09:59:43AM +0100, John Garry wrote:
On 09/06/2021 07:30, Ming Lei wrote:
Thanks for the fix
The tagset can't be used after blk_cleanup_queue() returns, because
freeing the tagset usually follows blk_cleanup_queue(). Commit d97e594c5166
("blk-mq: Use request queue-wide tags for tagset-wide sbitmap") adds a
check on q->tag_set->flags in blk_mq_exit_sched(), which causes a
use-after-free.
Fix it by using hctx->flags.
The tagset is a member of the Scsi_Host structure. So is it true that this
memory may be freed before the request_queue is exited?
Yeah, please see commit c3e2219216c9 ("block: free sched's request pool in
blk_cleanup_queue")
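To illustrate the lifetime issue discussed above, here is a minimal user-space sketch in C. It is not kernel code: every model_* name and MODEL_F_TAG_SHARED are hypothetical stand-ins for the tagset, the hctx and the shared-sbitmap flag. The assumption it models is the one stated in the commit message: the queue only borrows the tag set pointer, the owner may free the tag set first, and each hctx keeps its own copy of the flags that stays valid until the queue itself is released.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared-sbitmap tagset flag. */
#define MODEL_F_TAG_SHARED (1u << 1)

struct model_tag_set {
	unsigned int flags;
};

struct model_hctx {
	unsigned int flags;		/* copied from the tag set at init time */
};

struct model_queue {
	struct model_tag_set *tag_set;	/* borrowed; the owner may free it first */
	struct model_hctx *hctxs;	/* owned by the queue */
	unsigned int nr_hctx;
};

static struct model_queue *model_queue_init(struct model_tag_set *set,
					    unsigned int nr_hctx)
{
	struct model_queue *q = calloc(1, sizeof(*q));
	unsigned int i;

	if (!q)
		return NULL;
	q->hctxs = calloc(nr_hctx, sizeof(*q->hctxs));
	if (!q->hctxs) {
		free(q);
		return NULL;
	}
	q->tag_set = set;
	q->nr_hctx = nr_hctx;
	for (i = 0; i < nr_hctx; i++)
		q->hctxs[i].flags = set->flags;
	return q;
}

/* Late teardown path: reads only queue-owned memory, never q->tag_set. */
static unsigned int model_queue_exit_flags(const struct model_queue *q)
{
	unsigned int flags = 0;
	unsigned int i;

	for (i = 0; i < q->nr_hctx; i++)
		flags = q->hctxs[i].flags;
	return flags;
}

static void model_queue_release(struct model_queue *q)
{
	free(q->hctxs);
	free(q);
}

/* Frees the tag set before the queue, then checks the cached flag. */
static int model_demo(void)
{
	struct model_tag_set *set = malloc(sizeof(*set));
	struct model_queue *q;
	unsigned int flags;

	assert(set != NULL);
	set->flags = MODEL_F_TAG_SHARED;
	q = model_queue_init(set, 4);
	assert(q != NULL);

	free(set);		/* the owner frees the tag set first */
	q->tag_set = NULL;	/* model only: a stale dereference would crash */

	flags = model_queue_exit_flags(q);
	model_queue_release(q);
	return (flags & MODEL_F_TAG_SHARED) != 0;
}
```

Calling model_demo() from a main() function returns 1: the flag is still readable after the tag set is gone, because the teardown path only touches memory the queue owns.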
JFYI, I could recreate it with the following simple steps:
root@(none)$ mount /dev/sda1 mnt
[ 27.252887] FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
root@(none)$ echo HISI0162:01 > ./sys/bus/platform/drivers/hisi_sas_v2_hw/unbind
[ 31.262274] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 31.270314] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 31.278262] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 31.286245] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 31.294164] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 31.302143] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 31.310097] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 31.321599] hisi_sas_v2_hw HISI0162:01: dev[9:1] is gone
[ 31.429245] hisi_sas_v2_hw HISI0162:01: dev[8:1] is gone
[ 31.533461] hisi_sas_v2_hw HISI0162:01: dev[7:1] is gone
[ 31.637338] hisi_sas_v2_hw HISI0162:01: dev[6:1] is gone
[ 31.740840] hisi_sas_v2_hw HISI0162:01: dev[5:1] is gone
[ 31.750659] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
[ 31.833500] hisi_sas_v2_hw HISI0162:01: dev[4:1] is gone
[ 31.937351] hisi_sas_v2_hw HISI0162:01: dev[3:1] is gone
[ 31.947749] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 31.953195] sd 0:0:1:0: [sdb] Stopping disk
[ 32.690815] hisi_sas_v2_hw HISI0162:01: dev[2:5] is gone
[ 32.771526] hisi_sas_v2_hw HISI0162:01: dev[1:1] is gone
[ 32.790406] hisi_sas_v2_hw HISI0162:01: dev[0:2] is gone
root@(none)$
root@(none)$
root@(none)$ umount mnt
[ 37.323039] ==================================================================
[ 37.330262] BUG: KASAN: use-after-free in blk_mq_exit_sched+0x110/0x1c8
[ 37.336880] Read of size 4 at addr ffff001051e80100 by task umount/547
[ 37.343401]
[ 37.344884] CPU: 4 PID: 547 Comm: umount Not tainted 5.13.0-rc5-next-20210608 #80
[ 37.352362] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
[ 37.361486] Call trace:
[ 37.363924] dump_backtrace+0x0/0x2d0
[ 37.367586] show_stack+0x18/0x28
[ 37.370898] dump_stack_lvl+0xfc/0x138
[ 37.374643] print_address_description.constprop.13+0x78/0x314
[ 37.380472] kasan_report+0x1e0/0x248
[ 37.384131] __asan_load4+0x9c/0xd8
[ 37.387615] blk_mq_exit_sched+0x110/0x1c8
[ 37.391706] __elevator_exit+0x34/0x58
[ 37.395451] blk_release_queue+0x108/0x1d8
[ 37.399545] kobject_put+0xa8/0x180
[ 37.403029] blk_put_queue+0x14/0x20
[ 37.406601] disk_release+0xcc/0x100
[ 37.410171] device_release+0x94/0x110
[ 37.413918] kobject_put+0xa8/0x180
[ 37.417401] put_device+0x14/0x28
[ 37.420712] put_disk+0x2c/0x40
[ 37.423848] blkdev_put_no_open+0x54/0x78
[ 37.427853] blkdev_put+0x108/0x258
[ 37.431335] kill_block_super+0x5c/0x78
[ 37.435166] deactivate_locked_super+0x6c/0xd0
[ 37.439605] deactivate_super+0x8c/0xa8
[ 37.443435] cleanup_mnt+0x110/0x1c0
[ 37.447007] __cleanup_mnt+0x14/0x20
[ 37.450578] task_work_run+0xbc/0x1a8
[ 37.454236] do_notify_resume+0x2cc/0x590
[ 37.458242] work_pending+0xc/0x3c8
[ 37.461725]
[ 37.463207] The buggy address belongs to the page:
[ 37.467990] page:(____ptrval____) refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1051e80
[ 37.477724] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff)
[ 37.484164] raw: 0bfffc0000000000 fffffc00415a9008 ffff0017fbffebb0 0000000000000000
[ 37.491900] raw: 0000000000000000 0000000000000006 00000000ffffff7f 0000000000000000
[ 37.499635] page dumped because: kasan: bad access detected
[ 37.505198]
[ 37.506680] Memory state around the buggy address:
[ 37.511463]  ffff001051e80000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 37.518677]  ffff001051e80080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 37.525891] >ffff001051e80100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 37.533104]                    ^
[ 37.536324]  ffff001051e80180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 37.543538]  ffff001051e80200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 37.550751] ==================================================================
[ 37.557963] Disabling lock debugging due to kernel taint
root@(none)$
root@(none)$
And this patch fixes it:
Tested-by: John Garry <john.garry@xxxxxxxxxx>
Reported-by: syzbot+77ba3d171a25c56756ea@xxxxxxxxxxxxxxxxxxxxxxxxx
Fixes: d97e594c5166 ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap")
Cc: John Garry <john.garry@xxxxxxxxxx>
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
block/blk-mq-sched.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a9182d2f8ad3..80273245d11a 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -680,6 +680,7 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
 {
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int i;
+	unsigned int flags = 0;
 	queue_for_each_hw_ctx(q, hctx, i) {
 		blk_mq_debugfs_unregister_sched_hctx(hctx);
@@ -687,12 +688,13 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
 			e->type->ops.exit_hctx(hctx, i);
 			hctx->sched_data = NULL;
 		}
+		flags = hctx->flags;
I know the choice is limited, but it is unfortunate that we must set flags
inside the loop.
Does it matter?
It's just a nit on the coding style: it's not an especially good
practice to set the same value in a loop.
But, as I said, choice is limited.
 	}
 	blk_mq_debugfs_unregister_sched(q);
 	if (e->type->ops.exit_sched)
 		e->type->ops.exit_sched(e);
 	blk_mq_sched_tags_teardown(q);
-	if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
+	if (blk_mq_is_sbitmap_shared(flags))
 		blk_mq_exit_sched_shared_sbitmap(q);
This is:
blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
{
	sbitmap_queue_free(&queue->sched_bitmap_tags);
	..
}
And isn't it safe to call sbitmap_queue_free() when
sbitmap_queue_init_node() has not been called?
I'm just wondering if we can always call blk_mq_exit_sched_shared_sbitmap()?
I know it's not an ideal choice either.
So far it may work, but I'm not sure it will in the future. I suggest
following the traditional alloc & free pattern.
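The alloc & free pattern suggested above can be sketched in plain user-space C as follows. This is only a model under stated assumptions: the model_* names and MODEL_F_SHARED are hypothetical and do not correspond to the real sbitmap API. The point is that teardown is guarded by the same condition as setup, so its correctness does not depend on whether the free routine happens to tolerate an object that was never initialised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag: the shared bitmap is allocated only when it is set. */
#define MODEL_F_SHARED (1u << 0)

struct model_bitmap {
	unsigned long *words;	/* NULL when never initialised */
};

struct model_sched {
	unsigned int flags;
	struct model_bitmap shared;
};

static int model_bitmap_init(struct model_bitmap *bm, size_t nr_words)
{
	bm->words = calloc(nr_words, sizeof(*bm->words));
	return bm->words ? 0 : -1;
}

static void model_bitmap_free(struct model_bitmap *bm)
{
	free(bm->words);
	bm->words = NULL;
}

/* Setup: allocate the shared bitmap only when the flag asks for it. */
static int model_sched_init(struct model_sched *s, unsigned int flags)
{
	s->flags = flags;
	s->shared.words = NULL;
	if (s->flags & MODEL_F_SHARED)
		return model_bitmap_init(&s->shared, 4);
	return 0;
}

/* Teardown: mirror the setup condition instead of freeing unconditionally. */
static void model_sched_exit(struct model_sched *s)
{
	if (s->flags & MODEL_F_SHARED)
		model_bitmap_free(&s->shared);
}
```

In this model an unconditional model_bitmap_free() would also be harmless, since free(NULL) is a no-op in C. Keeping the guard means that remains an implementation detail rather than something the teardown path relies on.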
Fine
Thanks,
John