On 3/26/20 7:52 PM, Bart Van Assche wrote:
> On 2020-03-26 17:19, Dongli Zhang wrote:
>> I think the issue is because of line 2827, that is, the q->nr_hw_queues is
>> updated too early. It is still possible the init would fail later.
>>
>> 2809 static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>> 2810                                    struct request_queue *q)
>> 2811 {
>> 2812         int i, j, end;
>> 2813         struct blk_mq_hw_ctx **hctxs = q->queue_hw_ctx;
>> 2814
>> 2815         if (q->nr_hw_queues < set->nr_hw_queues) {
>> 2816                 struct blk_mq_hw_ctx **new_hctxs;
>> 2817
>> 2818                 new_hctxs = kcalloc_node(set->nr_hw_queues,
>> 2819                                        sizeof(*new_hctxs), GFP_KERNEL,
>> 2820                                        set->numa_node);
>> 2821                 if (!new_hctxs)
>> 2822                         return;
>> 2823                 if (hctxs)
>> 2824                         memcpy(new_hctxs, hctxs, q->nr_hw_queues *
>> 2825                                sizeof(*hctxs));
>> 2826                 q->queue_hw_ctx = new_hctxs;
>> 2827                 q->nr_hw_queues = set->nr_hw_queues;
>> 2828                 kfree(hctxs);
>> 2829                 hctxs = new_hctxs;
>> 2830         }
>
> Which kernel tree does this syzbot report refer to? Commit
> d0930bb8f46b ("blk-mq: Fix a recently introduced regression in
> blk_mq_realloc_hw_ctxs()") in Jens' tree removed line 2827 shown above.
>

Thank you very much for sharing this.

The below is in Jens' tree for 5.7.

commit d0930bb8f46b8fb4a7d429c0bf1c91b3ed00a7cf
Author: Bart Van Assche <bvanassche@xxxxxxx>
Date:   Mon Mar 9 21:26:18 2020 -0700

    blk-mq: Fix a recently introduced regression in blk_mq_realloc_hw_ctxs()

    q->nr_hw_queues must only be updated once it is known that
    blk_mq_realloc_hw_ctxs() has succeeded. Otherwise it can happen that
    reallocation fails and that q->nr_hw_queues is larger than the number of
    allocated hardware queues.

    This patch fixes the following crash if increasing the number of hardware
    queues fails:

    BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x775/0x810
    Write of size 8 at addr 0000000000000118 by task check/977

    CPU: 3 PID: 977 Comm: check Not tainted 5.6.0-rc1-dbg+ #8
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    Call Trace:
     dump_stack+0xa5/0xe6
     __kasan_report.cold+0x65/0x99
     kasan_report+0x16/0x20
     check_memory_region+0x140/0x1b0
     memset+0x28/0x40
     blk_mq_map_swqueue+0x775/0x810
     blk_mq_update_nr_hw_queues+0x468/0x710
     nullb_device_submit_queues_store+0xf7/0x1a0 [null_blk]
     configfs_write_file+0x1c4/0x250 [configfs]
     __vfs_write+0x4c/0x90
     vfs_write+0x145/0x2c0
     ksys_write+0xd7/0x180
     __x64_sys_write+0x47/0x50
     do_syscall_64+0x6f/0x2f0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: ac0d6b926e74 ("block: Reduce the amount of memory required per request queue")
    Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
    Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
    Cc: Keith Busch <kbusch@xxxxxxxxxx>
    Cc: Johannes Thumshirn <jth@xxxxxxxxxx>
    Cc: Hannes Reinecke <hare@xxxxxxxx>
    Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d4bd9b961726..37ff8dfb8ab9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2824,7 +2824,6 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 				memcpy(new_hctxs, hctxs, q->nr_hw_queues *
 				       sizeof(*hctxs));
 			q->queue_hw_ctx = new_hctxs;
-			q->nr_hw_queues = set->nr_hw_queues;
 			kfree(hctxs);
 			hctxs = new_hctxs;
 		}

That should be the reason why "init_hctx() fault injection" was introduced.

Thank you very much!

Dongli Zhang
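
P.S. For anyone following the thread, the ordering issue can be illustrated with a small userspace sketch. This is not the kernel code: struct toy_queue, struct toy_hctx, toy_alloc_hctx(), toy_realloc_hctxs() and the toy_fail_at knob are all hypothetical names made up for this example, and the rollback on failure is a simplification of what blk-mq does. The only point it tries to show is that the element count is published after every new context has been set up, so a failed grow leaves the count matching what is really allocated.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real blk-mq structures. */
struct toy_hctx {
	int id;
};

struct toy_queue {
	struct toy_hctx **hctxs;	/* per-hardware-queue contexts */
	unsigned int nr_slots;		/* capacity of hctxs[] */
	unsigned int nr_hw_queues;	/* number of valid entries in hctxs[] */
};

/* Hypothetical fault-injection knob: allocating this index fails. */
static int toy_fail_at = -1;

static struct toy_hctx *toy_alloc_hctx(int id)
{
	struct toy_hctx *h;

	if (id == toy_fail_at)
		return NULL;
	h = malloc(sizeof(*h));
	if (h)
		h->id = id;
	return h;
}

/* Grow q to 'want' contexts. Returns true only if every step succeeded. */
static bool toy_realloc_hctxs(struct toy_queue *q, unsigned int want)
{
	unsigned int i;

	if (want <= q->nr_hw_queues)
		return true;	/* this sketch handles growing only */

	if (q->nr_slots < want) {
		struct toy_hctx **new_hctxs = calloc(want, sizeof(*new_hctxs));

		if (!new_hctxs)
			return false;
		if (q->hctxs)
			memcpy(new_hctxs, q->hctxs,
			       q->nr_hw_queues * sizeof(*new_hctxs));
		free(q->hctxs);
		q->hctxs = new_hctxs;
		q->nr_slots = want;
		/* q->nr_hw_queues is deliberately left alone here. */
	}

	for (i = q->nr_hw_queues; i < want; i++) {
		q->hctxs[i] = toy_alloc_hctx((int)i);
		if (!q->hctxs[i])
			goto rollback;
	}

	/* Everything is initialized, so the new count can be published. */
	q->nr_hw_queues = want;
	return true;

rollback:
	/* Free only the contexts created by this call; keep the old count. */
	while (i-- > q->nr_hw_queues) {
		free(q->hctxs[i]);
		q->hctxs[i] = NULL;
	}
	return false;
}
```

Setting toy_fail_at to an index in the new range plays the role of the fault injection mentioned above: the grow fails, q->nr_hw_queues keeps its old value, and a later walk over [0, nr_hw_queues) never sees a NULL slot.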