On Tue, Jul 23, 2019 at 07:36:16AM +0800, Bob Liu wrote: > On 7/23/19 4:29 AM, Guenter Roeck wrote: > > On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote: > >> In bfq_insert_request(), bfqq is initialized with: > >> bfqq = bfq_init_rq(rq); > >> In bfq_init_rq(), we find: > >> if (unlikely(!rq->elv.icq)) > >> return NULL; > >> Indeed, rq->elv.icq can be NULL if the memory allocation in > >> create_task_io_context() failed. > >> > > > > The above function should have been ioc_create_icq(), sorry. > > > > Guenter > > > >> A comment in bfq_insert_request() suggests that bfqq is supposed to be > >> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as > >> debugging and practical experience shows, this is not the case in the > >> above situation. > >> > >> This results in the following crash. > >> > >> Unable to handle kernel NULL pointer dereference > >> at virtual address 00000000000001b0 > >> ... > >> Call trace: > >> bfq_setup_cooperator+0x44/0x134 > >> bfq_insert_requests+0x10c/0x630 > >> blk_mq_sched_insert_requests+0x60/0xb4 > >> blk_mq_flush_plug_list+0x290/0x2d4 > >> blk_flush_plug_list+0xe0/0x230 > >> blk_finish_plug+0x30/0x40 > >> generic_writepages+0x60/0x94 > >> blkdev_writepages+0x24/0x30 > >> do_writepages+0x74/0xac > >> __filemap_fdatawrite_range+0x94/0xc8 > >> file_write_and_wait_range+0x44/0xa0 > >> blkdev_fsync+0x38/0x68 > >> vfs_fsync_range+0x68/0x80 > >> do_fsync+0x44/0x80 > >> > >> The problem is relatively easy to reproduce by running an image with > >> failslab enabled, such as with: > >> > >> cd /sys/kernel/debug/failslab > >> echo 10 > probability > >> echo 300 > times > >> > >> Avoid the problem by checking if bfqq is NULL before using it. With the > >> NULL check in place, requests with missing io context are queued > >> immediately, and the crash is no longer seen. > >> > > > What about other place use blk_init_rq()? > E.g in bfq_request_merged(): > 1897 struct bfq_queue *bfqq = bfq_init_rq(req); > 1898 struct bfq_data *bfqd = bfqq->bfqd; > > Which may have same Null-pointer bug? > Good question. Doug asked it as well. My understanding is that the code won't hit those places since bfq is essentially bypassed. Of course, I may be completely wrong, so we should wait for Paolo to confirm if I my understanding is correct (or wrong). Thanks, Guenter > -Bob > > >> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge") > >> Reported-by: Hsin-Yi Wang <hsinyi@xxxxxxxxxx> > >> Cc: Hsin-Yi Wang <hsinyi@xxxxxxxxxx> > >> Cc: Nicolas Boichat <drinkcat@xxxxxxxxxxxx> > >> Cc: Doug Anderson <dianders@xxxxxxxxxxxx> > >> Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx> > >> --- > >> block/bfq-iosched.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c > >> index 72860325245a..56f3f4227010 100644 > >> --- a/block/bfq-iosched.c > >> +++ b/block/bfq-iosched.c > >> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, > >> > >> spin_lock_irq(&bfqd->lock); > >> bfqq = bfq_init_rq(rq); > >> - if (at_head || blk_rq_is_passthrough(rq)) { > >> + if (!bfqq || at_head || blk_rq_is_passthrough(rq)) { > >> if (at_head) > >> list_add(&rq->queuelist, &bfqd->dispatch); > >> else > >> -- > >> 2.7.4 > >> >