Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

Ming Lei <ming.lei@xxxxxxxxxx> · Sun, 8 Apr 2018 18:44:34 +0800

On Sun, Apr 08, 2018 at 01:36:27PM +0300, Sagi Grimberg wrote:
> 
> > Hi Sagi
> > 
> > Still can reproduce this issue with the change:
> 
> Thanks for validating, Yi,
> 
> Would it be possible to test the following:
> --
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 75336848f7a7..81ced3096433 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -444,6 +444,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>                 return ERR_PTR(-EXDEV);
>         }
>         cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
> +       if (cpu >= nr_cpu_ids) {
> +               pr_warn("no online cpu for hctx %d\n", hctx_idx);
> +               cpu = cpumask_first(alloc_data.hctx->cpumask);
> +       }
>         alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
> 
>         rq = blk_mq_get_request(q, NULL, op, &alloc_data);
> --
> ...
> 
> 
> > [  153.384977] BUG: unable to handle kernel paging request at 00003a9ed053bd48
> > [  153.393197] IP: blk_mq_get_request+0x23e/0x390
> 
> Also would it be possible to provide gdb output of:
> 
> l *(blk_mq_get_request+0x23e)

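nvmf_connect_io_queue() uses blk_mq_alloc_request_hctx() this way, asking
blk-mq to allocate a request from one specific hw queue. However, none of
the CPUs mapped to that hw queue may be online, so cpumask_first_and() in
the code above can return a value >= nr_cpu_ids.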

Thanks,
Ming