On Thu, Oct 24, 2019 at 4:53 PM James Smart <jsmart2021@xxxxxxxxx> wrote:
>
> During the following test scenario:
> - Offline a cpu
> - Load the lpfc driver, which auto-discovers NVMe devices. For a new
>   nvme device, the lpfc/nvme_fc transport can request up to
>   num_online_cpus() worth of nr_hw_queues. The target in this case
>   allowed at least that many nvme queues.
>
> The system encountered the following crash:
>
>  BUG: unable to handle kernel paging request at 00003659d33953a8
>  ...
>  Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
>  RIP: 0010:blk_mq_get_request+0x21d/0x3c0
>  ...
>  blk_mq_alloc_request_hctx+0xef/0x140
>  nvme_alloc_request+0x32/0x80 [nvme_core]
>  __nvme_submit_sync_cmd+0x4a/0x1c0 [nvme_core]
>  nvmf_connect_io_queue+0x130/0x1a0 [nvme_fabrics]
>  nvme_fc_connect_io_queues+0x285/0x2b0 [nvme_fc]
>  nvme_fc_create_association+0x8ea/0x9c0 [nvme_fc]
>  nvme_fc_connect_ctrl_work+0x19/0x50 [nvme_fc]
>  ...
>
> A while ago, a commit that simplified queue mapping replaced the use
> of cpumask_first() with cpumask_first_and(). The issue is that if
> cpumask_first_and() does not find any _intersecting_ cpus, it returns
> nr_cpu_ids. nr_cpu_ids is not a valid index for the per_cpu_ptr()
> lookup done in __blk_mq_get_ctx().
>
> Considered reverting to cpumask_first(), but instead followed the
> logic in blk_mq_first_mapped_cpu(): check for nr_cpu_ids before
> calling cpumask_first().
>
> Fixes: 20e4d8139319 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
> Signed-off-by: Shagun Agrawal <shagun.agrawal@xxxxxxxxxxxx>
> Signed-off-by: James Smart <jsmart2021@xxxxxxxxx>
> CC: Christoph Hellwig <hch@xxxxxx>
> ---
>  block/blk-mq.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8538dc415499..0b06b4ea57f1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -461,6 +461,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>  		return ERR_PTR(-EXDEV);
>  	}
>  	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
> +	if (cpu >= nr_cpu_ids)
> +		cpu = cpumask_first(alloc_data.hctx->cpumask);

The first cpu may be offline too; if so, a kernel warning or an IO
timeout may be triggered later, when the allocated request is
dispatched.

To be honest, blk_mq_alloc_request_hctx() is a really weird interface,
given that the hctx may become dead at the very moment
blk_mq_alloc_request_hctx() is called.

Given that only NVMe FC/RDMA uses this interface, could you provide
some background on this kind of usage? The normal usage is that the
caller does not specify the hctx to allocate the request from, since
blk-mq can figure out which hctx to use via the queue mapping. Just
wondering why NVMe FC/RDMA can't work that way?

Thanks,
Ming Lei
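
P.S. A few sketches for context. The three snippets below are
paraphrased from memory against a 5.4-era tree, so treat them as
sketches and double-check against an actual source tree. First, why
the stray nr_cpu_ids crashes: __blk_mq_get_ctx() (block/blk-mq.h)
feeds the cpu number straight into per_cpu_ptr() with no range check,
so nr_cpu_ids indexes past the per-cpu offset table and produces the
wild pointer seen in the BUG above:

	static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
							  unsigned int cpu)
	{
		/* caller must guarantee cpu < nr_cpu_ids; no check here */
		return per_cpu_ptr(q->queue_ctx, cpu);
	}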
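
Second, the helper the patch description borrows from.
blk_mq_first_mapped_cpu() (block/blk-mq.c) already does the nr_cpu_ids
check before falling back to cpumask_first(); note that the fallback
cpu can itself be offline, which is exactly the concern above:

	static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
	{
		int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);

		/* every cpu in hctx->cpumask is offline: pick one anyway */
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(hctx->cpumask);
		return cpu;
	}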
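
Last, what I mean by the normal usage: nvme_alloc_request()
(drivers/nvme/host/core.c) only takes the blk_mq_alloc_request_hctx()
path when the caller pins a qid, as the fabrics connect path in the
stack trace does; with NVME_QID_ANY, blk-mq picks the hctx itself from
the submitting cpu's queue mapping:

	struct request *nvme_alloc_request(struct request_queue *q,
			struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid)
	{
		unsigned op = nvme_is_write(cmd) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
		struct request *req;

		if (qid == NVME_QID_ANY) {
			/* normal path: hctx chosen via the queue mapping */
			req = blk_mq_alloc_request(q, op, flags);
		} else {
			/* connect path: force I/O queue qid onto hctx qid - 1 */
			req = blk_mq_alloc_request_hctx(q, op, flags,
					qid ? qid - 1 : 0);
		}
		...
	}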