On Thu, Sep 16, 2021 at 09:42:29AM +0200, Daniel Wagner wrote:
> On Thu, Sep 16, 2021 at 10:17:18AM +0800, Ming Lei wrote:
> > Firstly, even with patches of 'qla2xxx - add nvme map_queues support',
> > the knowledge if managed irq is used in nvmef LLD is still missed, so
> > blk_mq_hctx_use_managed_irq() may always return false, but that
> > shouldn't be hard to solve.
> 
> Yes, that's pretty simple:
> 
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -7914,6 +7914,9 @@ static int qla2xxx_map_queues(struct Scsi_Host *shost)
>  		rc = blk_mq_map_queues(qmap);
>  	else
>  		rc = blk_mq_pci_map_queues(qmap, vha->hw->pdev, vha->irq_offset);
> +
> +	qmap->use_managed_irq = true;
> +
>  	return rc;
>  }

blk_mq_alloc_request_hctx() won't be called on the qla2xxx queue; what we
need is to mark the nvmef queue as .use_managed_irq if the LLD uses
managed irq.

> 
> > The problem is that we still should make connect io queue completed
> > when all CPUs of this hctx is offline in case of managed irq.
> 
> I agree, though if I understand this right, the scenario where all CPUs
> are offline in a hctx and we want to use this hctx is only happening
> after an initial setup and then reconnect attempt happens. That is
> during the first connect attempt only online CPUs are assigned to the
> hctx. When the CPUs are taken offline the block layer makes sure not to
> use those queues anymore (no problem for the hctx so far). Then for some
> reason the nvme-fc layer decides to reconnect and we end up in the
> situation where we don't have any online CPU in given hctx.

It is simply that blk_mq_alloc_request_hctx() allocates a request from one
specified hctx, and that hctx can go offline at any time.

> 
> > One solution might be to use io polling for connecting io queue, but nvme fc
> > doesn't support polling, all the other nvme hosts do support it.
> 
> No idea, something to explore for sure :)

It is just a rough idea, something like: start each queue in poll mode and
issue the connect IO queue command via polling. Once the connect IO queue
command completes, switch the queue to normal mode. Then
blk_mq_alloc_request_hctx() is guaranteed to succeed.

> 
> My point is that your series is fixing existing bugs and doesn't
> introduce a new one. qla2xxx is already depending on managed IRQs. I
> would like to see your series accepted with my hack as a stop-gap solution
> until we have a proper fix.

I am fine with taking this approach first if no one objects.

Thanks,
Ming