On Mon, Feb 05, 2018 at 11:20:29PM +0800, Ming Lei wrote: > Quite a few HBAs(such as HPSA, megaraid, mpt3sas, ..) support multiple > reply queues, but tags is often HBA wide. > > These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) > for automatic affinity assignment. > > Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs) > has been merged to V4.16-rc, and it is easy to allocate all offline CPUs > for some irq vectors, this can't be avoided even though the allocation > is improved. > > So all these drivers have to avoid to ask HBA to complete request in > reply queue which hasn't online CPUs assigned, and HPSA has been broken > with v4.15+: > > https://marc.info/?l=linux-kernel&m=151748144730409&w=2 > > This issue can be solved generically and easily via blk_mq(scsi_mq) multiple > hw queue by mapping each reply queue into hctx, but one tricky thing is > the HBA wide(instead of hw queue wide) tags. > > This patch is based on the following Hannes's patch: > > https://marc.info/?l=linux-block&m=149132580511346&w=2 > > One big difference with Hannes's is that this patch only makes the tags sbitmap > and active_queues data structure HBA wide, and others are kept as NUMA locality, > such as request, hctx, tags, ... > > The following patch will support global tags on null_blk, also the performance > data is provided, no obvious performance loss is observed when the whole > hw queue depth is same. > > Cc: Hannes Reinecke <hare@xxxxxxx> > Cc: Arun Easi <arun.easi@xxxxxxxxxx> > Cc: Omar Sandoval <osandov@xxxxxx>, > Cc: "Martin K. Petersen" <martin.petersen@xxxxxxxxxx>, > Cc: James Bottomley <james.bottomley@xxxxxxxxxxxxxxxxxxxxx>, > Cc: Christoph Hellwig <hch@xxxxxx>, > Cc: Don Brace <don.brace@xxxxxxxxxxxxx> > Cc: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx> > Cc: Peter Rivera <peter.rivera@xxxxxxxxxxxx> > Cc: Mike Snitzer <snitzer@xxxxxxxxxx> > Tested-by: Laurence Oberman <loberman@xxxxxxxxxx> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > block/blk-mq-debugfs.c | 1 + > block/blk-mq-sched.c | 13 ++++++++++++- > block/blk-mq-tag.c | 23 ++++++++++++++++++----- > block/blk-mq-tag.h | 5 ++++- > block/blk-mq.c | 29 ++++++++++++++++++++++++----- > block/blk-mq.h | 3 ++- > include/linux/blk-mq.h | 2 ++ > 7 files changed, 63 insertions(+), 13 deletions(-) > > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c > index 0dfafa4b655a..0f0fafe03f5d 100644 > --- a/block/blk-mq-debugfs.c > +++ b/block/blk-mq-debugfs.c > @@ -206,6 +206,7 @@ static const char *const hctx_flag_name[] = { > HCTX_FLAG_NAME(SHOULD_MERGE), > HCTX_FLAG_NAME(TAG_SHARED), > HCTX_FLAG_NAME(SG_MERGE), > + HCTX_FLAG_NAME(GLOBAL_TAGS), > HCTX_FLAG_NAME(BLOCKING), > HCTX_FLAG_NAME(NO_SCHED), > }; > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 55c0a745b427..385bbec73804 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx) > } else > clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); > > + /* need to restart all hw queues for global tags */ > + if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) { > + struct blk_mq_hw_ctx *hctx2; > + int i; > + > + queue_for_each_hw_ctx(hctx->queue, hctx2, i) > + if (blk_mq_run_hw_queue(hctx2, true)) > + return true; Is it intentional that we stop after the first hw queue does work? That seems fine but it's a little confusing because the comment claims we restart everything. > + return false; > + } > +