Re: [PATCH V2 1/9] blk-mq: add blk_mq_max_nr_hw_queues()

Ming Lei <ming.lei@xxxxxxxxxx> · Thu, 27 Jul 2023 09:06:50 +0800

On Wed, Jul 26, 2023 at 05:36:34PM +0100, John Garry wrote:
> On 26/07/2023 10:40, Ming Lei wrote:
> 
> Hi Ming,
> 
> > blk_mq_alloc_tag_set() may override set->nr_hw_queues as 1 in case of kdump
> > kernel. This way causes trouble for driver, because blk-mq and driver see
> > different queue mapping. Especially the only online CPU may not be 1 for
> > kdump kernel, in which 'maxcpus=1' is passed from kernel command line,
> 
> "the only online CPU may not be 1 for kdump kernel, in which 'maxcpus=1'
> ..." - this seems inconsistent with the cover letter, where we have
> "'maxcpus=1' just bring up one single cpu core during booting."

OK, looks I should have mentioned "the only online CPU may not be 1 for
maxcpus=1" in cover letter.

> > then driver may map hctx0 into one inactive real hw queue which cpu
> > affinity is 0(offline).
> > 
> > The issue exists on all drivers which use managed irq and support
> > multiple hw queue.
> > 
> > Prepare for fixing this kind of issue by applying the added helper, so
> > driver can take blk-mq max nr_hw_queues knowledge into account when
> > calculating io queues.
> 
> Could you alternatively solve in blk_mq_pci_map_queues() by fixing the
> mappings in case of kdump to have all per-CPU mappings point at the HW queue
> associated with cpu0 (which I assume would be active)? It ain't pretty ...

cpu0 mapping isn't correct, as I mentioned that 'maxcpus=1' doesn't
mean the only online cpu is cpu0, such as, on ppc64, the only online
cpu could be selected at random during booting.

> 
> > 
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> >   block/blk-mq.c         | 16 ++++++++++++++++
> >   include/linux/blk-mq.h |  1 +
> >   2 files changed, 17 insertions(+)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index b04ff6f56926..617d6f849a7b 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -140,6 +140,22 @@ void blk_mq_freeze_queue_wait(struct request_queue *q)
> >   }
> >   EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
> > +/*
> > + * Return the max supported nr_hw_queues for each hw queue type
> > + *
> > + * blk_mq_alloc_tag_set() may change nr_hw_queues for kdump kernel, so
> > + * driver has to take blk-mq max supported nr_hw_queues into account
> > + * when figuring out nr_hw_queues from hardware info, for avoiding
> > + * inconsistency between driver and blk-mq.
> > + */
> > +unsigned int blk_mq_max_nr_hw_queues(void)
> > +{
> > +	if (is_kdump_kernel())
> > +		return 1;
> > +	return nr_cpu_ids;
> > +}
> 
> We rely on the driver having set->nr_hw_queues == 1 for kdump, right?
> 
> If so, how about enforcing it then, like:
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -4426,7 +4426,8 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>         * 64 tags to prevent using too much memory.
>         */
>        if (is_kdump_kernel()) {
> -               set->nr_hw_queues = 1;
> +               if (set->nr_hw_queues != 1)
> +                       return -EINVAL;
>                set->nr_maps = 1;
>                set->queue_depth = min(64U, set->queue_depth);
>        }

In theory, this way should be ideal approach, but it needs all MQ drivers
to get converted with blk_mq_max_nr_hw_queues().

However, it shouldn't be one issue for non-managed irq given non-managed
irq can be migrated to the only online cpu.

Thanks,
Ming