Re: [PATCH] blk-mq: balance mapping between CPUs and queues

On Thu, Jul 25, 2019 at 04:35:30PM +0800, Bob Liu wrote:
> On 7/25/19 4:26 PM, Ming Lei wrote:
> > Spread queues among present CPUs first, then build the mapping
> > for the remaining non-present CPUs.
> > 
> > This minimizes the number of dead queues, i.e. queues mapped only
> > by non-present CPUs, and so avoids the bad IO performance caused
> > by an unbalanced mapping between CPUs and queues.
> > 
> > A similar policy is already applied to managed IRQ affinity.
> > 
> > Reported-by: Yi Zhang <yi.zhang@xxxxxxxxxx>
> > Cc: Yi Zhang <yi.zhang@xxxxxxxxxx>
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> >  block/blk-mq-cpumap.c | 34 +++++++++++++++++++++++-----------
> >  1 file changed, 23 insertions(+), 11 deletions(-)
> > 
> > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> > index f945621a0e8f..e217f3404dc7 100644
> > --- a/block/blk-mq-cpumap.c
> > +++ b/block/blk-mq-cpumap.c
> > @@ -15,10 +15,9 @@
> >  #include "blk.h"
> >  #include "blk-mq.h"
> >  
> > -static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
> > -			      unsigned int nr_queues, const int cpu)
> > +static int queue_index(struct blk_mq_queue_map *qmap, const int q)
> >  {
> > -	return qmap->queue_offset + (cpu % nr_queues);
> > +	return qmap->queue_offset + q;
> >  }
> >  
> >  static int get_first_sibling(unsigned int cpu)
> > @@ -36,23 +35,36 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
> >  {
> >  	unsigned int *map = qmap->mq_map;
> >  	unsigned int nr_queues = qmap->nr_queues;
> > -	unsigned int cpu, first_sibling;
> > +	unsigned int cpu, first_sibling, q = 0;
> > +
> > +	for_each_possible_cpu(cpu)
> > +		map[cpu] = -1;
> > +
> > +	/*
> > +	 * Spread queues among present CPUs first for minimizing
> > +	 * count of dead queues which are mapped by all un-present CPUs
> > +	 */
> > +	for_each_present_cpu(cpu) {
> > +		if (q >= nr_queues)
> > +			break;
> > +		map[cpu] = queue_index(qmap, q++);
> > +	}
> >  
> >  	for_each_possible_cpu(cpu) {
> > +		if (map[cpu] != -1)
> > +			continue;
> >  		/*
> >  		 * First do sequential mapping between CPUs and queues.
> >  		 * In case we still have CPUs to map, and we have some number of
> >  		 * threads per cores then map sibling threads to the same queue
> >  		 * for performance optimizations.
> >  		 */
> > -		if (cpu < nr_queues) {
> > -			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
> 
> Why not keep this part the same?

Because the sequential mapping has already been done among the present CPUs.

> 
> > +		first_sibling = get_first_sibling(cpu);
> > +		if (first_sibling == cpu) {
> > +			map[cpu] = queue_index(qmap, q);
> > +			q = (q + 1) % nr_queues;
> >  		} else {
> > -			first_sibling = get_first_sibling(cpu);
> > -			if (first_sibling == cpu)
> > -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
> > -			else
> > -				map[cpu] = map[first_sibling];
> > +			map[cpu] = map[first_sibling];
> 
> Then there would be no need to share a queue if nr_queues is enough for all possible CPUs.

I am not sure I follow your idea. nr_queues is not usually 'enough':
it is typically <= the number of possible CPUs, so some sharing is
unavoidable.

A valid mapping has to cover all possible CPUs, and one queue's
mapping can't overlap with another's. That is exactly what the
patch does.

If you think somewhere is wrong or not good enough, please point it
out.


thanks,
Ming


