Re: [PATCH] blk-mq: complete req in softirq context in case of single queue

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 28 Sep 2018 16:10:12 +0800

On Thu, Sep 27, 2018 at 11:30:19AM +0800, jianchao.wang wrote:
> Hi Ming
> 
> On 09/27/2018 12:08 AM, Ming Lei wrote:
> > Lot of controllers may have only one irq vector for completing IO
> > request. And usually affinity of the only irq vector is all possible
> > CPUs, however, on most of ARCH, there may be only one specific CPU
> > for handling this interrupt.
> > 
> > So if all IOs are completed in hardirq context, it is inevitable to
> > degrade IO performance because of increased irq latency.
> > 
> > This patch tries to address this issue by allowing to complete request
> > in softirq context, like the legacy IO path.
> > 
> > IOPS is observed as ~13%+ in the following randread test on raid0 over
> > virtio-scsi.
> > 
> > mdadm --create --verbose /dev/md0 --level=0 --chunk=1024 --raid-devices=8 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi
> > 
> > fio --time_based --name=benchmark --runtime=30 --filename=/dev/md0 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=32 --rw=randread --blocksize=4k
> > 
> > Cc: Zach Marano <zmarano@xxxxxxxxxx>
> > Cc: Christoph Hellwig <hch@xxxxxx>
> > Cc: Bart Van Assche <bvanassche@xxxxxxx>
> > Cc: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> >  block/blk-mq.c      | 14 ++++++++++++++
> >  block/blk-softirq.c |  7 +++++--
> >  2 files changed, 19 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 85a1c1a59c72..d4792c3ac983 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -565,6 +565,20 @@ static void __blk_mq_complete_request(struct request *rq)
> >  	if (rq->internal_tag != -1)
> >  		blk_mq_sched_completed_request(rq);
> >  
> > +	/*
> > +	 * Most of single queue controllers, there is only one irq vector
> > +	 * for handling IO completion, and the only irq's affinity is set
> > +	 * as all possible CPUs. On most of ARCHs, this affinity means the
> > +	 * irq is handled on one specific CPU.
> > +	 *
> > +	 * So complete IO reqeust in softirq context in case of single queue
> > +	 * for not degrading IO performance by irqsoff latency.
> > +	 */
> > +	if (rq->q->nr_hw_queues == 1) {
> > +		__blk_complete_request(rq);
> > +		return;
> > +	}
> > +
> >  	if (!test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags)) {
> >  		rq->q->softirq_done_fn(rq);
> >  		return;
> > diff --git a/block/blk-softirq.c b/block/blk-softirq.c
> > index 15c1f5e12eb8..b1df9b6c1731 100644
> > --- a/block/blk-softirq.c
> > +++ b/block/blk-softirq.c
> > @@ -101,17 +101,20 @@ void __blk_complete_request(struct request *req)
> >  	struct request_queue *q = req->q;
> >  	unsigned long flags;
> >  	bool shared = false;
> > +	int rq_cpu;
> >  
> >  	BUG_ON(!q->softirq_done_fn);
> >  
> > +	rq_cpu = q->mq_ops ? req->mq_ctx->cpu : req->cpu;
> > +
> >  	local_irq_save(flags);
> >  	cpu = smp_processor_id();
> >  
> >  	/*
> >  	 * Select completion CPU
> >  	 */
> > -	if (req->cpu != -1) {
> > -		ccpu = req->cpu;
> > +	if (rq_cpu != -1) {
> > +		ccpu = q->mq_ops ? req->mq_ctx->cpu : req->cpu;
> >  		if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
> >  			shared = cpus_share_cache(cpu, ccpu);
> >  	} else
> > 
> ;
> 
> Only QUEUE_FLAG_SAME_COMP is set, we will try to complete the request on the cpu
> where it is allocated.
> 
> For the blk-legacy, the req->cpu is -1 when QUEUE_FLAG_SAME_COMP is not set.
> 
> blk_queue_bio
> 
> 	if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags))
> 		req->cpu = raw_smp_processor_id()
> 
> But for the blk-mq, the req->mq_ctx->cpu must be valid.
> So we should check the QUEUE_FLAG_SAME_COMP for the blk-mq case.

Good catch, will fix it in V2.

> 
> On the other hand, how about split the softirq completion part out of
> __blk_complete_request ？ It is confused when find __blk_complete_request
> in __blk_mq_complete_request.

__blk_complete_request() is just a common helper between blk-mq and legacy
path. We have such usages already, so looks it is fine. 

Thanks,
Ming