Re: [PATCH V8 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive

Ming Lei <ming.lei@xxxxxxxxxx> · Wed, 29 Apr 2020 10:16:12 +0800

Hi Peter,

On Tue, Apr 28, 2020 at 05:58:37PM +0200, Peter Zijlstra wrote:
> On Sat, Apr 25, 2020 at 05:48:32PM +0200, Christoph Hellwig wrote:
> >  		atomic_inc(&data.hctx->nr_active);
> >  	}
> >  	data.hctx->tags->rqs[rq->tag] = rq;
> >  
> >  	/*
> > +	 * Ensure updates to rq->tag and tags->rqs[] are seen by
> > +	 * blk_mq_tags_inflight_rqs.  This pairs with the smp_mb__after_atomic
> > +	 * in blk_mq_hctx_notify_offline.  This only matters in case a process
> > +	 * gets migrated to another CPU that is not mapped to this hctx.
> >  	 */
> > +	if (rq->mq_ctx->cpu != get_cpu())
> >  		smp_mb();
> > +	put_cpu();
> 
> This looks exceedingly weird; how do you think you can get to another
> CPU and not have an smp_mb() implied in the migration itself? Also, what

What we need is one smp_mb() between the following two OPs:

1) 
   rq->tag = rq->internal_tag;
   data.hctx->tags->rqs[rq->tag] = rq;

2) 
	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &rq->mq_hctx->state)))

And the pair of the above barrier is in blk_mq_hctx_notify_offline().

So if this process is migrated before 1), the implied smp_mb() is useless.

> stops the migration from happening right after the put_cpu() ?

If the migration happens after put_cpu(), the above two OPs are still
ordered by the implied smp_mb(), so looks not a problem.

> 
> 
> >  	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &rq->mq_hctx->state))) {
> >  		blk_mq_put_driver_tag(rq);
> 
> 
> > +static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu,
> > +		struct blk_mq_hw_ctx *hctx)
> >  {
> > +	if (!cpumask_test_cpu(cpu, hctx->cpumask))
> > +		return false;
> > +	if (cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu)
> > +		return false;
> > +	if (cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask) < nr_cpu_ids)
> > +		return false;
> > +	return true;
> >  }
> 
> Does this want something like:
> 
> 	lockdep_assert_held(*set->tag_list_lock);
> 
> to make sure hctx->cpumask is stable? Those mask ops are not stable vs
> concurrenct set/clear at all.

hctx->cpumask is only updated in __blk_mq_update_nr_hw_queues(), in
which all request queues in the tagset have been frozen, and no any
in-flight IOs, so we needn't to pay attention to that case.

Thanks, 
Ming