On Wed, Apr 29, 2020 at 05:46:16PM +0800, Ming Lei wrote:
> On Wed, Apr 29, 2020 at 09:07:29AM +0100, Will Deacon wrote:
> > On Wed, Apr 29, 2020 at 10:16:12AM +0800, Ming Lei wrote:
> > > On Tue, Apr 28, 2020 at 05:58:37PM +0200, Peter Zijlstra wrote:
> > > > On Sat, Apr 25, 2020 at 05:48:32PM +0200, Christoph Hellwig wrote:
> > > > >  		atomic_inc(&data.hctx->nr_active);
> > > > >  	}
> > > > >  	data.hctx->tags->rqs[rq->tag] = rq;
> > > > >
> > > > >  	/*
> > > > > +	 * Ensure updates to rq->tag and tags->rqs[] are seen by
> > > > > +	 * blk_mq_tags_inflight_rqs.  This pairs with the smp_mb__after_atomic
> > > > > +	 * in blk_mq_hctx_notify_offline.  This only matters in case a process
> > > > > +	 * gets migrated to another CPU that is not mapped to this hctx.
> > > > >  	 */
> > > > > +	if (rq->mq_ctx->cpu != get_cpu())
> > > > >  		smp_mb();
> > > > > +	put_cpu();
> > > >
> > > > This looks exceedingly weird; how do you think you can get to another
> > > > CPU and not have an smp_mb() implied in the migration itself? Also, what
> > >
> > > What we need is one smp_mb() between the following two OPs:
> > >
> > > 1)
> > > 	rq->tag = rq->internal_tag;
> > > 	data.hctx->tags->rqs[rq->tag] = rq;
> > >
> > > 2)
> > > 	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &rq->mq_hctx->state)))
> > >
> > > And the pairing barrier for the above is in blk_mq_hctx_notify_offline().
> >
> > I'm struggling with this, so let me explain why. My understanding of the
> > original patch [1] and your explanation above is that you want *either* of
> > the following behaviours:
> >
> >   - __blk_mq_get_driver_tag() (i.e. (1) above) and
> >     test_bit(BLK_MQ_S_INACTIVE, ...) run on the same CPU with barrier()
> >     between them, or
> >
> >   - There is a migration and therefore an implied smp_mb() between them.
> >
> > However, given that most CPUs can speculate loads (and therefore the
> > test_bit() operation), I don't understand how the "everything runs on the
> > same CPU" case is safe if a barrier() is required. In other words, if the
> > barrier() is needed to prevent the compiler hoisting the load, then the
> > CPU can still cause problems.
>
> Do you think the speculated loads may return a wrong value of the
> BLK_MQ_S_INACTIVE bit in the case of a single CPU? BTW, writing the bit
> is done on the same CPU. If yes, this machine may not obey cache
> consistency, IMO.

If the write is on the same CPU, then the read will of course return the
value written by that write, otherwise we'd have much bigger problems!

But then I'm confused, because you're saying that the write is done on the
same CPU, but previously you were saying that migration occurring before
(1) was problematic. Can you explain a bit more about that case, please?
What is running before (1) that is relevant here?

Thanks,

Will
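P.S. To check that we're talking about the same ordering requirement,
here is the pairing as I understand it, reduced to its two sides. This
is only a sketch: I'm assuming the offline path sets the bit with a
set_bit() immediately before its smp_mb__after_atomic(), since that
barrier only orders against a preceding atomic op, and I've guessed the
argument to blk_mq_tags_inflight_rqs().

	Allocation path (your (1) and (2)):

		rq->tag = rq->internal_tag;
		data.hctx->tags->rqs[rq->tag] = rq;
		smp_mb();
		test_bit(BLK_MQ_S_INACTIVE, &rq->mq_hctx->state);

	blk_mq_hctx_notify_offline():

		set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
		smp_mb__after_atomic();
		blk_mq_tags_inflight_rqs(hctx->tags);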
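If that's right, then this has the classic store-buffering (SB) shape,
so in LKMM litmus form it would look something like the following
(untested, name invented, and with a plain smp_mb() standing in for
smp_mb__after_atomic() on P1):

	C blk-mq-tag-vs-inactive

	{}

	P0(int *tag, int *inactive)
	{
		int r0;

		WRITE_ONCE(*tag, 1);		/* rq->tag/tags->rqs[] updates */
		smp_mb();
		r0 = READ_ONCE(*inactive);	/* test_bit(BLK_MQ_S_INACTIVE) */
	}

	P1(int *tag, int *inactive)
	{
		int r0;

		WRITE_ONCE(*inactive, 1);	/* set_bit(BLK_MQ_S_INACTIVE) */
		smp_mb();			/* smp_mb__after_atomic() */
		r0 = READ_ONCE(*tag);		/* blk_mq_tags_inflight_rqs() */
	}

	exists (0:r0=0 /\ 1:r0=0)

With smp_mb() on both sides the "exists" clause should be forbidden
(this is SB+fencembonceonces from tools/memory-model, modulo renaming),
so at least one side must observe the other's store: either the
allocation path sees INACTIVE and gives the tag back, or the offline
path sees the tag as allocated. What I'm questioning above is the
same-CPU case, where only a barrier() is claimed to be needed.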