Re: [PATCH V8 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive

Will Deacon <will@xxxxxxxxxx> · Thu, 30 Apr 2020 12:04:29 +0100

On Thu, Apr 30, 2020 at 08:39:45AM +0800, Ming Lei wrote:
> On Wed, Apr 29, 2020 at 06:34:01PM +0100, Will Deacon wrote:
> > On Wed, Apr 29, 2020 at 09:43:27PM +0800, Ming Lei wrote:
> > > Please see the following two code paths:
> > > 
> > > [1] code path1:
> > > blk_mq_hctx_notify_offline():
> > > 	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
> > > 
> > > 	smp_mb() or smp_mb_after_atomic()
> > > 
> > > 	blk_mq_hctx_drain_inflight_rqs():
> > > 		blk_mq_tags_inflight_rqs()
> > > 			rq = hctx->tags->rqs[index]
> > > 			and
> > > 			READ rq->tag
> > > 
> > > [2] code path2:
> > > 	blk_mq_get_driver_tag():
> > > 
> > > 		process might be migrated to other CPU here and chance is small,
> > > 		then the follow code will be run on CPU different with code path1
> > > 
> > > 		rq->tag = rq->internal_tag;
> > > 		hctx->tags->rqs[rq->tag] = rq;
> > 
> > I /think/ this can be distilled to the SB litmus test:
> > 
> > 	// blk_mq_hctx_notify_offline()		blk_mq_get_driver_tag();
> > 	Wstate = INACTIVE			Wtag
> > 	smp_mb()				smp_mb()
> > 	Rtag					Rstate
> > 
> > and you want to make sure that either blk_mq_get_driver_tag() sees the
> > state as INACTIVE and does the cleanup, or it doesn't and
> > blk_mq_hctx_notify_offline() sees the newly written tag and waits for the
> > request to complete (I don't get how that happens, but hey).
> > 
> > Is that right?
> 
> Yeah, exactly.
> 
> > 
> > > 		barrier() in case that code path2 is run on same CPU with code path1
> > > 		OR
> > > 		smp_mb() in case that code path2 is run on different CPU with code path1 because
> > > 		of process migration
> > > 		
> > > 		test_bit(BLK_MQ_S_INACTIVE, &data.hctx->state)
> > 
> > Couldn't you just check this at the start of blk_mq_get_driver_tag() as
> > well, and then make the smp_mb() unconditional?
> 
> As I mentioned, the chance for the current process(calling
> blk_mq_get_driver_tag()) migration is very small, we do want to
> avoid the extra smp_mb() in the fast path.

Hmm, but your suggestion of checking 'rq->mq_ctx->cpu' only works if that
is the same CPU on which blk_mq_hctx_notify_offline() executes. What
provides that guarantee?

If there's any chance of this thing being concurrent, then you need the
barrier there just in case. So I'd say you either need to prevent the race,
or live with the barrier. Do you have numbers to show how expensive it is?

Will