Re: [PATCH V7 5/9] blk-mq: stop to handle IO and drain IO before hctx becomes inactive

Christoph Hellwig <hch@xxxxxx> · Thu, 23 Apr 2020 09:38:38 +0200

On Sat, Apr 18, 2020 at 11:09:21AM +0800, Ming Lei wrote:
> -static bool blk_mq_get_driver_tag(struct request *rq)
> +static bool blk_mq_get_driver_tag(struct request *rq, bool direct_issue)
>  {
>  	struct blk_mq_alloc_data data = {
>  		.q = rq->q,
> @@ -1054,6 +1054,23 @@ static bool blk_mq_get_driver_tag(struct request *rq)
>  		data.hctx->tags->rqs[rq->tag] = rq;
>  	}
>  allocated:
> +	/*
> +	 * Add one memory barrier in case that direct issue IO process
> +	 * is migrated to other CPU which may not belong to this hctx,
> +	 * so we can order driver tag assignment and checking
> +	 * BLK_MQ_S_INACTIVE. Otherwise, barrier() is enough given the
> +	 * two code paths are run on single CPU in case that
> +	 * BLK_MQ_S_INACTIVE is set.

Please use up all 80 characters for the comments (also elsewhere). That
being said, I fail to see what barrier(), a pure compiler barrier, even
buys us here.

> +	 */
> +	if (unlikely(direct_issue && rq->mq_ctx->cpu != raw_smp_processor_id()))
> +		smp_mb();
> +	else
> +		barrier();
> +
> +	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data.hctx->state))) {
> +		blk_mq_put_driver_tag(rq);
> +		return false;
> +	}
>  	return rq->tag != -1;

Also if you take my cleanup to patch 2, we could just open code the
direct_issue case in the only caller instead of having the magic in the
common routine.

> +	if ((cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu) ||
> +			(cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask)
> +			 < nr_cpu_ids))

No need for the inner parentheses. Also in this case I think something like:

	if (cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu ||
	    cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask) < nr_cpu_ids)

might be a tad more readable, but then again this might even be worth
a little inline helper once we start bikeshedding.
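
To make the helper idea concrete, here is a rough userspace sketch, not
kernel code. It models a cpumask as a 64-bit word; mask_next_and(),
hctx_last_online_cpu() and NR_CPU_IDS are made-up names standing in for
cpumask_next_and(), the proposed helper and nr_cpu_ids. The helper returns
the negation of the condition above, i.e. true when cpu is the only online
CPU left in the hctx mask.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPU_IDS 64	/* assumption: toy stand-in for nr_cpu_ids */

/*
 * Toy model of cpumask_next_and(): return the first CPU number greater
 * than n that is set in both masks, or NR_CPU_IDS if there is none.
 */
static int mask_next_and(int n, uint64_t a, uint64_t b)
{
	uint64_t both = a & b;
	int cpu;

	for (cpu = n + 1; cpu < NR_CPU_IDS; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Hypothetical helper: true if cpu is the first online CPU mapped to the
 * hctx and no other online CPU mapped to it follows.
 */
static bool hctx_last_online_cpu(int cpu, uint64_t hctx_mask,
				 uint64_t online_mask)
{
	return mask_next_and(-1, hctx_mask, online_mask) == cpu &&
	       mask_next_and(cpu, hctx_mask, online_mask) >= NR_CPU_IDS;
}
```

With something like this the offline callback would only need a single
negated call instead of the two-line condition.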

> +	/*
> +	 * The current CPU is the last one in this hctx, S_INACTIVE
> +	 * can be observed in dispatch path without any barrier needed,
> +	 * cause both are run on one same CPU.
> +	 */
> +	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
> +	/*
> +	 * Order setting BLK_MQ_S_INACTIVE and checking rq->tag & rqs[tag],
> +	 * and its pair is the smp_mb() in blk_mq_get_driver_tag
> +	 */
> +	smp_mb();
> +	blk_mq_hctx_drain_inflight_rqs(hctx);
> +	return 0;

FYI, Documentation/core-api/atomic_ops.rst asks for using
smp_mb__before_atomic() / smp_mb__after_atomic() around the bitops
instead of a plain smp_mb().
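
As a rough illustration of the pairing being discussed, here is a
userspace model using C11 atomics. This is an analogy, not kernel code:
the relaxed fetch-or plus seq_cst fence only approximates set_bit()
followed by smp_mb__after_atomic(), and hctx_state, rq_tag,
S_INACTIVE_BIT and both function names are invented for the sketch. The
point is the store-then-fence-then-load shape on each side, which ensures
that at least one side observes the other's store.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define S_INACTIVE_BIT 3	/* assumption: arbitrary bit for the model */

static atomic_ulong hctx_state;		/* stands in for hctx->state */
static atomic_int rq_tag = -1;		/* stands in for rq->tag */

/*
 * Hotplug side: set the inactive bit, issue a full fence, then look for
 * an assigned tag. Returns true if a tag is already visible.
 */
static bool mark_inactive_then_check_tag(void)
{
	atomic_fetch_or_explicit(&hctx_state, 1UL << S_INACTIVE_BIT,
				 memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&rq_tag, memory_order_relaxed) != -1;
}

/*
 * Dispatch side: publish the tag, issue a full fence, then test the
 * inactive bit. Returns true if the hctx is seen as inactive.
 */
static bool assign_tag_then_check_inactive(int tag)
{
	unsigned long state;

	atomic_store_explicit(&rq_tag, tag, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	state = atomic_load_explicit(&hctx_state, memory_order_relaxed);
	return (state & (1UL << S_INACTIVE_BIT)) != 0;
}
```

If either fence is dropped, both sides may read the old value, which is
exactly the case the patch comment is trying to rule out.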

> +static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
> +			struct blk_mq_hw_ctx, cpuhp_online);
> +
> +	if (!cpumask_test_cpu(cpu, hctx->cpumask))
> +		return 0;
> +
> +	clear_bit(BLK_MQ_S_INACTIVE, &hctx->state);
>  	return 0;
>  }

Why not simply:

	if (cpumask_test_cpu(cpu, hctx->cpumask))
		clear_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	return 0;

Conceptually the changes look fine.