Re: [PATCH V10 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive

Ming Lei <ming.lei@xxxxxxxxxx> · Sat, 9 May 2020 12:10:42 +0800

On Fri, May 08, 2020 at 08:24:44PM -0700, Bart Van Assche wrote:
> On 2020-05-08 19:20, Ming Lei wrote:
> > Not sure why you mention queue freezing.
> 
> This patch series introduces a fundamental race between modifying the
> hardware queue state (BLK_MQ_S_INACTIVE) and tag allocation. The only

Basically there are two cases:

1) setting BLK_MQ_S_INACTIVE and driver tag allocation run on the same
CPU: only a compiler barrier is needed. This is the common case.

2) setting BLK_MQ_S_INACTIVE and driver tag allocation run on
different CPUs: a pair of smp_mb() is used to prevent reordering. This
only happens when the direct-issue process has been migrated.

Please take a look at the comment in this patch:

+       /*
+        * In case that direct issue IO process is migrated to other CPU
+        * which may not belong to this hctx, add one memory barrier so we
+        * can order driver tag assignment and checking BLK_MQ_S_INACTIVE.
+        * Otherwise, barrier() is enough given both setting BLK_MQ_S_INACTIVE
+        * and driver tag assignment are run on the same CPU because
+        * BLK_MQ_S_INACTIVE is only set after the last CPU of this hctx is
+        * becoming offline.
+        *
+        * Process migration might happen after the check on current processor
+        * id, smp_mb() is implied by processor migration, so no need to worry
+        * about it.
+        */
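
To make the cross-CPU case concrete, here is a minimal userspace sketch
of the barrier pairing, written with C11 atomics rather than kernel
primitives. It is not the code from the patch: struct toy_hctx,
toy_get_driver_tag() and toy_mark_inactive() are hypothetical names, and
atomic_thread_fence(memory_order_seq_cst) is only assumed to play the
role of smp_mb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hardware queue; not the kernel's struct. */
struct toy_hctx {
	atomic_bool inactive;	/* models BLK_MQ_S_INACTIVE */
	atomic_int tags_in_use;	/* models assigned driver tags */
};

/* Submission side: assign a tag first, then check the inactive flag. */
static bool toy_get_driver_tag(struct toy_hctx *h)
{
	atomic_fetch_add_explicit(&h->tags_in_use, 1, memory_order_relaxed);

	/* Stand-in for smp_mb(): order the tag store before the flag load. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&h->inactive, memory_order_relaxed)) {
		/* hctx is inactive: give the tag back and report failure. */
		atomic_fetch_sub_explicit(&h->tags_in_use, 1,
					  memory_order_relaxed);
		return false;
	}
	return true;
}

/* Hotplug side: set the flag first, then look for tags still in use. */
static int toy_mark_inactive(struct toy_hctx *h)
{
	atomic_store_explicit(&h->inactive, true, memory_order_relaxed);

	/* Pairs with the fence in toy_get_driver_tag(). */
	atomic_thread_fence(memory_order_seq_cst);

	/* Number of tags the caller would have to wait for (drain). */
	return atomic_load_explicit(&h->tags_in_use, memory_order_relaxed);
}
```

With a full fence on both sides, at least one side observes the other's
store: either the submitter sees the flag and gives the tag back, or the
hotplug side sees the tag and knows it has to wait for it.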

And you may find more discussion about this topic in the following thread:

https://lore.kernel.org/linux-block/20200429134327.GC700644@T590/

> mechanism I know of for enforcing the order in which another thread
> observes writes to different memory locations without inserting a memory
> barrier in the hot path is RCU (see also The RCU-barrier menagerie;
> https://lwn.net/Articles/573497/). The only existing such mechanism in
> the blk-mq core I know of is queue freezing. Hence my comment about
> queue freezing.

You didn't explain how queue freezing is used for this issue.

We are talking about CPU hotplug vs. IO. In short, when one hctx becomes
inactive (all CPUs in hctx->cpumask become offline), in-flight IO from
this hctx needs to be drained to avoid IO timeouts. Also, all requests
in the scheduler/sw queue of this hctx need to be handled correctly to
avoid IO hangs.

Queue freezing can only be applied at the request queue level, not at
the hctx level. When requests can't be completed, waiting for the freeze
just hangs forever.

Thanks,
Ming