On 1/19/24 08:02, Jens Axboe wrote:
> +	/*
> +	 * If someone else is already dispatching, skip this one. This will
> +	 * defer the next dispatch event to when something completes, and could
> +	 * potentially lower the queue depth for contended cases.
> +	 *
> +	 * See the logic in blk_mq_do_dispatch_sched(), which loops and
> +	 * retries if nothing is dispatched.
> +	 */
> +	if (test_bit(DD_DISPATCHING, &dd->run_state) ||
> +	    test_and_set_bit(DD_DISPATCHING, &dd->run_state))
> +		return NULL;
> +
>  	spin_lock(&dd->lock);
>  	rq = dd_dispatch_prio_aged_requests(dd, now);
>  	if (rq)
> @@ -616,6 +635,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  	}
>  unlock:
> +	clear_bit(DD_DISPATCHING, &dd->run_state);
>  	spin_unlock(&dd->lock);
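
My reading of the new fast path, with ordering annotations based on
Documentation/atomic_bitops.txt (the annotations are mine):

	if (test_bit(DD_DISPATCHING, &dd->run_state) ||       /* plain read, unordered */
	    test_and_set_bit(DD_DISPATCHING, &dd->run_state)) /* RMW with return value, fully ordered */
		return NULL;

	spin_lock(&dd->lock);                                 /* acquire */

If that reading is correct, my question is only about the clear_bit() call
on the unlock path.
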
From Documentation/memory-barriers.txt, about smp_mb__before_atomic() and
smp_mb__after_atomic(): "These are also used for atomic RMW bitop functions
that do not imply a memory barrier (such as set_bit and clear_bit)." Does
this mean that CPUs with a weak memory model (e.g. ARM) are allowed to
execute the clear_bit() call earlier than where it occurs in the code? I
think that spin_lock() and a successful spin_trylock() have "acquire"
semantics and that spin_unlock() has "release" semantics. So while a CPU is
allowed to execute the clear_bit() call before memory operations that
precede it in program order, I don't think the same holds for spin_unlock():
its release semantics guarantee that all earlier memory operations are
visible before the unlock itself. See also
tools/memory-model/Documentation/locking.txt.
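
If the clearing of DD_DISPATCHING had to be ordered after the updates done
while dispatching on its own, i.e. without relying on the release semantics
of the subsequent spin_unlock(), one way to make that explicit would be the
following sketch (illustration only, not a request to change the patch):

unlock:
	/*
	 * clear_bit_unlock() has release semantics, so everything written
	 * while dispatching is ordered before the bit appears cleared to a
	 * concurrent test_bit()/test_and_set_bit().
	 */
	clear_bit_unlock(DD_DISPATCHING, &dd->run_state);
	spin_unlock(&dd->lock);

An smp_mb__before_atomic() in front of a plain clear_bit() would provide at
least the same ordering; that is the use case the memory-barriers.txt text
quoted above is about.
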
Thanks,
Bart.