On 1/19/24 4:24 PM, Bart Van Assche wrote:
> On 1/19/24 08:02, Jens Axboe wrote:
>> +	/*
>> +	 * If someone else is already dispatching, skip this one. This will
>> +	 * defer the next dispatch event to when something completes, and could
>> +	 * potentially lower the queue depth for contended cases.
>> +	 *
>> +	 * See the logic in blk_mq_do_dispatch_sched(), which loops and
>> +	 * retries if nothing is dispatched.
>> +	 */
>> +	if (test_bit(DD_DISPATCHING, &dd->run_state) ||
>> +	    test_and_set_bit(DD_DISPATCHING, &dd->run_state))
>> +		return NULL;
>> +
>>  	spin_lock(&dd->lock);
>>  	rq = dd_dispatch_prio_aged_requests(dd, now);
>>  	if (rq)
>> @@ -616,6 +635,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>  	}
>>  unlock:
>> +	clear_bit(DD_DISPATCHING, &dd->run_state);
>>  	spin_unlock(&dd->lock);
>
> From Documentation/memory-barriers.txt: "These are also used for atomic RMW
> bitop functions that do not imply a memory barrier (such as set_bit and
> clear_bit)." Does this mean that CPUs with a weak memory model (e.g. ARM)
> are allowed to execute the clear_bit() call earlier than where it occurs in
> the code? I think that spin_trylock() has "acquire" semantics and also that
> "spin_unlock()" has release semantics. While a CPU is allowed to execute
> clear_bit() before the memory operations that come before it, I don't think
> that is the case for spin_unlock(). See also
> tools/memory-model/Documentation/locking.txt.

Not sure why I didn't do it upfront, but they just need to be the _lock
variants of the bitops. I'll make that change.

-- 
Jens Axboe
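
For illustration, a rough, untested sketch of what the _lock-variant change
could look like, assuming the same surrounding context as the diff quoted
above (this is not the actual follow-up patch):

	/*
	 * test_and_set_bit_lock() has acquire semantics; the plain
	 * test_bit() in front only avoids the atomic RMW when another
	 * context is already dispatching.
	 */
	if (test_bit(DD_DISPATCHING, &dd->run_state) ||
	    test_and_set_bit_lock(DD_DISPATCHING, &dd->run_state))
		return NULL;

	spin_lock(&dd->lock);
	...
unlock:
	/*
	 * clear_bit_unlock() has release semantics, so it cannot be
	 * reordered before the dispatch work done above.
	 */
	clear_bit_unlock(DD_DISPATCHING, &dd->run_state);
	spin_unlock(&dd->lock);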