On Thu, Jul 30, 2020 at 09:10:48AM -0700, Sagi Grimberg wrote: > > > > > > > > In case of BLK_MQ_F_BLOCKING, blk-mq uses SRCU to mark read critical > > > > > > > section during dispatching request, then request queue quiesce is based on > > > > > > > SRCU. What we want to get is low cost added in fast path. > > > > > > > > > > > > > > However, from srcu_read_lock/srcu_read_unlock implementation, not see > > > > > > > it is quicker than percpu refcount, so use percpu_ref to implement > > > > > > > queue quiesce. This usage is cleaner and simpler & enough for implementing > > > > > > > queue quiesce. The main requirement is to make sure all read sections to observe > > > > > > > QUEUE_FLAG_QUIESCED once blk_mq_quiesce_queue() returns. > > > > > > > > > > > > > > Also it becomes much easier to add interface of async queue quiesce. > > > > > > > > > > > > BTW, no obvious IOPS difference is observed with this patch applied when running > > > > > > io on null_blk(blocking, submit_queues=32) in one dual-socket, 32cores system. > > > > > > > > > > Thanks Ming, can you test for non-blocking on the same setup? > > > > > > > > OK, I can do that, but this patch supposes to not affect non-blocking, > > > > care to share your motivation for testing non-blocking? > > > > > > I think it will be a significant improvement to have a single code path. > > > The code will be more robust and we won't need to face issues that are > > > specific for blocking. > > > > > > If the cost is negligible, I think the upside is worth it. > > > > > > > rcu_read_lock and rcu_read_unlock has been proved as efficient enough, > > and I don't think percpu_refcount is better than it, so I'd suggest to > > not switch non-blocking into this way. > > It's not a matter of which is better, its a matter of making the code > more robust because it has a single code-path. If moving to percpu_ref > is negligible, I would suggest to move both, I don't want to have two > completely different mechanism for blocking vs. non-blocking. RCU and SRCU have been different mechanism already. > > > BTW, in case of blocking, one hctx may dispatch at most one request because there > > is only single .run_work, even though when .queue_rq() is slept, that said > > blk_mq_submit_bio() queues bio in sync style. This way won't be very efficient. > > So percpu_refcount should be good enough for blocking code path, but may not be > > well enough for non-blocking case. > > Not sure what you mean, the percpu_ref is taken exactly where rcu is > taken, not sure what is the difference. My point is that blocking can't be efficient enough, when .queue_rq() is slept, no any request can be queued to this hctx any more. thanks, Ming