Re: [RFC PATCH v3 3/3] blk-mq: Lockout tagset iterator when exiting elevator

John Garry <john.garry@xxxxxxxxxx> · Mon, 8 Mar 2021 11:17:28 +0000

On 06/03/2021 04:43, Bart Van Assche wrote:
> On 3/5/21 7:14 AM, John Garry wrote:
>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 7ff1b20d58e7..5950fee490e8 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -358,11 +358,16 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
>>   {
>>   	int i;
>>
>> +	if (!atomic_inc_not_zero(&tagset->iter_usage_counter))
>> +		return;
>> +
>>   	for (i = 0; i < tagset->nr_hw_queues; i++) {
>>   		if (tagset->tags && tagset->tags[i])
>>   			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
>>   					      BT_TAG_ITER_STARTED);
>>   	}
>> +
>> +	atomic_dec(&tagset->iter_usage_counter);
>>   }
>>   EXPORT_SYMBOL(blk_mq_tagset_busy_iter);

Hi Bart,

> This changes the behavior of blk_mq_tagset_busy_iter(). What will e.g.
> happen if the mtip driver calls blk_mq_tagset_busy_iter(&dd->tags,
> mtip_abort_cmd, dd) concurrently with another blk_mq_tagset_busy_iter()
> call and if that causes all mtip_abort_cmd() calls to be skipped?

I'm not sure that I understand the problem you describe. If 
blk_mq_tagset_busy_iter(&dd->tags, mtip_abort_cmd, dd) is called, one of 
two things can happen:
a. Normal operation: iter_usage_counter initially holds >= 1, it is 
incremented in blk_mq_tagset_busy_iter(), and we iterate over the busy 
tags. Any parallel call to blk_mq_tagset_busy_iter() also increments 
iter_usage_counter, so concurrent iterators do not cause each other to 
be skipped.
b. We're switching IO scheduler: first we quiesce all queues, after 
which there should be no active requests. At that point we wait for any 
in-progress calls to blk_mq_tagset_busy_iter() to finish and block (or 
"discard" may be a better term) any further calls. Discarding further 
calls should be safe, as there are no requests to iterate over. 
atomic_cmpxchg() is used to set iter_usage_counter to 0, which is what 
blocks further calls.
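
To make the two cases concrete, here is a minimal userspace sketch of 
the counter protocol. It uses C11 <stdatomic.h> as a stand-in for the 
kernel atomics, and the function names (iter_enter(), iter_exit(), 
iter_lockout(), iter_unlock()) are hypothetical and not part of the 
patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Stand-in for tagset->iter_usage_counter.
 * 1 = iteration allowed and no iterator active, 0 = iterators locked out.
 */
static atomic_int iter_usage_counter = 1;

/* Rough equivalent of atomic_inc_not_zero(): increment unless the value is 0. */
static bool iter_enter(void)
{
	int old = atomic_load(&iter_usage_counter);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&iter_usage_counter, &old, old + 1))
			return true;
	}
	return false;	/* locked out: the caller skips the iteration */
}

/* Rough equivalent of the atomic_dec() at the end of the iterator. */
static void iter_exit(void)
{
	atomic_fetch_sub(&iter_usage_counter, 1);
}

/*
 * Case (b): wait until all iterators have left (counter back at 1),
 * then move 1 -> 0 so that new iterators are discarded.
 */
static void iter_lockout(void)
{
	int expected = 1;

	while (!atomic_compare_exchange_weak(&iter_usage_counter, &expected, 0))
		expected = 1;	/* a failed attempt overwrote 'expected' */
}

/* Re-enable iteration once the elevator exit is done. */
static void iter_unlock(void)
{
	assert(atomic_load(&iter_usage_counter) == 0);
	atomic_store(&iter_usage_counter, 1);
}
```

In case (a) two concurrent iter_enter() calls both succeed and the 
counter goes 1 -> 2 -> 3; only after iter_lockout() does iter_enter() 
return false.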


>> +	while (atomic_cmpxchg(&set->iter_usage_counter, 1, 0) != 1);
> Isn't it recommended to call cpu_relax() inside busy-waiting loops?

Maybe, but I am considering changing this patch to use a percpu_ref 
instead; I need to look into that further.
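
For reference, a busy-wait of this shape with a relax hint might look 
like the userspace sketch below. relax_hint() is a hypothetical 
stand-in for the kernel's cpu_relax(), and lockout_spin() is an 
illustrative name, not something from the patch:

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's cpu_relax(). */
static inline void relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ volatile("yield");
#endif
	/* Other architectures: no hint, plain spin. */
}

/*
 * Spin until the counter can be moved from 1 to 0, hinting the CPU
 * between attempts. Returns the number of failed attempts.
 */
static unsigned long lockout_spin(atomic_int *counter)
{
	unsigned long spins = 0;
	int expected = 1;

	while (!atomic_compare_exchange_strong(counter, &expected, 0)) {
		expected = 1;	/* a failed attempt overwrote 'expected' */
		spins++;
		relax_hint();
	}
	return spins;
}
```

If the counter already holds 1, the first attempt succeeds and the 
hint is never executed.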


>>   	blk_mq_sched_free_requests(q);
>>   	__elevator_exit(q, e);
>>
>> +	atomic_set(&set->iter_usage_counter, 1);
> Can it happen that the above atomic_set() call happens while a
> blk_mq_tagset_busy_iter() call is in progress?

No. At this point iter_usage_counter should still hold 0 from the 
earlier atomic_cmpxchg(), so there should be no callers inside the 
sensitive region of blk_mq_tagset_busy_iter(). Calls to 
blk_mq_tagset_busy_iter() are discarded while iter_usage_counter holds 0.

> Should that atomic_set() call perhaps be changed into an atomic_inc()
> call?

They have the same effect in practice, but we use atomic_set() in 
blk_mq_alloc_tag_set(), so this is at least consistent.
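
As a small illustration of the two options, again in userspace C11 
rather than kernel code and with hypothetical helper names: starting 
from 0, both leave the counter at 1. They would only differ if the 
counter were non-zero at that point, which should not be possible here.

```c
#include <stdatomic.h>

/* Re-enable as in the patch: overwrite the counter with 1. */
static int reenable_with_set(atomic_int *counter)
{
	atomic_store(counter, 1);
	return atomic_load(counter);
}

/* Alternative raised in review: add 1 to whatever value is there. */
static int reenable_with_inc(atomic_int *counter)
{
	return atomic_fetch_add(counter, 1) + 1;
}
```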

Thanks,
John