Re: [PATCH v2 1/2] blk-mq: add tagset quiesce interface

Chao Leng <lengchao@xxxxxxxxxx> · Wed, 19 Oct 2022 10:39:41 +0800

On 2022/10/18 23:04, Paul E. McKenney wrote:
On Tue, Oct 18, 2022 at 05:52:06PM +0800, Chao Leng wrote:
On 2022/10/17 23:21, Paul E. McKenney wrote:
On Mon, Oct 17, 2022 at 03:39:06PM +0200, Christoph Hellwig wrote:
On Thu, Oct 13, 2022 at 05:44:49PM +0800, Chao Leng wrote:
+	rcu = kvmalloc(count * sizeof(*rcu), GFP_KERNEL);
+	if (rcu) {
+		list_for_each_entry(q, &set->tag_list, tag_set_list) {
+			if (blk_queue_noquiesced(q))
+				continue;
+
+			init_rcu_head(&rcu[i].head);
+			init_completion(&rcu[i].completion);
+			call_srcu(q->srcu, &rcu[i].head, wakeme_after_rcu);
+			i++;
+		}
+
+		for (i = 0; i < count; i++) {
+			wait_for_completion(&rcu[i].completion);
+			destroy_rcu_head(&rcu[i].head);
+		}
+		kvfree(rcu);
+	} else {
+		list_for_each_entry(q, &set->tag_list, tag_set_list)
+			synchronize_srcu(q->srcu);
+	}

Having to allocate a struct rcu_synchronize for each of the potentially
many queues here is a bit sad.

Pull just explained the start_poll_synchronize_rcu interfaces at ALPSS
last week, so I wonder if something like that would also be feasible
for SRCU, as that would come in really handy here.

There is start_poll_synchronize_srcu() and poll_state_synchronize_srcu(),
but there would need to be an unsigned long for each srcu_struct from
which an SRCU grace period was required.  This would be half the size
of the "rcu" array above, but still maybe larger than you would like.

The resulting code might look something like this, with "rcu" now being
a pointer to unsigned long:

	rcu = kvmalloc(count * sizeof(*rcu), GFP_KERNEL);
	if (rcu) {
		list_for_each_entry(q, &set->tag_list, tag_set_list) {
			if (blk_queue_noquiesced(q))
				continue;
			rcu[i] = start_poll_synchronize_srcu(q->srcu);
			i++;
		}

		for (i = 0; i < count; i++)
			if (!poll_state_synchronize_srcu(q->srcu))
				synchronize_srcu(q->srcu);
synchronize_srcu will restart a new period of grace.

True, but SRCU grace periods normally complete reasonably quickly, so
the synchronize_srcu() might well be faster than the loop, depending on
what the corresponding SRCU readers are doing.
Yes, Different runtimes have different wait times, and it's hard to say
which is better or worse.

Maybe it would be better like this:
			while (!poll_state_synchronize_srcu(q->srcu, rcu[i]))
				schedule_timeout_uninterruptible(1);

Why not try it both ways and see what happens?  Assuming that is, that
the differences matter in this code.

							Thanx, Paul

		kvfree(rcu);
	} else {
		list_for_each_entry(q, &set->tag_list, tag_set_list)
			synchronize_srcu(q->srcu);
	}

Or as Christoph suggested, just have a single srcu_struct for the
whole group.

The main reason for having multiple srcu_struct structures is to
prevent the readers from one from holding up the updaters from another.
Except that by waiting for the multiple grace periods, you are losing
that property anyway, correct?  Or is this code waiting on only a small
fraction of the srcu_struct structures associated with blk_queue?

							Thanx, Paul
.

.