On 26/11/2019 11:27, Hannes Reinecke wrote:
On 11/26/19 12:05 PM, Ming Lei wrote:
On Tue, Nov 26, 2019 at 10:14:12AM +0100, Hannes Reinecke wrote:
[ .. ]
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..f3589f42b96d 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -452,7 +452,7 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
{
if (hctx->sched_tags) {
blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
- blk_mq_free_rq_map(hctx->sched_tags);
+ blk_mq_free_rq_map(hctx->sched_tags, false);
hctx->sched_tags = NULL;
}
}
@@ -462,10 +462,14 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
unsigned int hctx_idx)
{
struct blk_mq_tag_set *set = q->tag_set;
+ int flags = set->flags;
int ret;
+ /* Scheduler tags are never shared */
+ set->flags &= ~BLK_MQ_F_TAG_HCTX_SHARED;
hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
set->reserved_tags);
+ set->flags = flags;
This approach is very fragile; it races against other users of
blk_mq_is_sbitmap_shared().
We are allocating tags here, so I don't think anything else is even able
to touch the flags at this point.
From a performance viewpoint, all hctxs belonging to this request queue
should share one scheduler tagset in the BLK_MQ_F_TAG_HCTX_SHARED case,
because the driver tag queue depth isn't changed.
Hmm. Now you get me confused.
In an earlier mail you said:
This kind of sharing is wrong, sched tags should be request
queue wide instead of tagset wide, and each request queue has
its own & independent scheduler queue.
as in v2 we _had_ shared scheduler tags, too.
Did I misread your comment above?
As I understand it, we just don't require shared scheduler tags. Only
blk_mq_tag_set.tags needs to use the shared sbitmap.
if (!hctx->sched_tags)
return -ENOMEM;
[ .. ]
@@ -2456,7 +2459,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
{
if (set->tags && set->tags[hctx_idx]) {
blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
- blk_mq_free_rq_map(set->tags[hctx_idx]);
+ blk_mq_free_rq_map(set->tags[hctx_idx],
+ blk_mq_is_sbitmap_shared(set));
set->tags[hctx_idx] = NULL;
}
Who will finally free the shared tags in the blk_mq_is_sbitmap_shared() case?
Hmm. Indeed, that bit is missing; I will be adding it.
}
[ .. ]
@@ -3168,8 +3186,17 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
q->elevator->type->ops.depth_updated(hctx);
}
- if (!ret)
+ if (!ret) {
+ if (blk_mq_is_sbitmap_shared(set)) {
+ sbitmap_queue_resize(&set->shared_bitmap_tags, nr);
+ sbitmap_queue_resize(&set->shared_breserved_tags, nr);
+ }
The above change is wrong in the case of hctx->sched_tags.
Yes, we need to add a marker here to indicate whether these are 'normal'
or 'scheduler' tags. That will also help when allocating, as we then
won't need the flag twiddling you've complained about.
nit: To be proper, since this is tag management, we should move it to
blk-mq-tag.c
q->nr_requests = nr;
+ }
+ /*
+ * if ret != 0, q->nr_requests would not be updated, yet the depth
+ * for some hctx may have changed - is that right?
+ */
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
[ .. ]
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 147185394a25..670e9a949d32 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -109,6 +109,12 @@ struct blk_mq_tag_set {
unsigned int flags; /* BLK_MQ_F_* */
void *driver_data;
+ struct sbitmap_queue shared_bitmap_tags;
+ struct sbitmap_queue shared_breserved_tags;
+
+ struct sbitmap_queue sched_shared_bitmap_tags;
+ struct sbitmap_queue sched_shared_breserved_tags;
+
The above two sched_shared_* fields aren't used in this patch.
Left-overs from merging. Will be removed.
Cheers,
Hannes