On 2/18/22 10:41, Melanie Plageman (Microsoft) wrote:
> Currently a single blk_mq_tag_set is associated with a Scsi_Host. When
> SCSI controllers are limited, attaching multiple devices to the same
> controller is required. In cloud environments with relatively
> high-latency persistent storage, requiring all devices on a controller
> to share a single blk_mq_tag_set negatively impacts performance. For
> example: a device provisioned with high IOPS and bandwidth limits on
> the same controller as a smaller and slower device can starve the
> slower device of tags. This is especially noticeable when the slower
> device's workload consists of low-I/O-depth tasks.
The Cc-list for this patch series is way too long. Cc-ing linux-scsi and the most active SCSI contributors would have been sufficient.

Is the reported behavior reproducible with an upstream Linux kernel? I'm asking this because I think that the following block layer code should prevent the reported starvation:

/*
 * For shared tag users, we track the number of currently active users
 * and attempt to provide a fair share of the tag depth for each of them.
 */
static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
				  struct sbitmap_queue *bt)
{
	unsigned int depth, users;

	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
		return true;

	/*
	 * Don't try dividing an ant
	 */
	if (bt->sb.depth == 1)
		return true;

	if (blk_mq_is_shared_tags(hctx->flags)) {
		struct request_queue *q = hctx->queue;

		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
			return true;
	} else {
		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
			return true;
	}

	users = atomic_read(&hctx->tags->active_queues);

	if (!users)
		return true;

	/*
	 * Allow at least some tags
	 */
	depth = max((bt->sb.depth + users - 1) / users, 4U);
	return __blk_mq_active_requests(hctx) < depth;
}

Bart.