On 2/18/22 10:41, Melanie Plageman (Microsoft) wrote:
> Currently a single blk_mq_tag_set is associated with a Scsi_Host. When
> SCSI controllers are limited, attaching multiple devices to the same
> controller is required. In cloud environments with relatively
> high-latency persistent storage, requiring all devices on a controller
> to share a single blk_mq_tag_set negatively impacts performance. For
> example: a device provisioned with high IOPS and bandwidth limits on
> the same controller as a smaller and slower device can starve the
> slower device of tags. This is especially noticeable when the slower
> device's workload consists of low-I/O-depth tasks.
The Cc-list for this patch series is way too long. Cc-ing linux-scsi and the most active SCSI contributors would have been sufficient.

Is the reported behavior reproducible with an upstream Linux kernel? I'm asking this because I think that the following block layer code should prevent the reported starvation:

/*
 * For shared tag users, we track the number of currently active users
 * and attempt to provide a fair share of the tag depth for each of them.
 */
static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
				  struct sbitmap_queue *bt)
{
	unsigned int depth, users;

	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
		return true;

	/*
	 * Don't try dividing an ant
	 */
	if (bt->sb.depth == 1)
		return true;

	if (blk_mq_is_shared_tags(hctx->flags)) {
		struct request_queue *q = hctx->queue;

		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
			return true;
	} else {
		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
			return true;
	}

	users = atomic_read(&hctx->tags->active_queues);

	if (!users)
		return true;

	/*
	 * Allow at least some tags
	 */
	depth = max((bt->sb.depth + users - 1) / users, 4U);
	return __blk_mq_active_requests(hctx) < depth;
}

Bart.