On 2020/1/17 18:16, Ming Lei wrote:
On Fri, Jan 17, 2020 at 03:19:18PM +0800, Yufen Yu wrote:
Hi, Ming
On 2020/1/16 17:03, Ming Lei wrote:
On Thu, Jan 16, 2020 at 12:06:02PM +0800, Yufen Yu wrote:
Hi, all
Shared tags were introduced to maintain a notion of fairness between
active users. This may be good for NVMe with multiple namespaces, to
avoid starving some users. Right?
Actually, an NVMe namespace is the LUN of the SCSI world.
Shared tags aren't for maintaining fairness; they are just the natural
software implementation of a SCSI host's tags, since every SCSI host
shares its tags among all LUNs. If the SCSI host supports real MQ, the
tags are hw-queue wide; otherwise they are host wide.
However, I don't understand why we introduce shared tags for SCSI.
IMO, there are two concerns with SCSI shared tags:
1) For now, 'shost->can_queue' is used as the queue depth in the block
layer, and all target drivers share the tags of one host. Then the max
tags each target can get is:
depth = max((bt->sb.depth + users - 1) / users, 4U);
But each target driver may have its own tag capacity and queue depth.
Does the shared tag set limit a target device's bandwidth?
No. If 'target driver' means the LUN: each LUN doesn't have its own
independent tags. It may have its own queue depth, but that is often for
maintaining fairness among all active LUNs, not a real queue depth.
You may look at the patches [1], which try to bypass the per-LUN queue
depth for SSDs.
[1] https://lore.kernel.org/linux-block/20191118103117.978-1-ming.lei@xxxxxxxxxx/
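For reference, here is a minimal sketch of the fairness division behind
the formula quoted above, modeled on the hctx_may_queue() logic in
block/blk-mq-tag.c (the type and helper names below are simplified
stand-ins for illustration, not the kernel's own):

#include <stdbool.h>

/* Simplified stand-in for the shared tag set state. */
struct tag_set_sketch {
	unsigned int depth;        /* total tags, e.g. shost->can_queue */
	unsigned int active_users; /* queues currently marked active */
};

/* May a queue with 'active' requests in flight take one more tag? */
static bool may_queue(const struct tag_set_sketch *ts, unsigned int active)
{
	unsigned int users = ts->active_users;
	unsigned int depth;

	if (!users)
		return true;

	/* Round up to an equal share per active user, never below 4 tags. */
	depth = (ts->depth + users - 1) / users;
	if (depth < 4U)
		depth = 4U;

	return active < depth;
}

So a LUN is never starved below 4 tags, but with many active LUNs on one
host, each LUN's effective depth can fall well under its own capability.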
2) When adding a new target or removing a device, we may need to freeze
the other devices' queues to update BLK_MQ_F_TAG_SHARED in hctx->flags.
That may hurt performance.
Adding/removing a device isn't a frequent event, so it shouldn't be a
real issue. Or have you seen an effect in a real use case?
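To make the concern concrete, here is a condensed sketch of the update
path, modeled on blk_mq_update_tag_set_depth() in block/blk-mq.c (not
verbatim kernel code; queue_set_hctx_shared() is the internal helper
there that flips BLK_MQ_F_TAG_SHARED):

static void update_shared_flag_sketch(struct blk_mq_tag_set *set,
				      bool shared)
{
	struct request_queue *q;

	/* Every queue sharing this tag set is visited, one by one. */
	list_for_each_entry(q, &set->tag_list, tag_set_list) {
		blk_mq_freeze_queue(q);	/* waits for this queue's in-flight IO */
		queue_set_hctx_shared(q, shared);
		blk_mq_unfreeze_queue(q);
	}
}

Since each freeze blocks until that queue drains, the total wait is the
sum of all per-queue drain times, which is why one device's slow,
retrying IO can stall another device's removal.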
Thanks a lot for your detailed explanation.
We found that removing a SCSI device can be delayed for a long time
(such as 6 * 30s), waiting for the other devices on the same host to
complete all their IOs, where some IOs are retried multiple times. If
our driver allowed more retries, removing the device would wait even
longer. That is not expected.
I'd suggest you figure out why the IO timeouts are triggered on your
device.
I agree with your suggestion. But we cannot prevent IO timeouts and
multiple retries in the device, right? I think we should handle even
that situation gracefully.
In fact, this was not a problem before SCSI switched to blk-mq: all
target devices were independent at removal time.
Is an IO timeout triggered before switching to scsi-mq? I guess it
shouldn't be an issue if no IO timeout is triggered.
However, there is still something we can improve, such as starting the
queue freezes concurrently in blk_mq_update_tag_set_depth().
Before switching to scsi-mq, timeouts were triggered as well, but there
was no delay when removing a device: it did not need to wait for IOs on
the other devices to complete. So I also think we may need to improve
the freeze path for scsi-mq.
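For example, something along the lines of the concurrent freeze Ming
suggests above, using the existing blk_freeze_queue_start() /
blk_mq_freeze_queue_wait() split so the per-queue drains overlap instead
of adding up (a rough sketch, not an actual patch):

static void update_shared_flag_concurrent(struct blk_mq_tag_set *set,
					  bool shared)
{
	struct request_queue *q;

	/* Kick off every freeze without waiting. */
	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_freeze_queue_start(q);

	/* Now wait; in-flight IO on all queues drains in parallel. */
	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_mq_freeze_queue_wait(q);

	list_for_each_entry(q, &set->tag_list, tag_set_list) {
		queue_set_hctx_shared(q, shared);
		blk_mq_unfreeze_queue(q);
	}
}

With that, the wait would be bounded by the slowest single queue's drain
time rather than the sum over all queues on the host.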
Thanks,
Yufen