On 22/10/2019 01:16, Ming Lei wrote:
On Mon, Oct 21, 2019 at 03:02:56PM +0100, John Garry wrote:
On 21/10/2019 13:53, Ming Lei wrote:
On Mon, Oct 21, 2019 at 12:49:53PM +0100, John Garry wrote:
Yes, we share tags among all queues, but we now generate the tag - known as IPTT
- in the LLDD, as we can no longer use the request tag (it is not
unique across all queues):
https://github.com/hisilicon/kernel-dev/commit/087b95af374be6965583c1673032fb33bc8127e8#diff-f5d8fff19bc539a7387af5230d4e5771R188
As I said, the branch is messy and I did have to fix 087b95af374.
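Roughly, the LLDD-side allocation looks like the sketch below; the names
are hypothetical and it is only loosely modelled on a bitmap allocator,
not the actual driver code:

	/*
	 * Illustrative sketch of LLDD-private IPTT allocation; struct
	 * and function names are hypothetical, not the hisi_sas code.
	 */
	#include <linux/bitmap.h>
	#include <linux/spinlock.h>

	#define SKETCH_MAX_COMMANDS 4096	/* hisi_sas v3 queue depth */

	struct sketch_hba {
		spinlock_t lock;		/* init with spin_lock_init() */
		DECLARE_BITMAP(iptt_bitmap, SKETCH_MAX_COMMANDS);
	};

	/* Allocate a free IPTT, or -ENOSPC if all 4096 are in flight. */
	static int sketch_iptt_alloc(struct sketch_hba *hba)
	{
		int iptt;

		spin_lock(&hba->lock);
		iptt = find_first_zero_bit(hba->iptt_bitmap, SKETCH_MAX_COMMANDS);
		if (iptt >= SKETCH_MAX_COMMANDS) {
			spin_unlock(&hba->lock);
			return -ENOSPC;
		}
		set_bit(iptt, hba->iptt_bitmap);
		spin_unlock(&hba->lock);

		return iptt;
	}

	/* Return an IPTT on command completion. */
	static void sketch_iptt_free(struct sketch_hba *hba, int iptt)
	{
		clear_bit(iptt, hba->iptt_bitmap);
	}

The point is just that the LLDD, not blk-mq, owns the 4096-entry tag space.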
Firstly, this way may waste lots of memory, especially when the queue depth is
big; for example, hisilicon V3's queue depth is 4096.
Secondly, you have to deal with queue-busy conditions efficiently and
correctly; for example, your real hw tags (IPTT) can be used up easily, and
how will you handle the requests that have already been dispatched?
I have not seen a scenario of exhausted IPTT. The IPTT count is the same as
SCSI host.can_queue, so the SCSI midlayer should ensure that this does not occur.
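For completeness, if IPTT did run out, the conventional fallback is to push
back to the midlayer from .queuecommand; a minimal sketch, reusing the
hypothetical sketch_iptt_alloc() helper from above:

	/*
	 * Hedged sketch of the usual out-of-tags fallback: fail
	 * .queuecommand with SCSI_MLQUEUE_HOST_BUSY so the midlayer
	 * requeues and retries the command later.
	 */
	#include <scsi/scsi.h>
	#include <scsi/scsi_cmnd.h>
	#include <scsi/scsi_host.h>

	static int sketch_queuecommand(struct Scsi_Host *shost,
				       struct scsi_cmnd *cmd)
	{
		struct sketch_hba *hba = shost_priv(shost);
		int iptt = sketch_iptt_alloc(hba);

		if (iptt < 0)
			return SCSI_MLQUEUE_HOST_BUSY;	/* midlayer retries */

		/* ... build the hw command using 'iptt' and deliver it ... */

		return 0;
	}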
Hi Ming,
That check isn't correct: each hw queue is allowed .can_queue in-flight
requests.
There always seems to be some confusion or disagreement on this topic.
I work according to the comment in scsi_host.h:
"Note: it is assumed that each hardware queue has a queue depth of
can_queue. In other words, the total queue depth per host
is nr_hw_queues * can_queue."
So I set Scsi_host.can_queue = HISI_SAS_MAX_COMMANDS (= 4096).
I believe all current drivers set .can_queue as a single hw queue's depth.
If you set .can_queue to HISI_SAS_MAX_COMMANDS, which is the HBA's queue
depth, the hisilicon sas driver will allow HISI_SAS_MAX_COMMANDS * nr_hw_queues
in-flight requests.
Yeah, but the SCSI host should still limit the maximum number of in-flight
IOs over all queues to .can_queue:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/scsi/scsi_mid_low_api.txt#n1083
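To make the arithmetic concrete, here is a hedged sketch of the setup under
discussion; the hw queue count is an assumed example value, and only the
fields come from scsi_host.h:

	#include <scsi/scsi_host.h>

	#define SKETCH_MAX_COMMANDS 4096	/* hisi_sas v3 IPTT count */

	static void sketch_set_depths(struct Scsi_Host *shost)
	{
		shost->nr_hw_queues = 16;		/* assumed example value */
		shost->can_queue = SKETCH_MAX_COMMANDS;	/* 4096 */

		/*
		 * Under the scsi_host.h comment quoted above, each hw
		 * queue gets a depth of can_queue, i.e. up to 16 * 4096
		 * requests in flight against only 4096 hw IPTTs; hence
		 * the LLDD-side allocation and busy handling sketched
		 * earlier.
		 */
	}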
Finally, you have to evaluate the performance effect; this is highly related
to how out-of-IPTT conditions are handled.
Some figures from our previous testing:
Managed interrupt without exposing multiple queues: 3M IOPS
Managed interrupt with exposing multiple queues: 2.6M IOPS
Then you see the performance regression.
Let's discuss this when I send the patches, so we don't sidetrack this
blk-mq improvement topic.
OK. What I meant is to use the correct driver to test the patches; otherwise
it might be hard to investigate.
Of course. I'm working on this now, and it looks like it will turn out to be
complicated... you'll see.
BTW, I reran the test and never saw the new WARN trigger (while SCSI
timeouts did occur).
Thanks again,
John
Thanks,
Ming