On 13/07/2020 08:55, Kashyap Desai wrote:
ring normal operation. See initial check in
blk_mq_hctx_notify_offline():
static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
{
	if (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	    !blk_mq_last_cpu_in_hctx(cpu, hctx))
		return 0;
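For reference, that guard only lets the teardown path run when the CPU going
offline is the last online CPU in hctx->cpumask. A minimal sketch of that
semantics (illustrative only, not the exact mainline
blk_mq_last_cpu_in_hctx() implementation; the helper name below is made up):

#include <linux/cpumask.h>
#include <linux/blk-mq.h>

/*
 * Illustrative sketch: true only if @cpu is the sole CPU in
 * @hctx->cpumask that is still online. While any other mapped CPU
 * remains online, the hctx keeps servicing requests and does not
 * need to be drained or marked inactive yet.
 */
static bool last_cpu_in_hctx_sketch(unsigned int cpu,
				    struct blk_mq_hw_ctx *hctx)
{
	unsigned int i;

	for_each_cpu_and(i, hctx->cpumask, cpu_online_mask)
		if (i != cpu)
			return false;	/* another mapped CPU is still online */

	return true;
}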
Thanks John for this pointer. I missed this part, and now I understand what
was happening in my testing.
JFYI, I always have this as a sanity check for my testing:
void irq_shutdown(struct irq_desc *desc)
{
+	pr_err("%s irq%d\n", __func__, desc->irq_data.irq);
+
	if (irqd_is_started(&desc->irq_data)) {
		desc->depth = 1;
		if (desc->irq_data.chip->irq_shutdown) {
More than one CPU was mapped to each MSI-X index in my earlier testing, and
because of that I could see the interrupt migrating to another available CPU
in the affinity mask. So my earlier testing was incorrect.
Now I am consistently able to reproduce the issue. The best setup is a 1:1
mapping of CPUs to MSI-X vectors. I used 128 logical CPUs and 128 MSI-X
vectors, and I saw IO timeouts without this RFC (without host_tagset).
I did not see IO timeouts with the RFC (with host_tagset). I will update this
data in the driver's commit message.
ok, great. That's what we want. I'm not sure exactly what your test
consists of, though.
Just for my understanding: if we have the code below in
blk_mq_hctx_notify_offline(), CPU hotplug should work for the megaraid_sas
driver even without this RFC (without shared host tagset), right?
If the answer is yes, would there be any side effect of having the code below
in the block layer?
Sorry, I'm not sure what you're getting at with this change, below.
It seems that you're trying to drain hctx0 (which is your only hctx, as
nr_hw_queues = 1 without this patchset) and set it inactive whenever any
CPU is offlined. If so, that's not right; see the annotation after your
snippet below.
static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
{
	if (hctx->queue->nr_hw_queues > 1 &&
	    (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	     !blk_mq_last_cpu_in_hctx(cpu, hctx)))
		return 0;
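To spell out the concern (my annotation of the above, purely for
illustration): megaraid_sas without the RFC has a single hctx, so the added
nr_hw_queues > 1 test is always false and the early return is never taken:

	/* nr_hw_queues == 1 without the RFC, so this is always false ... */
	if (hctx->queue->nr_hw_queues > 1 &&
	    (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	     !blk_mq_last_cpu_in_hctx(cpu, hctx)))
		/*
		 * ... so this early return is never taken: hctx0 would be
		 * drained and marked inactive on every CPU offline, even
		 * though other online CPUs can still submit to it.
		 */
		return 0;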
I also noticed that nr_hw_queues is now exposed in sysfs:
/sys/devices/pci0000:85/0000:85:00.0/0000:86:00.0/0000:87:04.0/0000:8b:00.0/0000:8c:00.0/0000:8d:00.0/host14/scsi_host/host14/nr_hw_queues:128
That's on my v8 wip branch, so I guess you're picking it up from there.
Thanks,
John