On 13/07/2020 08:55, Kashyap Desai wrote:
ring normal operation. See initial check in
blk_mq_hctx_notify_offline():
static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
{
	if (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	    !blk_mq_last_cpu_in_hctx(cpu, hctx))
		return 0;
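For reference, that guard only lets the teardown path run when the CPU going
offline is the last online CPU in hctx->cpumask. A minimal sketch of that
semantics (illustrative only, not the exact mainline
blk_mq_last_cpu_in_hctx() implementation; the helper name below is made up):

#include <linux/cpumask.h>
#include <linux/blk-mq.h>

/*
 * Illustrative sketch: true only if @cpu is the sole CPU in
 * @hctx->cpumask that is still online. While any other mapped CPU
 * remains online, the hctx keeps servicing requests and does not
 * need to be drained or marked inactive yet.
 */
static bool last_cpu_in_hctx_sketch(unsigned int cpu,
				    struct blk_mq_hw_ctx *hctx)
{
	unsigned int i;

	for_each_cpu_and(i, hctx->cpumask, cpu_online_mask)
		if (i != cpu)
			return false;	/* another mapped CPU is still online */

	return true;
}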
Thanks John for this pointer. I missed this part, and now I understand what
was happening in my testing.
JFYI, I always have this as a sanity check for my testing:
void irq_shutdown(struct irq_desc *desc)
{
+	pr_err("%s irq%d\n", __func__, desc->irq_data.irq);
+
	if (irqd_is_started(&desc->irq_data)) {
		desc->depth = 1;
		if (desc->irq_data.chip->irq_shutdown) {
More than one CPU was mapped to each MSI-X index in my earlier testing, and
because of that I could see the interrupt migrating to another available CPU
in the affinity mask. So my earlier testing was incorrect.
Now I am consistently able to reproduce the issue. The best setup is a 1:1
mapping of CPUs to MSI-X vectors. I used 128 logical CPUs and 128 MSI-X
vectors, and I saw IO timeouts without this RFC (without host_tagset).
I did not see IO timeouts with the RFC (with host_tagset). I will update this
data in the driver's commit message.
ok, great. That's what we want. I'm not sure exactly what your test
consists of, though.
Just for my understanding: if we have the code below in
blk_mq_hctx_notify_offline(), CPU hotplug should work for the megaraid_sas
driver even without this RFC (without shared host tagset), right?
If the answer is yes, would there be any side effect of having the code below
in the block layer?
Sorry, I'm not sure what you're getting at with this change, below.
It seems that you're trying to drain hctx0 (which is your only hctx, as
nr_hw_queues = 1 without this patchset) and set it inactive whenever any
CPU is offlined. If so, that's not right; see the annotation after your
snippet below.
static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
{
	if (hctx->queue->nr_hw_queues > 1 &&
	    (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	     !blk_mq_last_cpu_in_hctx(cpu, hctx)))
		return 0;
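To spell out the concern (my annotation of the above, purely for
illustration): megaraid_sas without the RFC has a single hctx, so the added
nr_hw_queues > 1 test is always false and the early return is never taken:

	/* nr_hw_queues == 1 without the RFC, so this is always false ... */
	if (hctx->queue->nr_hw_queues > 1 &&
	    (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	     !blk_mq_last_cpu_in_hctx(cpu, hctx)))
		/*
		 * ... so this early return is never taken: hctx0 would be
		 * drained and marked inactive on every CPU offline, even
		 * though other online CPUs can still submit to it.
		 */
		return 0;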
I also noticed that nr_hw_queues is now exposed in sysfs:
/sys/devices/pci0000:85/0000:85:00.0/0000:86:00.0/0000:87:04.0/0000:8b:00.0/0000:8c:00.0/0000:8d:00.0/host14/scsi_host/host14/nr_hw_queues:128
That's on my v8 wip branch, so I guess you're picking it up from there.
Thanks,
John