Re: [PATCH] blk-mq: don't fail driver tag allocation because of inactive hctx

John Garry <john.garry@xxxxxxxxxx> · Thu, 4 Jun 2020 15:11:39 +0100

It isn't related to request->tag. What I meant is that, since you use an
out-of-tree patch to enable multiple hw queues on hisi_sas, you have
to make the queue mapping correct; that is, the exact queue mapping
from blk-mq's mapping has to be used, which is built from managed
interrupt affinity.
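For reference, a minimal user-space sketch of how such a mapping is derived, in the spirit of blk_mq_pci_map_queues(): every CPU in a vector's managed affinity mask is mapped to that vector's hw queue. It assumes the 64-CPU, 16-vector layout from the logs below (hctxN's vector covers CPUs 4*((N+4)%16) to 4*((N+4)%16)+3); vector_affinity() is a hypothetical stand-in for pci_irq_get_affinity(), not the kernel API.

```c
#include <stdbool.h>

#define NR_CPUS      64
#define NR_QUEUES    16
#define CPUS_PER_VEC (NR_CPUS / NR_QUEUES)

/* Hypothetical stand-in for pci_irq_get_affinity(): true if @cpu is in the
 * managed affinity mask of hw queue @q's vector. Mirrors the logs below:
 * hctx0 (irq 137) -> CPUs 16-19, ..., hctx12 (irq 149) -> CPUs 0-3. */
static bool vector_affinity(int q, int cpu)
{
	int first = ((q + 4) % NR_QUEUES) * CPUS_PER_VEC;

	return cpu >= first && cpu < first + CPUS_PER_VEC;
}

/* Map every CPU in a queue's vector mask to that queue; in this sketch,
 * CPUs not covered by any mask stay at -1. */
static void build_map(int map[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		map[cpu] = -1;

	for (int q = 0; q < NR_QUEUES; q++)
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (vector_affinity(q, cpu))
				map[cpu] = q;
}
```

Grouping map[] by queue reproduces the hctxN CPU lists in the dump-qmap output below, which is exactly the mapping the driver has to follow.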

Please collect the following log:

1) ./dump-io-irq-affinity $PCI_ID_OF_HBA
http://people.redhat.com/minlei/tests/tools/dump-io-irq-affinity

I had to hack this a bit for the SAS HBA:

kernel version:
Linux ubuntu 5.7.0-next-20200603-16498-gbfbfcda762d5 #405 SMP PREEMPT Thu Jun 4 14:19:49 BST 2020 aarch64 aarch64 aarch64 GNU/Linux
	irq 137, cpu list 16-19, effective list 16
	irq 138, cpu list 20-23, effective list 20
	irq 139, cpu list 24-27, effective list 24
	irq 140, cpu list 28-31, effective list 28
	irq 141, cpu list 32-35, effective list 32
	irq 142, cpu list 36-39, effective list 36
	irq 143, cpu list 40-43, effective list 40
	irq 144, cpu list 44-47, effective list 44
	irq 145, cpu list 48-51, effective list 48
	irq 146, cpu list 52-55, effective list 52
	irq 147, cpu list 56-59, effective list 56
	irq 148, cpu list 60-63, effective list 60
	irq 149, cpu list 0-3, effective list 0
	irq 150, cpu list 4-7, effective list 4
	irq 151, cpu list 8-11, effective list 8
	irq 152, cpu list 12-15, effective list 12

2) ./dump-qmap /dev/sdN
http://people.redhat.com/minlei/tests/tools/dump-qmap

queue mapping for /dev/sda
	hctx0: default 16 17 18 19
	hctx1: default 20 21 22 23
	hctx2: default 24 25 26 27
	hctx3: default 28 29 30 31
	hctx4: default 32 33 34 35
	hctx5: default 36 37 38 39
	hctx6: default 40 41 42 43
	hctx7: default 44 45 46 47
	hctx8: default 48 49 50 51
	hctx9: default 52 53 54 55
	hctx10: default 56 57 58 59
	hctx11: default 60 61 62 63
	hctx12: default 0 1 2 3
	hctx13: default 4 5 6 7
	hctx14: default 8 9 10 11
	hctx15: default 12 13 14 15
queue mapping for /dev/sdb
	hctx0: default 16 17 18 19
	hctx1: default 20 21 22 23
	hctx2: default 24 25 26 27
	hctx3: default 28 29 30 31
	hctx4: default 32 33 34 35
	hctx5: default 36 37 38 39
	hctx6: default 40 41 42 43
	hctx7: default 44 45 46 47
	hctx8: default 48 49 50 51
	hctx9: default 52 53 54 55
	hctx10: default 56 57 58 59
	hctx11: default 60 61 62 63
	hctx12: default 0 1 2 3
	hctx13: default 4 5 6 7
	hctx14: default 8 9 10 11
	hctx15: default 12 13 14 15
queue mapping for /dev/sdc
	hctx0: default 16 17 18 19
	hctx1: default 20 21 22 23
	hctx2: default 24 25 26 27
	hctx3: default 28 29 30 31
	hctx4: default 32 33 34 35
	hctx5: default 36 37 38 39
	hctx6: default 40 41 42 43
	hctx7: default 44 45 46 47
	hctx8: default 48 49 50 51
	hctx9: default 52 53 54 55
	hctx10: default 56 57 58 59
	hctx11: default 60 61 62 63
	hctx12: default 0 1 2 3
	hctx13: default 4 5 6 7
	hctx14: default 8 9 10 11
	hctx15: default 12 13 14 15
queue mapping for /dev/sdd
	hctx0: default 16 17 18 19
	hctx1: default 20 21 22 23
	hctx2: default 24 25 26 27
	hctx3: default 28 29 30 31
	hctx4: default 32 33 34 35
	hctx5: default 36 37 38 39
	hctx6: default 40 41 42 43
	hctx7: default 44 45 46 47
	hctx8: default 48 49 50 51
	hctx9: default 52 53 54 55
	hctx10: default 56 57 58 59
	hctx11: default 60 61 62 63
	hctx12: default 0 1 2 3
	hctx13: default 4 5 6 7
	hctx14: default 8 9 10 11
	hctx15: default 12 13 14 15
queue mapping for /dev/sde
	hctx0: default 16 17 18 19
	hctx1: default 20 21 22 23
	hctx2: default 24 25 26 27
	hctx3: default 28 29 30 31
	hctx4: default 32 33 34 35
	hctx5: default 36 37 38 39
	hctx6: default 40 41 42 43
	hctx7: default 44 45 46 47
	hctx8: default 48 49 50 51
	hctx9: default 52 53 54 55
	hctx10: default 56 57 58 59
	hctx11: default 60 61 62 63
	hctx12: default 0 1 2 3
	hctx13: default 4 5 6 7
	hctx14: default 8 9 10 11
	hctx15: default 12 13 14 15
queue mapping for /dev/sdf
	hctx0: default 16 17 18 19
	hctx1: default 20 21 22 23
	hctx2: default 24 25 26 27
	hctx3: default 28 29 30 31
	hctx4: default 32 33 34 35
	hctx5: default 36 37 38 39
	hctx6: default 40 41 42 43
	hctx7: default 44 45 46 47
	hctx8: default 48 49 50 51
	hctx9: default 52 53 54 55
	hctx10: default 56 57 58 59
	hctx11: default 60 61 62 63
	hctx12: default 0 1 2 3
	hctx13: default 4 5 6 7
	hctx14: default 8 9 10 11
	hctx15: default 12 13 14 15
queue mapping for /dev/sdg
	hctx0: default 16 17 18 19
	hctx1: default 20 21 22 23
	hctx2: default 24 25 26 27
	hctx3: default 28 29 30 31
	hctx4: default 32 33 34 35
	hctx5: default 36 37 38 39
	hctx6: default 40 41 42 43
	hctx7: default 44 45 46 47
	hctx8: default 48 49 50 51
	hctx9: default 52 53 54 55
	hctx10: default 56 57 58 59
	hctx11: default 60 61 62 63
	hctx12: default 0 1 2 3
	hctx13: default 4 5 6 7
	hctx14: default 8 9 10 11
	hctx15: default 12 13 14 15
queue mapping for /dev/sdh
	hctx0: default 16 17 18 19
	hctx1: default 20 21 22 23
	hctx2: default 24 25 26 27
	hctx3: default 28 29 30 31
	hctx4: default 32 33 34 35
	hctx5: default 36 37 38 39
	hctx6: default 40 41 42 43
	hctx7: default 44 45 46 47
	hctx8: default 48 49 50 51
	hctx9: default 52 53 54 55
	hctx10: default 56 57 58 59
	hctx11: default 60 61 62 63
	hctx12: default 0 1 2 3
	hctx13: default 4 5 6 7
	hctx14: default 8 9 10 11
	hctx15: default 12 13 14 15

and suggest you double-check
hisi_sas's queue mapping, which has to be exactly the same as blk-mq's
mapping.
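One way to check this, sketched in user-space C: compare the hw queue the driver would pick for each CPU against blk-mq's map. blk_mq_queue_for_cpu() simply encodes the dump-qmap output above; lldd_naive_queue_for_cpu() is a hypothetical driver mapping that assumes vector N serves CPUs 4N..4N+3, and it already disagrees at CPU 0 (blk-mq uses hctx12, the naive mapping uses queue 0). In a real driver, one option is to derive the hw queue from the request itself, e.g. via blk_mq_unique_tag() and blk_mq_unique_tag_to_hwq(), so there is no second mapping to keep in sync.

```c
#define NR_CPUS      64
#define NR_QUEUES    16
#define CPUS_PER_VEC (NR_CPUS / NR_QUEUES)

/* blk-mq's map as printed by dump-qmap above: CPU c -> hctx (c/4 + 12) % 16,
 * e.g. CPUs 16-19 -> hctx0 and CPUs 0-3 -> hctx12. */
static int blk_mq_queue_for_cpu(int cpu)
{
	return (cpu / CPUS_PER_VEC + NR_QUEUES - 4) % NR_QUEUES;
}

/* Hypothetical naive driver choice: assumes vector N serves CPUs 4N..4N+3. */
static int lldd_naive_queue_for_cpu(int cpu)
{
	return cpu / CPUS_PER_VEC;
}

/* Hypothetical driver choice that just reuses blk-mq's mapping. */
static int lldd_hctx_queue_for_cpu(int cpu)
{
	return blk_mq_queue_for_cpu(cpu);
}

/* Returns the first CPU where @pick disagrees with blk-mq, or -1 if none. */
static int first_mismatch(int (*pick)(int cpu))
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (pick(cpu) != blk_mq_queue_for_cpu(cpu))
			return cpu;
	return -1;
}
```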

scheduler=none is ok, so I am skeptical of a problem there.

If yes, can you collect the debugfs log after the timeout is triggered?

Same limitation as before: once a SCSI timeout happens, SCSI error handling
kicks in and the shost no longer accepts commands, and since that same
shost provides the rootfs, the system becomes unresponsive. But I can try.

Just wondering: why not install two disks in your test machine? :-)

The shost becomes unresponsive for all disks. So I could try NFS, but I'm
not a fan :)

Then it will take you extra effort to collect the log, and an NFS root
should be quite easy to set up. :-)

Should be ...

>> No, but the LLDD does not use request->tag - it generates its own.
>
> Besides wrong queue mapping, another reason is that the generated tag
> may not be unique. Either of the two may cause such a timeout issue
> when the managed interrupt is active.
>

Right, but the tag should be unique; the LLDD needs it to be.
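A minimal sketch of host-wide unique tag allocation, assuming the LLDD keeps one bitmap of command slots shared by all hw queues (struct tag_map, tag_alloc() and tag_free() are hypothetical names, not hisi_sas code). If two in-flight commands could hold the same tag, a completion looked up by tag could be matched to the wrong command, and the other one would eventually time out. The sketch omits locking; a real driver would need a spinlock or atomic bitops around allocation.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_TAGS      64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      ((MAX_TAGS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical host-wide tag bitmap: one bit per in-flight command slot,
 * shared by all hw queues so a tag is unique across the whole host. */
struct tag_map {
	unsigned long bits[NR_WORDS];
};

static bool tag_busy(const struct tag_map *m, int tag)
{
	return m->bits[tag / BITS_PER_WORD] & (1UL << (tag % BITS_PER_WORD));
}

/* Returns the lowest free tag and marks it busy, or -1 if all are in use.
 * Not thread-safe: callers must serialise in this sketch. */
static int tag_alloc(struct tag_map *m)
{
	for (int tag = 0; tag < MAX_TAGS; tag++) {
		if (!tag_busy(m, tag)) {
			m->bits[tag / BITS_PER_WORD] |= 1UL << (tag % BITS_PER_WORD);
			return tag;
		}
	}
	return -1;
}

/* Releases @tag on command completion so it can be reused. */
static void tag_free(struct tag_map *m, int tag)
{
	m->bits[tag / BITS_PER_WORD] &= ~(1UL << (tag % BITS_PER_WORD));
}
```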

Anyway, I'll continue to debug.

BTW, I'm using linux-next 20200603 as a baseline. I don't like using next, but
Linus' master branch was crashing at boot for me yesterday. I need
to check that again against where master is now.

Thanks