Hi,
We're setting up some storage servers that use lio/tcm_qla2xxx to present
volumes over our Fibre Channel fabrics to a number of VMware hosts.
We have two near-identical servers. The first is connected to two
single-switch mini-fabrics and has been operating fine. (This storage
server has two VMware hosts accessing the single LUN it presents.)
The second storage server is connected to a larger multi-switch fabric
(with some zoning). During testing this server experienced a lockup, with
no clear cause visible on the console; we're still trying to reproduce it.
(This storage server has nine VMware hosts accessing the single LUN it
presents.)
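For context, each LUN export boils down to a configfs layout roughly like
the sketch below. This is only an illustration of what the configured state
amounts to, not our actual configuration: the backstore name, device path
and WWPNs are made up, and the paths are from memory.

  from pathlib import Path

  cfg = Path("/sys/kernel/config/target")

  # iblock backstore backed by a block device (device path is made up)
  bs = cfg / "core/iblock_0/vmware_lun0"
  bs.mkdir(parents=True)
  (bs / "control").write_text("udev_path=/dev/vg0/vmware_lun0")
  (bs / "enable").write_text("1")

  # tcm_qla2xxx target port, keyed by the local HBA port WWPN (placeholder)
  tpg = cfg / "qla2xxx/naa.2100xxxxxxxxxxxx/tpgt_1"
  (tpg / "lun/lun_0").mkdir(parents=True)
  (tpg / "lun/lun_0/vmware_lun0").symlink_to(bs)

  # one ACL per VMware host initiator WWPN (placeholder), mapping LUN 0
  acl = tpg / "acls/naa.2101xxxxxxxxxxxx"
  (acl / "lun_0").mkdir(parents=True)
  (acl / "lun_0/vmware_lun0").symlink_to(tpg / "lun/lun_0")

  (tpg / "enable").write_text("1")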
The lockup happened with 4.9.29. After a minor update we are now running
from a fresh boot with a slightly newer kernel:
Linux liohost01 4.9.30 #4 SMP Fri Jun 2 10:16:13 CEST 2017 x86_64 GNU/Linux
81:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to
PCI Express HBA (rev 02)
81:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to
PCI Express HBA (rev 02)
82:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to
PCI Express HBA (rev 02)
82:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to
PCI Express HBA (rev 02)
[ 7.661080] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA
Driver: 8.07.00.38-k.
[ 7.661786] qla2xxx [0000:81:00.0]-001d: : Found an ISP2532 irq 33
iobase 0xffffb874c6365000.
[ 7.872535] scsi host2: qla2xxx
[ 7.876854] qla2xxx [0000:81:00.0]-00fb:2: QLogic QLE2562 -
PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[ 7.876863] qla2xxx [0000:81:00.0]-00fc:2: ISP2532: PCIe (5.0GT/s x8)
@ 0000:81:00.0 hdma+ host#=2 fw=8.03.00 (90d5).
[ 7.877122] qla2xxx [0000:81:00.1]-001d: : Found an ISP2532 irq 114
iobase 0xffffb874c6425000.
[ 8.083587] scsi host3: qla2xxx
[ 8.087721] qla2xxx [0000:81:00.1]-00fb:3: QLogic QLE2562 -
PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[ 8.087730] qla2xxx [0000:81:00.1]-00fc:3: ISP2532: PCIe (5.0GT/s x8)
@ 0000:81:00.1 hdma+ host#=3 fw=8.03.00 (90d5).
[ 8.087953] qla2xxx [0000:82:00.0]-001d: : Found an ISP2532 irq 35
iobase 0xffffb874c6435000.
[ 8.299587] scsi host4: qla2xxx
[ 8.303724] qla2xxx [0000:82:00.0]-00fb:4: QLogic QLE2562 -
PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[ 8.303733] qla2xxx [0000:82:00.0]-00fc:4: ISP2532: PCIe (5.0GT/s x8)
@ 0000:82:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
[ 8.303948] qla2xxx [0000:82:00.1]-001d: : Found an ISP2532 irq 119
iobase 0xffffb874c6445000.
[ 8.516620] scsi host5: qla2xxx
[ 8.520658] qla2xxx [0000:82:00.1]-00fb:5: QLogic QLE2562 -
PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[ 8.520667] qla2xxx [0000:82:00.1]-00fc:5: ISP2532: PCIe (5.0GT/s x8)
@ 0000:82:00.1 hdma+ host#=5 fw=8.03.00 (90d5).
[ 30.511280] qla2xxx [0000:82:00.1]-00af:5: Performing ISP error
recovery - ha=ffff9a1355130000.
[ 31.716856] qla2xxx [0000:82:00.1]-500a:5: LOOP UP detected (4 Gbps).
[ 35.656645] qla2xxx [0000:81:00.1]-00af:3: Performing ISP error
recovery - ha=ffff9a11e6210000.
[ 36.880156] qla2xxx [0000:81:00.1]-500a:3: LOOP UP detected (4 Gbps).
[ 40.776863] qla2xxx [0000:82:00.0]-00af:4: Performing ISP error
recovery - ha=ffff9a11e4c90000.
[ 41.993433] qla2xxx [0000:82:00.0]-500a:4: LOOP UP detected (4 Gbps).
[ 46.920062] qla2xxx [0000:81:00.0]-00af:2: Performing ISP error
recovery - ha=ffff9a11e6ed0000.
[ 48.146786] qla2xxx [0000:81:00.0]-500a:2: LOOP UP detected (4 Gbps).
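(In case it's relevant, the per-port link state and negotiated speed can be
read back from the standard fc_host sysfs attributes; a minimal sketch,
assuming the usual /sys/class/fc_host layout:)

  from pathlib import Path

  # Print WWPN, link state and negotiated speed for each FC host port
  for host in sorted(Path("/sys/class/fc_host").iterdir()):
      port_name = (host / "port_name").read_text().strip()
      state = (host / "port_state").read_text().strip()
      speed = (host / "speed").read_text().strip()
      print("{}: {} {} {}".format(host.name, port_name, state, speed))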
We see some kernel messages on both storage servers:
[ 557.363627] qla2xxx/21:00:00:24:ff:54:a4:b6: Unsupported SCSI Opcode
0x85, sending CHECK_CONDITION.
You've already pointed out elsewhere on the list that this is not a
real issue.
However, on the storage server that experienced the lockup, we do see
some kernel messages that aren't present on the storage server that
didn't lock up:
[ 739.250099] ABORT_TASK: Found referenced qla2xxx task_tag: 1184452
[ 739.250101] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag:
1184452
I've seen this about 80 times over the past three hours.
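(For reference, a rough way to tally these is just to count matching lines
in the kernel ring buffer, as sketched below; since the buffer may have
wrapped, this only gives a lower bound.)

  import subprocess

  # Tally the ABORT_TASK TMR responses still present in the kernel ring buffer
  dmesg = subprocess.check_output(["dmesg"], universal_newlines=True)
  hits = [line for line in dmesg.splitlines()
          if "ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST" in line]
  print("%d ABORT_TASK responses in the ring buffer" % len(hits))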
I'd appreciate any pointers you could give me as to the nature of the
above kernel messages, and whether they warrant further investigation.
Regards,
Pascal de Bruijn