ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag


 



Hi,


We're setting up some storage servers that use lio/tcm_qla2xxx to present volumes over our Fibre Channel fabrics to VMware hosts.

We have two nearly identical servers. The first is connected to two single-switch mini-fabrics and has been operating fine. (This storage server has two VMware hosts accessing the single LUN it presents.)

The second storage server is connected to a larger multi-switch fabric (with some zoning) and experienced a lockup during testing, with no clear cause visible on screen. We're still trying to reproduce it. (This storage server has nine VMware hosts accessing the single LUN it presents.)

The lockup happened with 4.9.29. After a minor update, the server has now been freshly booted with a slightly newer kernel:


Linux liohost01 4.9.30 #4 SMP Fri Jun 2 10:16:13 CEST 2017 x86_64 GNU/Linux


81:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
81:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
82:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
82:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)


[    7.661080] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 8.07.00.38-k.
[    7.661786] qla2xxx [0000:81:00.0]-001d: : Found an ISP2532 irq 33 iobase 0xffffb874c6365000.
[    7.872535] scsi host2: qla2xxx
[    7.876854] qla2xxx [0000:81:00.0]-00fb:2: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[    7.876863] qla2xxx [0000:81:00.0]-00fc:2: ISP2532: PCIe (5.0GT/s x8) @ 0000:81:00.0 hdma+ host#=2 fw=8.03.00 (90d5).
[    7.877122] qla2xxx [0000:81:00.1]-001d: : Found an ISP2532 irq 114 iobase 0xffffb874c6425000.
[    8.083587] scsi host3: qla2xxx
[    8.087721] qla2xxx [0000:81:00.1]-00fb:3: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[    8.087730] qla2xxx [0000:81:00.1]-00fc:3: ISP2532: PCIe (5.0GT/s x8) @ 0000:81:00.1 hdma+ host#=3 fw=8.03.00 (90d5).
[    8.087953] qla2xxx [0000:82:00.0]-001d: : Found an ISP2532 irq 35 iobase 0xffffb874c6435000.
[    8.299587] scsi host4: qla2xxx
[    8.303724] qla2xxx [0000:82:00.0]-00fb:4: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[    8.303733] qla2xxx [0000:82:00.0]-00fc:4: ISP2532: PCIe (5.0GT/s x8) @ 0000:82:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
[    8.303948] qla2xxx [0000:82:00.1]-001d: : Found an ISP2532 irq 119 iobase 0xffffb874c6445000.
[    8.516620] scsi host5: qla2xxx
[    8.520658] qla2xxx [0000:82:00.1]-00fb:5: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[    8.520667] qla2xxx [0000:82:00.1]-00fc:5: ISP2532: PCIe (5.0GT/s x8) @ 0000:82:00.1 hdma+ host#=5 fw=8.03.00 (90d5).
[   30.511280] qla2xxx [0000:82:00.1]-00af:5: Performing ISP error recovery - ha=ffff9a1355130000.
[   31.716856] qla2xxx [0000:82:00.1]-500a:5: LOOP UP detected (4 Gbps).
[   35.656645] qla2xxx [0000:81:00.1]-00af:3: Performing ISP error recovery - ha=ffff9a11e6210000.
[   36.880156] qla2xxx [0000:81:00.1]-500a:3: LOOP UP detected (4 Gbps).
[   40.776863] qla2xxx [0000:82:00.0]-00af:4: Performing ISP error recovery - ha=ffff9a11e4c90000.
[   41.993433] qla2xxx [0000:82:00.0]-500a:4: LOOP UP detected (4 Gbps).
[   46.920062] qla2xxx [0000:81:00.0]-00af:2: Performing ISP error recovery - ha=ffff9a11e6ed0000.
[   48.146786] qla2xxx [0000:81:00.0]-500a:2: LOOP UP detected (4 Gbps).


We see some kernel messages on both storage servers:

[  557.363627] qla2xxx/21:00:00:24:ff:54:a4:b6: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.

You've already pointed out elsewhere on the list that this is not a real issue.


However, on the storage server that experienced the lockup, we also see kernel messages that aren't present on the storage server that didn't lock up:

[  739.250099] ABORT_TASK: Found referenced qla2xxx task_tag: 1184452
[  739.250101] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1184452

I've seen this about 80 times over the past three hours.
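
To put a firmer number on that, something like the following Python sketch could tally the messages from `dmesg` output. It assumes the message format shown above; the function name `summarize_aborts` is hypothetical and only for illustration.

```python
import re
import sys
from collections import Counter

# Assumed format, taken from the kernel messages quoted above:
# [  739.250101] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1184452
ABORT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST "
    r"for ref_tag: (?P<tag>\d+)"
)


def summarize_aborts(lines):
    """Return (count, first_timestamp, last_timestamp, per-tag Counter)."""
    stamps = []
    tags = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            stamps.append(float(match.group("ts")))
            tags[int(match.group("tag"))] += 1
    if not stamps:
        return 0, None, None, tags
    return len(stamps), min(stamps), max(stamps), tags


if __name__ == "__main__":
    # Reads kernel log lines from standard input.
    count, first, last, tags = summarize_aborts(sys.stdin)
    print(f"TMR_TASK_DOES_NOT_EXIST responses: {count}")
    if count:
        print(f"first at {first:.6f}s, last at {last:.6f}s since boot")
        print(f"distinct ref_tags: {len(tags)}")
```

Saved under a name of your choosing (say `count_aborts.py`), it could be run as `dmesg | python3 count_aborts.py`.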

I'd appreciate any pointers you could give me as to the nature of the above kernel messages, and whether they warrant further investigation.


Regards,
Pascal de Bruijn

