Re: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03-06-17 07:17, Nicholas A. Bellinger wrote:
Hi Pascal,

Adding the Cavium qla2xxx folks (Himanshu + Quinn) to CC'.

On Fri, 2017-06-02 at 14:10 +0200, Pascal de Bruijn wrote:
Hi,


We're setting up some storage servers where we're using lio/tcm_qla2xxx
to present some volumes via our FibreChannel fabrics to some VMware hosts.

We have two near identical servers, one connected to two single-switch
mini-fabrics, which has been operating fine. (This storage server has
two VMware hosts accessing the single LUN it presents).

The second storage server is connected to a larger multi-switch fabric
(with some zoning), which during testing has experienced a lockup, with
no clear cause visible on screen. We're still trying to reproduce. (This
storage server has nine VMware hosts accessing the single LUN it presents).

The lockup happened with 4.9.29, now after a minor update, from a new
boot, with a slightly updated kernel:


Linux liohost01 4.9.30#4 SMP Fri Jun 2 10:16:13 CEST 2017 x86_64 GNU/Linux


81:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channe to PCI Express HBA (rev 02)
81:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
82:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
82:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)


[    7.661080] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 8.07.00.38-k.
[    7.661786] qla2xxx [0000:81:00.0]-001d: : Found an ISP2532 irq 33 iobase 0xffffb874c6365000.
[    7.872535] scsi host2: qla2xxx
[    7.876854] qla2xxx [0000:81:00.0]-00fb:2: QLogic QLE2562 -  PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[    7.876863] qla2xxx [0000:81:00.0]-00fc:2: ISP2532: PCIe (5.0GT/s x8) @ 0000:81:00.0 hdma+ host#=2 fw=8.03.00 (90d5).
[    7.877122] qla2xxx [0000:81:00.1]-001d: : Found an ISP2532 irq 114 iobase 0xffffb874c6425000.
[    8.083587] scsi host3: qla2xxx
[    8.087721] qla2xxx [0000:81:00.1]-00fb:3: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[    8.087730] qla2xxx [0000:81:00.1]-00fc:3: ISP2532: PCIe (5.0GT/s x8) @ 0000:81:00.1 hdma+ host#=3 fw=8.03.00 (90d5).
[    8.087953] qla2xxx [0000:82:00.0]-001d: : Found an ISP2532 irq 35 iobase 0xffffb874c6435000.
[    8.299587] scsi host4: qla2xxx
[    8.303724] qla2xxx [0000:82:00.0]-00fb:4: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[    8.303733] qla2xxx [0000:82:00.0]-00fc:4: ISP2532: PCIe (5.0GT/s x8) @ 0000:82:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
[    8.303948] qla2xxx [0000:82:00.1]-001d: : Found an ISP2532 irq 119 iobase 0xffffb874c6445000.
[    8.516620] scsi host5: qla2xxx
[    8.520658] qla2xxx [0000:82:00.1]-00fb:5: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[    8.520667] qla2xxx [0000:82:00.1]-00fc:5: ISP2532: PCIe (5.0GT/s x8) @ 0000:82:00.1 hdma+ host#=5 fw=8.03.00 (90d5).
[   30.511280] qla2xxx [0000:82:00.1]-00af:5: Performing ISP error recovery - ha=ffff9a1355130000.
[   31.716856] qla2xxx [0000:82:00.1]-500a:5: LOOP UP detected (4 Gbps).
[   35.656645] qla2xxx [0000:81:00.1]-00af:3: Performing ISP error recovery - ha=ffff9a11e6210000.
[   36.880156] qla2xxx [0000:81:00.1]-500a:3: LOOP UP detected (4 Gbps).
[   40.776863] qla2xxx [0000:82:00.0]-00af:4: Performing ISP error recovery - ha=ffff9a11e4c90000.
[   41.993433] qla2xxx [0000:82:00.0]-500a:4: LOOP UP detected (4 Gbps).
[   46.920062] qla2xxx [0000:81:00.0]-00af:2: Performing ISP error recovery - ha=ffff9a11e6ed0000.
[   48.146786] qla2xxx [0000:81:00.0]-500a:2: LOOP UP detected (4 Gbps).

...
...

Beyond that, Himanshu + Quinn can help with the next steps debugging
your issue.
So the serial capture seems to have paid off:

[389115.750114] qla2xxx/21:00:00:24:ff:3c:d0:95: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
[389538.824925] ABORT_TASK: Found referenced qla2xxx task_tag: 1190740
[389539.612590] ------------[ cut here ]------------
[389539.617330] kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3319!
[389539.623543] invalid opcode: 0000 [#1] SMP
[389539.627667] Modules linked in: tcm_qla2xxx target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod configfs raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod ipmi_devintf ses enclosure sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp qla2xxx scsi_transport_fc mei_me igb mpt3sas mei ioatdma i2c_algo_bit dca wmi ipmi_si ipmi_msghandler [389539.666453] CPU: 0 PID: 28973 Comm: kworker/u33:1 Not tainted 4.9.30-ul5 #3 [389539.673531] Hardware name: Supermicro SYS-1028U-E1CRTP+/X10DRU-i+, BIOS 2.0b 08/09/2016
[389539.681682] Workqueue: tmr-iblock target_tmr_work [target_core_mod]
[389539.688084] task: ffffa1b3122369c0 task.stack: ffffb7dc20634000
[389539.694116] RIP: 0010:[<ffffffffc04acba8>] [<ffffffffc04acba8>] qlt_free_cmd+0x148/0x160 [qla2xxx]
[389539.703327] RSP: 0018:ffffb7dc20637d30  EFLAGS: 00010202
[389539.708756] RAX: 0000000000000000 RBX: ffffa1b3139998a8 RCX: ffffffffc04d4692 [389539.716026] RDX: 000000000000e074 RSI: ffffa1bb118615a0 RDI: 0000000000004000 [389539.723296] RBP: ffffb7dc20637d50 R08: ffffffffc04c17d0 R09: ffffa1b3139998a8 [389539.730567] R10: ffffdeb50134dc80 R11: ffffa1b33fff9f90 R12: ffffa1ac96522780 [389539.737831] R13: 0000000000000246 R14: 0000000000122b54 R15: ffffa1bb14e49d40 [389539.745103] FS: 0000000000000000(0000) GS:ffffa1b31fc00000(0000) knlGS:0000000000000000
[389539.753329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[389539.759187] CR2: 00007f13ec063008 CR3: 00000001d4607000 CR4: 00000000001406f0
[389539.766458] Stack:
[389539.768582] ffffa1ac000003b1 ffffa1b3139998a8 ffffa1ac9601dd70 0000000000000246 [389539.776200] ffffb7dc20637d60 ffffffffc08cef2f ffffb7dc20637d88 ffffffffc0844417 [389539.783809] ffffa1b313999930 ffffa1b3139998a8 ffffa1b31a408000 ffffb7dc20637dd8
[389539.791428] Call Trace:
[389539.793987] [<ffffffffc08cef2f>] tcm_qla2xxx_release_cmd+0xf/0x20 [tcm_qla2xxx] [389539.801521] [<ffffffffc0844417>] target_put_sess_cmd+0xb7/0x120 [target_core_mod] [389539.809227] [<ffffffffc083fa8a>] core_tmr_abort_task+0x12a/0x180 [target_core_mod] [389539.817016] [<ffffffffc0841fc1>] target_tmr_work+0x111/0x120 [target_core_mod]
[389539.824462]  [<ffffffff8628fed1>] process_one_work+0x121/0x3c0
[389539.830412]  [<ffffffff862901b3>] worker_thread+0x43/0x4d0
[389539.836010]  [<ffffffff86290170>] ? process_one_work+0x3c0/0x3c0
[389539.842131]  [<ffffffff86290170>] ? process_one_work+0x3c0/0x3c0
[389539.848250]  [<ffffffff862959a4>] kthread+0xd4/0xf0
[389539.853242]  [<ffffffff862958d0>] ? kthread_worker_fn+0x100/0x100
[389539.859459]  [<ffffffff8668f922>] ret_from_fork+0x22/0x30
[389539.864976] Code: 29 1e c6 0f b6 83 8c 02 00 00 e9 3f ff ff ff be 05 0d 00 00 48 c7 c7 e8 de 4c c0 e8 53 f9 dc c5 48 83 c4 08 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 0b 48 8b bb 90 02 00 00 e8 58 a4 e8 c5 e9 2c ff ff ff
[389539.885246] RIP  [<ffffffffc04acba8>] qlt_free_cmd+0x148/0x160 [qla2xxx]
[389539.892083]  RSP <ffffb7dc20637d30>
[389539.896318] ---[ end trace 934bf5de2df85846 ]---
[389539.901127] general protection fault: 0000 [#2] SMP
[389539.906176] Modules linked in: tcm_qla2xxx target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod configfs raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod ipmi_devintf ses enclosure sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp qla2xxx scsi_transport_fc mei_me igb mpt3sas mei ioatdma i2c_algo_bit dca wmi ipmi_si ipmi_msghandler [389539.947341] CPU: 0 PID: 28973 Comm: kworker/u33:1 Tainted: G D 4.9.30-ul5 #3 [389539.955719] Hardware name: Supermicro SYS-1028U-E1CRTP+/X10DRU-i+, BIOS 2.0b 08/09/2016
[389539.963953] task: ffffa1b3122369c0 task.stack: ffffb7dc20634000
[389539.970055] RIP: 0010:[<ffffffff862ae826>] [<ffffffff862ae826>] __wake_up_common+0x26/0x80
[389539.978695] RSP: 0018:ffffb7dc20637e48  EFLAGS: 00010082
[389539.984181] RAX: 0000000000000282 RBX: ffffb7dc20637f10 RCX: 0000000000000000 [389539.991513] RDX: 3216b1a21fbe0843 RSI: 0000000000000003 RDI: ffffb7dc20637f10 [389539.998845] RBP: ffffb7dc20637e80 R08: 0000000000000000 R09: 0000000000000000 [389540.006179] R10: 0000000000000000 R11: 000000000000130e R12: ffffb7dc20637f18 [389540.013512] R13: 0000000000000282 R14: 0000000000000000 R15: 0000000000000003 [389540.020841] FS: 0000000000000000(0000) GS:ffffa1b31fc00000(0000) knlGS:0000000000000000
[389540.029127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[389540.035048] CR2: 0000000000000028 CR3: 00000001d4607000 CR4: 00000000001406f0
[389540.046424] Stack:
[389540.048611] 000000011f813420 0000000000000000 ffffb7dc20637f10 ffffb7dc20637f08 [389540.056536] 0000000000000282 0000000000000000 0000000000000000 ffffb7dc20637e90 [389540.064451] ffffffff862ae8de ffffb7dc20637eb8 ffffffff862af1b2 0000000000000000
[389540.072363] Call Trace:
[389540.074990]  [<ffffffff862ae8de>] __wake_up_locked+0xe/0x10
[389540.080738]  [<ffffffff862af1b2>] complete+0x32/0x50
[389540.085882]  [<ffffffff86279c3e>] mm_release+0x9e/0x100
[389540.091287]  [<ffffffff8627ee53>] do_exit+0x473/0xa00
[389540.096518]  [<ffffffff86691027>] rewind_stack_do_exit+0x17/0x19
[389540.102699] Code: 5d c3 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 41 54 4c 8d 67 08 53 41 89 f7 48 83 ec 10 89 55 cc 48 8b 57 08 4c 89 45 d0 49 39 d4 <48> 8b 32 74 45 41 89 ce 48 8d 42 e8 4c 8d 6e e8 eb 03 49 89 d5
[389540.126809] RIP  [<ffffffff862ae826>] __wake_up_common+0x26/0x80
[389540.133075]  RSP <ffffb7dc20637e48>
[389540.136735] ---[ end trace 934bf5de2df85847 ]---
[389540.141529] Fixing recursive fault but reboot is needed!
[389540.147035] BUG: unable to handle kernel paging request at ffffffffffffffd8
[389540.154336] IP: [<ffffffff86295f5b>] kthread_data+0xb/0x20
[389540.160083] PGD 1d4608067 [389540.162770] PUD 1d460a067
PMD 0 [389540.166323]
[389540.168057] Oops: 0000 [#3] SMP
[389540.171375] Modules linked in: tcm_qla2xxx target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod configfs raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod ipmi_devintf ses enclosure sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp qla2xxx scsi_transport_fc mei_me igb mpt3sas mei ioatdma i2c_algo_bit dca wmi ipmi_si ipmi_msghandler [389540.212501] CPU: 0 PID: 28973 Comm: kworker/u33:1 Tainted: G D 4.9.30-ul5 #3 [389540.220870] Hardware name: Supermicro SYS-1028U-E1CRTP+/X10DRU-i+, BIOS 2.0b 08/09/2016
[389540.229086] task: ffffa1b3122369c0 task.stack: ffffb7dc20634000
[389540.235179] RIP: 0010:[<ffffffff86295f5b>] [<ffffffff86295f5b>] kthread_data+0xb/0x20
[389540.243371] RSP: 0018:ffffb7dc20637e70  EFLAGS: 00010002
[389540.248857] RAX: 0000000000000000 RBX: 0000000000015000 RCX: ffffa1b31fc158c8 [389540.256190] RDX: ffffa1b3150f6a68 RSI: ffffa1b312236a40 RDI: ffffa1b3122369c0 [389540.263520] RBP: ffffb7dc20637e70 R08: ffffffff86a7bcc0 R09: 0000000000000000 [389540.270844] R10: 000000000000128b R11: 00000a8500000000 R12: ffffa1b31fc15000 [389540.278175] R13: ffffa1b3122369c0 R14: ffffa1b312236e28 R15: 0000000000000000 [389540.285500] FS: 0000000000000000(0000) GS:ffffa1b31fc00000(0000) knlGS:0000000000000000
[389540.293786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[389540.299704] CR2: 0000000000000028 CR3: 00000001d4607000 CR4: 00000000001406f0
[389540.307028] Stack:
[389540.309213] ffffb7dc20637e80 ffffffff86290e89 ffffb7dc20637ec0 ffffffff8668bc26 [389540.317134] ffffb7dc20637ea0 ffffa1b3122369c0 000000000000000b 0000000000000000 [389540.325047] 0000000000000000 0000000000000003 ffffb7dc20637ed8 ffffffff8668be81
[389540.332959] Call Trace:
[389540.335582]  [<ffffffff86290e89>] wq_worker_sleeping+0x9/0x80
[389540.341511]  [<ffffffff8668bc26>] __schedule+0x2e6/0x510
[389540.347002]  [<ffffffff8668be81>] schedule+0x31/0x80
[389540.352142]  [<ffffffff8627f20e>] do_exit+0x82e/0xa00
[389540.357367]  [<ffffffff86691027>] rewind_stack_do_exit+0x17/0x19
[389540.363545] Code: 25 c0 c2 00 00 48 8b 80 08 04 00 00 5d 48 8b 40 c8 48 d1 e8 83 e0 01 c3 0f 1f 84 00 00 00 00 00 48 8b 87 08 04 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00
[389540.387647] RIP  [<ffffffff86295f5b>] kthread_data+0xb/0x20
[389540.393481]  RSP <ffffb7dc20637e70>
[389540.397138] CR2: ffffffffffffffd8
[389540.400622] ---[ end trace 934bf5de2df85848 ]---
[389540.405416] Fixing recursive fault but reboot is needed!
[389600.004856] ABORT_TASK: Found referenced qla2xxx task_tag: 1224484
[389661.093059] qla2xxx/21:00:00:24:ff:54:a4:b7: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.

Please let me know if you need anything else from my end.

Regards,
Pascal de Bruijn


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux