Hi Lukasz,

(Adding Pascal CC', as he's been seeing something similar with MD backend
storage)

Thanks a lot for the follow-up. Comments below.

On Mon, 2017-07-31 at 17:15 +0200, Łukasz Engel wrote:
> Hi Nicolas,
>
> On 31.07.2017 at 02:00, Nicholas A. Bellinger wrote:
> > Hi Lukasz,
> >
> > On Thu, 2017-07-27 at 17:23 +0200, Łukasz Engel wrote:
> >> Hello
> >>
> >> I am experiencing problems using LIO FC target with vmware ESX (6.5).
> >>
> >> This is the same problem as discussed in
> >> https://www.spinics.net/lists/target-devel/msg15670.html
> >> I've tried 4.11.x (up to 12) and 4.12.3 (and also some kernels from 4.9
> >> line).
> [...]
> >> Yes, this is most certainly the same bug the patch here addresses:
> >>
> >> https://www.spinics.net/lists/target-devel/msg15759.html
> >>
> >> Also, a smaller version of that just went out with your CC' on it here:
> >>
> >> qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
> >> https://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git/commit/?id=6bcbb3174caa5f1ccc894f8ae077631659d5a629
> After applying both patches (in 6bcbb3174caa5f1ccc894f8ae077631659d5a629
> manual adjusting in second chunk was required for 4.12.x - after-match
> is slightly different) - there is no BUG, but hang in kworker instead,
> with all I/Os from FC frozen:
>
> Jul 31 16:27:14 teststorage [ 42.897291] qla2xxx [0000:06:00.0]-500a:12: LOOP UP detected (4 Gbps).
> Jul 31 16:27:16 teststorage [ 42.959068] qla2xxx [0000:09:00.0]-ffff:11: qla24xx_do_nack_work create sess success ffff881f80e29200
> Jul 31 16:27:16 teststorage [ 44.144811] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f80e28400
> Jul 31 16:27:16 teststorage [ 44.230770] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f8f575a00
> Jul 31 16:27:16 teststorage [ 44.473650] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f8f575600
> Jul 31 16:27:16 teststorage [ 44.520821] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f8f574400
> Jul 31 16:27:16 teststorage [ 44.538183] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f900f3400
> Jul 31 16:27:16 teststorage [ 44.984052] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f8f577a00
> Jul 31 16:28:54 teststorage [ 45.042585] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f80e28000
> Jul 31 16:28:54 teststorage [ 142.431045] ABORT_TASK: Found referenced qla2xxx task_tag: 1189328
> Jul 31 16:28:54 teststorage [ 142.431059] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1189328
> Jul 31 16:28:57 teststorage [ 145.632571] ABORT_TASK: Found referenced qla2xxx task_tag: 1177932
> Jul 31 16:28:57 teststorage [ 145.897299] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1177932
> Jul 31 16:29:00 teststorage [ 148.388569] ABORT_TASK: Found referenced qla2xxx task_tag: 1164160
> Jul 31 16:32:00 teststorage [ 328.848344] se_cmd waiting CDB: 0x89 tag: 1164160 t_state: 5 se_cmd_flags: 0x01c80109 transport_state: 0x00000033

Perfect.  This is very helpful for understanding the state of the I/O that
triggered the ESX host ABORTs, and it shows the command is blocked waiting
for completion from backend storage.
Here's what the above debug translates to:

  CDB: 0x89 = COMPARE_AND_WRITE (e.g. VAAI Atomic Test and Set)
  t_state: 5 = TRANSPORT_PROCESSING
  se_cmd_flags: 0x01c80109 = SCF_SUPPORTED_SAM_OPCODE | SCF_SCSI_DATA_CDB |
                             SCF_SE_LUN_CMD | SCF_COMPARE_AND_WRITE |
                             SCF_ACK_KREF | SCF_USE_CPUID | SCF_TASK_ATTR_SET
  transport_state: 0x00000033 = CMD_T_ABORTED | CMD_T_ACTIVE | CMD_T_SENT | CMD_T_STOP

Note that COMPARE_AND_WRITE, with respect to backend I/O, consists of two
phases.  The first phase is a READ and a comparison against the first part of
the received buffer.  If the comparison matches, the second phase submits a
WRITE with the second part of the received buffer to complete the update.

So according to the above output, the first phase of COMPARE_AND_WRITE
dispatched the READ into backend storage.  The READ did not complete, and
subsequently ESX sent an ABORT_TASK for it (tag: 1164160).  The ABORT_TASK
successfully found the I/O still outstanding to backend storage, and blocks
waiting for it to complete.  The 'se_cmd waiting ..' debug message prints
every 180 seconds, so it's clear I/O completion from backend storage is not
happening back to IBLOCK -> target-core for an extended amount of time.

Btw, I've been simulating this same scenario by delaying backend storage I/O
completion, and thus far, once the first-phase READ of COMPARE_AND_WRITE
eventually returns back to IBLOCK -> target-core, ABORT_TASK is able to
complete and everything else continues to function as expected.  (There's a
rough sketch of that kind of delayed-completion setup at the end of this
mail.)

That said, thus far the evidence is pointing to a backend device issue, where
that first READ phase of the COMPARE_AND_WRITE is (never..?) completed back
to IBLOCK -> target-core.  More below on how to verify this.

> Jul 31 16:32:41 teststorage [ 369.808484] INFO: task kworker/1:93:2802 blocked for more than 120 seconds.
> Jul 31 16:32:41 teststorage [ 369.808493] Not tainted 4.12.3p1 #3
> Jul 31 16:32:41 teststorage [ 369.808496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jul 31 16:32:41 teststorage [ 369.808499] kworker/1:93 D 0 2802 2 0x00000080
> Jul 31 16:32:41 teststorage [ 369.808515] Workqueue: events qlt_free_session_done [qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808518] Call Trace:
> Jul 31 16:32:41 teststorage [ 369.808524] __schedule+0x285/0x840
> Jul 31 16:32:41 teststorage [ 369.808528] schedule+0x36/0x80
> Jul 31 16:32:41 teststorage [ 369.808531] schedule_timeout+0x242/0x2f0
> Jul 31 16:32:41 teststorage [ 369.808537] ? ttwu_do_activate+0x77/0x80
> Jul 31 16:32:41 teststorage [ 369.808540] wait_for_completion+0x111/0x170
> Jul 31 16:32:41 teststorage [ 369.808544] ? wake_up_q+0x80/0x80
> Jul 31 16:32:41 teststorage [ 369.808555] target_wait_for_sess_cmds+0x49/0x170 [target_core_mod]
> Jul 31 16:32:41 teststorage [ 369.808560] tcm_qla2xxx_free_session+0x45/0x90 [tcm_qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808572] qlt_free_session_done+0x109/0x5a0 [qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808579] ? qlt_unreg_sess+0xe7/0xf0 [qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808583] ? tcm_qla2xxx_put_sess+0x3c/0x50 [tcm_qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808587] process_one_work+0x144/0x370
> Jul 31 16:32:41 teststorage [ 369.808591] worker_thread+0x4d/0x3c0
> Jul 31 16:32:41 teststorage [ 369.808594] kthread+0x109/0x140
> Jul 31 16:32:41 teststorage [ 369.808597] ? rescuer_thread+0x360/0x360
> Jul 31 16:32:41 teststorage [ 369.808600] ?
kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.808720] ret_from_fork+0x25/0x30
> Jul 31 16:32:41 teststorage [ 369.809016] NMI backtrace for cpu 3
> Jul 31 16:32:41 teststorage [ 369.809446] CPU: 3 PID: 107 Comm: khungtaskd Not tainted 4.12.3p1 #3
> Jul 31 16:32:41 teststorage [ 369.809886] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016
> Jul 31 16:32:41 teststorage [ 369.810326] Call Trace:
> Jul 31 16:32:41 teststorage [ 369.810764] dump_stack+0x63/0x8e
> Jul 31 16:32:41 teststorage [ 369.811190] nmi_cpu_backtrace+0x9b/0xa0
> Jul 31 16:32:41 teststorage [ 369.811613] ? irq_force_complete_move+0xf0/0xf0
> Jul 31 16:32:41 teststorage [ 369.812036] nmi_trigger_cpumask_backtrace+0x99/0xd0
> Jul 31 16:32:41 teststorage [ 369.812461] arch_trigger_cpumask_backtrace+0x19/0x20
> Jul 31 16:32:41 teststorage [ 369.812890] watchdog+0x2c7/0x340
> Jul 31 16:32:41 teststorage [ 369.813321] kthread+0x109/0x140
> Jul 31 16:32:41 teststorage [ 369.813748] ? reset_hung_task_detector+0x20/0x20
> Jul 31 16:32:41 teststorage [ 369.814171] ? kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.814595] ? kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.815013] ret_from_fork+0x25/0x30
> Jul 31 16:32:41 teststorage [ 369.815428] Sending NMI from CPU 3 to CPUs 0-2,4-11:
> Jul 31 16:32:41 teststorage [ 369.815846] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.815946] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816045] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816146] NMI backtrace for cpu 4 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816247] NMI backtrace for cpu 5 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816352] NMI backtrace for cpu 6 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816446] NMI backtrace for cpu 7 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816547] NMI backtrace for cpu 8 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816648] NMI backtrace for cpu 9 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816747] NMI backtrace for cpu 10 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816851] NMI backtrace for cpu 11 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816855] Kernel panic - not syncing: hung_task: blocked tasks
> Jul 31 16:32:41 teststorage [ 369.816857] CPU: 3 PID: 107 Comm: khungtaskd Not tainted 4.12.3p1 #3
> Jul 31 16:32:41 teststorage [ 369.816857] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016
> Jul 31 16:32:41 teststorage [ 369.816857] Call Trace:
> Jul 31 16:32:41 teststorage [ 369.816859] dump_stack+0x63/0x8e
> Jul 31 16:32:41 teststorage [ 369.816861] panic+0xeb/0x239
> Jul 31 16:32:41 teststorage [ 369.816863] watchdog+0x2d5/0x340
> Jul 31 16:32:41 teststorage [ 369.816864] kthread+0x109/0x140
> Jul 31 16:32:41 teststorage [ 369.816864] ? reset_hung_task_detector+0x20/0x20
> Jul 31 16:32:41 teststorage [ 369.816865] ? kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.816866] ?
kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.816866] ret_from_fork+0x25/0x30
>

Btw, it looks like this system is running with:

   CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=1

which triggers a crash as soon as any hung task is detected.  I'd recommend
setting this to '0' for now (via /proc/sys/kernel/hung_task_panic), to
monitor what's going on while hung tasks are occurring.  Also, this can be
set at boot time via hung_task_panic=0.

> 8<=====
>
> > Also, some more details on what your backend device storage behind
> > target-core looks like would be helpful.
> >
> > Physical HBAs, device information, and any MD/RAID/LVM/etc on top..
> >
> I have adaptec 7805 HW RAID (6x SAS 600G 15k hdd in RAID0) - exporting 1
> device to system (sdc),
> LVM sits on this device and LV (vgtest/test) is used as LIO block
> backstore and is exported to ESXs.
> (for system I have separate 2 SATA HDDs in software raid1 + LVM)
>
> [root@TESTSTORAGE ~]# lsblk
> NAME                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
> sdb                   8:16   0 465,8G  0 disk
> └─md126               9:126  0 465,8G  0 raid1
>   ├─md126p3         259:2    0   314G  0 md
>   │ ├─vgsystem-swap 253:1    0    64G  0 lvm
>   │ ├─vgsystem-root 253:0    0   100G  0 lvm   /
>   │ └─vgsystem-home 253:3    0   200G  0 lvm   /home
>   ├─md126p1         259:0    0   200M  0 md    /boot/efi
>   ├─md126p4         259:3    0 150,6G  0 md
>   │ └─vgsystem-root 253:0    0   100G  0 lvm   /
>   └─md126p2         259:1    0     1G  0 md    /boot
> sdc                   8:32   1   3,3T  0 disk
> └─vgtest-test       253:2    0   3,3T  0 lvm
> sda                   8:0    0 465,8G  0 disk
> └─md126               9:126  0 465,8G  0 raid1
>   ├─md126p3         259:2    0   314G  0 md
>   │ ├─vgsystem-swap 253:1    0    64G  0 lvm
>   │ ├─vgsystem-root 253:0    0   100G  0 lvm   /
>   │ └─vgsystem-home 253:3    0   200G  0 lvm   /home
>   ├─md126p1         259:0    0   200M  0 md    /boot/efi
>   ├─md126p4         259:3    0 150,6G  0 md
>   │ └─vgsystem-root 253:0    0   100G  0 lvm   /
>   └─md126p2         259:1    0     1G  0 md    /boot
>
> 8<=========
>

OK, since the vgtest-test LV is backed by a single SCSI device (sdc), there
is a simple test to see if the SCSI device is holding onto I/Os.  Once the
hung task triggers, check out:

   head /sys/block/sdc/inflight

This will show how many I/Os are outstanding to the underlying SCSI device.
IIRC, md and lvm devices don't use 'inflight' counters, so this is only
useful for verifying the number of I/Os outstanding to the underlying SCSI
device.  (There's also a small one-liner at the end of this mail for
sampling these counters while reproducing.)

So if it's possible, I'd recommend verifying with the raw SCSI device with
IBLOCK export as well, to isolate if it's something specific to LVM that's
causing I/Os not to be completed.

Thanks again for your detailed bug-report.
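P.S. In case it's useful for reproducing on your side, below is a rough
sketch of the kind of delayed-completion setup mentioned above, using
dm-delay.  It's only an illustration - the LV path, the dm device name, and
the 15 second delay are placeholders, and it's not necessarily the exact
setup I've been running:

   # Interpose a dm-delay device between the IBLOCK backstore and the LV so
   # that backend completions are artificially delayed.
   SRC=/dev/vgtest/test                  # placeholder: the LV used as backstore
   SECTORS=$(blockdev --getsz "$SRC")    # device size in 512-byte sectors

   # dm-delay table is "<start> <length> delay <device> <offset> <delay_ms>",
   # which delays all I/O to the device by <delay_ms>.
   echo "0 ${SECTORS} delay ${SRC} 0 15000" | dmsetup create test-delayed

   # Then point the IBLOCK backstore at /dev/mapper/test-delayed instead of
   # the LV, and drive COMPARE_AND_WRITE traffic from ESX.

With a delay longer than the ESX ABORT timeout, the first-phase
COMPARE_AND_WRITE READ is still outstanding when ABORT_TASK arrives, which
reproduces the 'se_cmd waiting' state above.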
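Also, a trivial way to keep an eye on the inflight counters while driving
I/O from ESX (again just a sketch - substitute the right device name for
your setup):

   # Sample the outstanding (reads writes) counters for sdc once per second.
   while true; do
       printf '%s ' "$(date +%T)"
       cat /sys/block/sdc/inflight
       sleep 1
   done

The two numbers are reads and writes still outstanding to sdc; if they stay
non-zero the whole time the 'se_cmd waiting' message keeps printing, that
confirms the backend device is holding onto the I/O.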