Hi Lukasz,

(Adding Pascal CC', as he's been seeing something similar with MD backend
storage)

Thanks a lot for the follow-up. Comments below.

On Mon, 2017-07-31 at 17:15 +0200, Łukasz Engel wrote:
> Hi Nicolas,
>
> On 31.07.2017 at 02:00, Nicholas A. Bellinger wrote:
> > Hi Lukasz,
> >
> > On Thu, 2017-07-27 at 17:23 +0200, Łukasz Engel wrote:
> >> Hello
> >>
> >> I am experiencing problems using LIO FC target with vmware ESX (6.5).
> >>
> >> This is the same problem as discussed in
> >> https://www.spinics.net/lists/target-devel/msg15670.html
> >> I've tried 4.11.x (up to 12) and 4.12.3 (and also some kernels from 4.9
> >> line).
> [...]
> >> Yes, this is most certainly the same bug the patch here addresses:
> >>
> >> https://www.spinics.net/lists/target-devel/msg15759.html
> >>
> >> Also, a smaller version of that just went out with your CC' on it here:
> >>
> >> qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
> >> https://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git/commit/?id=6bcbb3174caa5f1ccc894f8ae077631659d5a629
> After applying both patches (in 6bcbb3174caa5f1ccc894f8ae077631659d5a629
> manual adjusting in second chunk was required for 4.12.x - after-match
> is slightly different) - there is no BUG, but hang in kworker instead,
> with all I/Os from FC frozen:
>
> Jul 31 16:27:14 teststorage [ 42.897291] qla2xxx [0000:06:00.0]-500a:12: LOOP UP detected (4 Gbps).
> Jul 31 16:27:16 teststorage [ 42.959068] qla2xxx [0000:09:00.0]-ffff:11: qla24xx_do_nack_work create sess success ffff881f80e29200
> Jul 31 16:27:16 teststorage [ 44.144811] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f80e28400
> Jul 31 16:27:16 teststorage [ 44.230770] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f8f575a00
> Jul 31 16:27:16 teststorage [ 44.473650] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f8f575600
> Jul 31 16:27:16 teststorage [ 44.520821] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f8f574400
> Jul 31 16:27:16 teststorage [ 44.538183] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f900f3400
> Jul 31 16:27:16 teststorage [ 44.984052] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f8f577a00
> Jul 31 16:28:54 teststorage [ 45.042585] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f80e28000
> Jul 31 16:28:54 teststorage [ 142.431045] ABORT_TASK: Found referenced qla2xxx task_tag: 1189328
> Jul 31 16:28:54 teststorage [ 142.431059] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1189328
> Jul 31 16:28:57 teststorage [ 145.632571] ABORT_TASK: Found referenced qla2xxx task_tag: 1177932
> Jul 31 16:28:57 teststorage [ 145.897299] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1177932
> Jul 31 16:29:00 teststorage [ 148.388569] ABORT_TASK: Found referenced qla2xxx task_tag: 1164160
> Jul 31 16:32:00 teststorage [ 328.848344] se_cmd waiting CDB: 0x89 tag: 1164160 t_state: 5 se_cmd_flags: 0x01c80109 transport_state: 0x00000033

Perfect.  This is very helpful for understanding the state of the I/O that
triggered the ESX host ABORTs, and it shows the command is blocked waiting
for completion from backend storage.
Here's what the above debug translates to:

  CDB: 0x89 = COMPARE_AND_WRITE (e.g. VAAI Atomic Test and Set)
  t_state: 5 = TRANSPORT_PROCESSING
  se_cmd_flags: 0x01c80109 = SCF_SUPPORTED_SAM_OPCODE | SCF_SCSI_DATA_CDB |
                             SCF_SE_LUN_CMD | SCF_COMPARE_AND_WRITE |
                             SCF_ACK_KREF | SCF_USE_CPUID | SCF_TASK_ATTR_SET
  transport_state: 0x00000033 = CMD_T_ABORTED | CMD_T_ACTIVE | CMD_T_SENT | CMD_T_STOP

Note that COMPARE_AND_WRITE, with respect to backend I/O, consists of two
phases.  The first phase is a READ and a comparison against the first part of
the received buffer.  If the comparison matches, the second phase submits a
WRITE with the second part of the received buffer to complete the update.

So according to the above output, the first phase of COMPARE_AND_WRITE
dispatched the READ into backend storage.  The READ did not complete, and
subsequently ESX sent an ABORT_TASK for it (tag: 1164160).  The ABORT_TASK
successfully found the I/O still outstanding to backend storage, and blocks
waiting for it to complete.  The 'se_cmd waiting ..' debug message prints
every 180 seconds, so it's clear I/O completion from backend storage is not
happening back to IBLOCK -> target-core for an extended amount of time.

Btw, I've been simulating this same scenario by delaying backend storage I/O
completion, and thus far, once the first-phase READ of COMPARE_AND_WRITE
eventually returns back to IBLOCK -> target-core, ABORT_TASK is able to
complete and everything else continues to function as expected.  (There's a
rough sketch of that kind of delayed-completion setup at the end of this
mail.)

That said, thus far the evidence is pointing to a backend device issue, where
that first READ phase of the COMPARE_AND_WRITE is (never..?) completed back
to IBLOCK -> target-core.  More below on how to verify this.

> Jul 31 16:32:41 teststorage [ 369.808484] INFO: task kworker/1:93:2802 blocked for more than 120 seconds.
> Jul 31 16:32:41 teststorage [ 369.808493] Not tainted 4.12.3p1 #3
> Jul 31 16:32:41 teststorage [ 369.808496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jul 31 16:32:41 teststorage [ 369.808499] kworker/1:93 D 0 2802 2 0x00000080
> Jul 31 16:32:41 teststorage [ 369.808515] Workqueue: events qlt_free_session_done [qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808518] Call Trace:
> Jul 31 16:32:41 teststorage [ 369.808524] __schedule+0x285/0x840
> Jul 31 16:32:41 teststorage [ 369.808528] schedule+0x36/0x80
> Jul 31 16:32:41 teststorage [ 369.808531] schedule_timeout+0x242/0x2f0
> Jul 31 16:32:41 teststorage [ 369.808537] ? ttwu_do_activate+0x77/0x80
> Jul 31 16:32:41 teststorage [ 369.808540] wait_for_completion+0x111/0x170
> Jul 31 16:32:41 teststorage [ 369.808544] ? wake_up_q+0x80/0x80
> Jul 31 16:32:41 teststorage [ 369.808555] target_wait_for_sess_cmds+0x49/0x170 [target_core_mod]
> Jul 31 16:32:41 teststorage [ 369.808560] tcm_qla2xxx_free_session+0x45/0x90 [tcm_qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808572] qlt_free_session_done+0x109/0x5a0 [qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808579] ? qlt_unreg_sess+0xe7/0xf0 [qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808583] ? tcm_qla2xxx_put_sess+0x3c/0x50 [tcm_qla2xxx]
> Jul 31 16:32:41 teststorage [ 369.808587] process_one_work+0x144/0x370
> Jul 31 16:32:41 teststorage [ 369.808591] worker_thread+0x4d/0x3c0
> Jul 31 16:32:41 teststorage [ 369.808594] kthread+0x109/0x140
> Jul 31 16:32:41 teststorage [ 369.808597] ? rescuer_thread+0x360/0x360
> Jul 31 16:32:41 teststorage [ 369.808600] ?
kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.808720] ret_from_fork+0x25/0x30
> Jul 31 16:32:41 teststorage [ 369.809016] NMI backtrace for cpu 3
> Jul 31 16:32:41 teststorage [ 369.809446] CPU: 3 PID: 107 Comm: khungtaskd Not tainted 4.12.3p1 #3
> Jul 31 16:32:41 teststorage [ 369.809886] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016
> Jul 31 16:32:41 teststorage [ 369.810326] Call Trace:
> Jul 31 16:32:41 teststorage [ 369.810764] dump_stack+0x63/0x8e
> Jul 31 16:32:41 teststorage [ 369.811190] nmi_cpu_backtrace+0x9b/0xa0
> Jul 31 16:32:41 teststorage [ 369.811613] ? irq_force_complete_move+0xf0/0xf0
> Jul 31 16:32:41 teststorage [ 369.812036] nmi_trigger_cpumask_backtrace+0x99/0xd0
> Jul 31 16:32:41 teststorage [ 369.812461] arch_trigger_cpumask_backtrace+0x19/0x20
> Jul 31 16:32:41 teststorage [ 369.812890] watchdog+0x2c7/0x340
> Jul 31 16:32:41 teststorage [ 369.813321] kthread+0x109/0x140
> Jul 31 16:32:41 teststorage [ 369.813748] ? reset_hung_task_detector+0x20/0x20
> Jul 31 16:32:41 teststorage [ 369.814171] ? kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.814595] ? kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.815013] ret_from_fork+0x25/0x30
> Jul 31 16:32:41 teststorage [ 369.815428] Sending NMI from CPU 3 to CPUs 0-2,4-11:
> Jul 31 16:32:41 teststorage [ 369.815846] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.815946] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816045] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816146] NMI backtrace for cpu 4 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816247] NMI backtrace for cpu 5 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816352] NMI backtrace for cpu 6 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816446] NMI backtrace for cpu 7 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816547] NMI backtrace for cpu 8 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816648] NMI backtrace for cpu 9 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816747] NMI backtrace for cpu 10 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816851] NMI backtrace for cpu 11 skipped: idling at pc 0xffffffff818b5ac6
> Jul 31 16:32:41 teststorage [ 369.816855] Kernel panic - not syncing: hung_task: blocked tasks
> Jul 31 16:32:41 teststorage [ 369.816857] CPU: 3 PID: 107 Comm: khungtaskd Not tainted 4.12.3p1 #3
> Jul 31 16:32:41 teststorage [ 369.816857] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016
> Jul 31 16:32:41 teststorage [ 369.816857] Call Trace:
> Jul 31 16:32:41 teststorage [ 369.816859] dump_stack+0x63/0x8e
> Jul 31 16:32:41 teststorage [ 369.816861] panic+0xeb/0x239
> Jul 31 16:32:41 teststorage [ 369.816863] watchdog+0x2d5/0x340
> Jul 31 16:32:41 teststorage [ 369.816864] kthread+0x109/0x140
> Jul 31 16:32:41 teststorage [ 369.816864] ? reset_hung_task_detector+0x20/0x20
> Jul 31 16:32:41 teststorage [ 369.816865] ? kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.816866] ?
kthread_park+0x60/0x60
> Jul 31 16:32:41 teststorage [ 369.816866] ret_from_fork+0x25/0x30
>

Btw, it looks like this system is running with:

   CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=1

which triggers a crash as soon as any hung task is detected.  I'd recommend
setting this to '0' for now (via /proc/sys/kernel/hung_task_panic), to
monitor what's going on while hung tasks are occurring.  Also, this can be
set at boot time via hung_task_panic=0.

> 8<=====
>
> > Also, some more details on what your backend device storage behind
> > target-core looks like would be helpful.
> >
> > Physical HBAs, device information, and any MD/RAID/LVM/etc on top..
> >
> I have adaptec 7805 HW RAID (6x SAS 600G 15k hdd in RAID0) - exporting 1
> device to system (sdc),
> LVM sits on this device and LV (vgtest/test) is used as LIO block
> backstore and is exported to ESXs.
> (for system I have separate 2 SATA HDDs in software raid1 + LVM)
>
> [root@TESTSTORAGE ~]# lsblk
> NAME                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
> sdb                   8:16   0 465,8G  0 disk
> └─md126               9:126  0 465,8G  0 raid1
>   ├─md126p3         259:2    0   314G  0 md
>   │ ├─vgsystem-swap 253:1    0    64G  0 lvm
>   │ ├─vgsystem-root 253:0    0   100G  0 lvm   /
>   │ └─vgsystem-home 253:3    0   200G  0 lvm   /home
>   ├─md126p1         259:0    0   200M  0 md    /boot/efi
>   ├─md126p4         259:3    0 150,6G  0 md
>   │ └─vgsystem-root 253:0    0   100G  0 lvm   /
>   └─md126p2         259:1    0     1G  0 md    /boot
> sdc                   8:32   1   3,3T  0 disk
> └─vgtest-test       253:2    0   3,3T  0 lvm
> sda                   8:0    0 465,8G  0 disk
> └─md126               9:126  0 465,8G  0 raid1
>   ├─md126p3         259:2    0   314G  0 md
>   │ ├─vgsystem-swap 253:1    0    64G  0 lvm
>   │ ├─vgsystem-root 253:0    0   100G  0 lvm   /
>   │ └─vgsystem-home 253:3    0   200G  0 lvm   /home
>   ├─md126p1         259:0    0   200M  0 md    /boot/efi
>   ├─md126p4         259:3    0 150,6G  0 md
>   │ └─vgsystem-root 253:0    0   100G  0 lvm   /
>   └─md126p2         259:1    0     1G  0 md    /boot
>
> 8<=========
>

OK, since the vgtest-test LV is backed by a single SCSI device (sdc), there
is a simple test to see if the SCSI device is holding onto I/Os.  Once the
hung task triggers, check out:

   head /sys/block/sdc/inflight

This will show how many I/Os are outstanding to the underlying SCSI device.
IIRC, md and lvm devices don't use 'inflight' counters, so this is only
useful for verifying the number of I/Os outstanding to the underlying SCSI
device.  (There's also a small one-liner at the end of this mail for
sampling these counters while reproducing.)

So if it's possible, I'd recommend verifying with the raw SCSI device with
IBLOCK export as well, to isolate if it's something specific to LVM that's
causing I/Os not to be completed.

Thanks again for your detailed bug-report.
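P.S. In case it's useful for reproducing on your side, below is a rough
sketch of the kind of delayed-completion setup mentioned above, using
dm-delay.  It's only an illustration - the LV path, the dm device name, and
the 15 second delay are placeholders, and it's not necessarily the exact
setup I've been running:

   # Interpose a dm-delay device between the IBLOCK backstore and the LV so
   # that backend completions are artificially delayed.
   SRC=/dev/vgtest/test                  # placeholder: the LV used as backstore
   SECTORS=$(blockdev --getsz "$SRC")    # device size in 512-byte sectors

   # dm-delay table is "<start> <length> delay <device> <offset> <delay_ms>",
   # which delays all I/O to the device by <delay_ms>.
   echo "0 ${SECTORS} delay ${SRC} 0 15000" | dmsetup create test-delayed

   # Then point the IBLOCK backstore at /dev/mapper/test-delayed instead of
   # the LV, and drive COMPARE_AND_WRITE traffic from ESX.

With a delay longer than the ESX ABORT timeout, the first-phase
COMPARE_AND_WRITE READ is still outstanding when ABORT_TASK arrives, which
reproduces the 'se_cmd waiting' state above.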
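Also, a trivial way to keep an eye on the inflight counters while driving
I/O from ESX (again just a sketch - substitute the right device name for
your setup):

   # Sample the outstanding (reads writes) counters for sdc once per second.
   while true; do
       printf '%s ' "$(date +%T)"
       cat /sys/block/sdc/inflight
       sleep 1
   done

The two numbers are reads and writes still outstanding to sdc; if they stay
non-zero the whole time the 'se_cmd waiting' message keeps printing, that
confirms the backend device is holding onto the I/O.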