Hi Nicholas,
On 31.07.2017 at 02:00, Nicholas A. Bellinger wrote:
Hi Lukasz,
On Thu, 2017-07-27 at 17:23 +0200, Łukasz Engel wrote:
Hello
I am experiencing problems using an LIO FC target with VMware ESX (6.5).
This is the same problem as discussed in
https://www.spinics.net/lists/target-devel/msg15670.html
I've tried 4.11.x (up to .12) and 4.12.3 (and also some kernels from the
4.9 line).
[...]
Yes, this is most certainly the same bug the patch here addresses:
https://www.spinics.net/lists/target-devel/msg15759.html
Also, a smaller version of that just went out with your CC on it, here:
qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
https://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git/commit/?id=6bcbb3174caa5f1ccc894f8ae077631659d5a629
After applying both patches (for 6bcbb3174caa5f1ccc894f8ae077631659d5a629
the second hunk needed manual adjustment on 4.12.x, since the surrounding
context differs slightly), there is no BUG any more, but instead a kworker
hangs and all FC I/O is frozen (a rough sketch of how the patches were
applied follows the log):
Jul 31 16:27:14 teststorage [ 42.897291] qla2xxx
[0000:06:00.0]-500a:12: LOOP UP detected (4 Gbps).
Jul 31 16:27:16 teststorage [ 42.959068] qla2xxx
[0000:09:00.0]-ffff:11: qla24xx_do_nack_work create sess success
ffff881f80e29200
Jul 31 16:27:16 teststorage [ 44.144811] qla2xxx
[0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success
ffff881f80e28400
Jul 31 16:27:16 teststorage [ 44.230770] qla2xxx
[0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success
ffff881f8f575a00
Jul 31 16:27:16 teststorage [ 44.473650] qla2xxx
[0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success
ffff881f8f575600
Jul 31 16:27:16 teststorage [ 44.520821] qla2xxx
[0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success
ffff881f8f574400
Jul 31 16:27:16 teststorage [ 44.538183] qla2xxx
[0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success
ffff881f900f3400
Jul 31 16:27:16 teststorage [ 44.984052] qla2xxx
[0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success
ffff881f8f577a00
Jul 31 16:28:54 teststorage [ 45.042585] qla2xxx
[0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success
ffff881f80e28000
Jul 31 16:28:54 teststorage [ 142.431045] ABORT_TASK: Found referenced
qla2xxx task_tag: 1189328
Jul 31 16:28:54 teststorage [ 142.431059] ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1189328
Jul 31 16:28:57 teststorage [ 145.632571] ABORT_TASK: Found referenced
qla2xxx task_tag: 1177932
Jul 31 16:28:57 teststorage [ 145.897299] ABORT_TASK: Sending
TMR_FUNCTION_COMPLETE for ref_tag: 1177932
Jul 31 16:29:00 teststorage [ 148.388569] ABORT_TASK: Found referenced
qla2xxx task_tag: 1164160
Jul 31 16:32:00 teststorage [ 328.848344] se_cmd waiting CDB: 0x89 tag:
1164160 t_state: 5se_cmd_flags: 0x01c80109 transport_state: 0x00000033
Jul 31 16:32:41 teststorage [ 369.808484] INFO: task kworker/1:93:2802
blocked for more than 120 seconds.
Jul 31 16:32:41 teststorage [ 369.808493] Not tainted 4.12.3p1 #3
Jul 31 16:32:41 teststorage [ 369.808496] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 31 16:32:41 teststorage [ 369.808499] kworker/1:93 D 0
2802 2 0x00000080
Jul 31 16:32:41 teststorage [ 369.808515] Workqueue: events
qlt_free_session_done [qla2xxx]
Jul 31 16:32:41 teststorage [ 369.808518] Call Trace:
Jul 31 16:32:41 teststorage [ 369.808524] __schedule+0x285/0x840
Jul 31 16:32:41 teststorage [ 369.808528] schedule+0x36/0x80
Jul 31 16:32:41 teststorage [ 369.808531] schedule_timeout+0x242/0x2f0
Jul 31 16:32:41 teststorage [ 369.808537] ? ttwu_do_activate+0x77/0x80
Jul 31 16:32:41 teststorage [ 369.808540] wait_for_completion+0x111/0x170
Jul 31 16:32:41 teststorage [ 369.808544] ? wake_up_q+0x80/0x80
Jul 31 16:32:41 teststorage [ 369.808555]
target_wait_for_sess_cmds+0x49/0x170 [target_core_mod]
Jul 31 16:32:41 teststorage [ 369.808560]
tcm_qla2xxx_free_session+0x45/0x90 [tcm_qla2xxx]
Jul 31 16:32:41 teststorage [ 369.808572]
qlt_free_session_done+0x109/0x5a0 [qla2xxx]
Jul 31 16:32:41 teststorage [ 369.808579] ? qlt_unreg_sess+0xe7/0xf0
[qla2xxx]
Jul 31 16:32:41 teststorage [ 369.808583] ?
tcm_qla2xxx_put_sess+0x3c/0x50 [tcm_qla2xxx]
Jul 31 16:32:41 teststorage [ 369.808587] process_one_work+0x144/0x370
Jul 31 16:32:41 teststorage [ 369.808591] worker_thread+0x4d/0x3c0
Jul 31 16:32:41 teststorage [ 369.808594] kthread+0x109/0x140
Jul 31 16:32:41 teststorage [ 369.808597] ? rescuer_thread+0x360/0x360
Jul 31 16:32:41 teststorage [ 369.808600] ? kthread_park+0x60/0x60
Jul 31 16:32:41 teststorage [ 369.808720] ret_from_fork+0x25/0x30
Jul 31 16:32:41 teststorage [ 369.809016] NMI backtrace for cpu 3
Jul 31 16:32:41 teststorage [ 369.809446] CPU: 3 PID: 107 Comm:
khungtaskd Not tainted 4.12.3p1 #3
Jul 31 16:32:41 teststorage [ 369.809886] Hardware name: ASUS All
Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016
Jul 31 16:32:41 teststorage [ 369.810326] Call Trace:
Jul 31 16:32:41 teststorage [ 369.810764] dump_stack+0x63/0x8e
Jul 31 16:32:41 teststorage [ 369.811190] nmi_cpu_backtrace+0x9b/0xa0
Jul 31 16:32:41 teststorage [ 369.811613] ?
irq_force_complete_move+0xf0/0xf0
Jul 31 16:32:41 teststorage [ 369.812036]
nmi_trigger_cpumask_backtrace+0x99/0xd0
Jul 31 16:32:41 teststorage [ 369.812461]
arch_trigger_cpumask_backtrace+0x19/0x20
Jul 31 16:32:41 teststorage [ 369.812890] watchdog+0x2c7/0x340
Jul 31 16:32:41 teststorage [ 369.813321] kthread+0x109/0x140
Jul 31 16:32:41 teststorage [ 369.813748] ?
reset_hung_task_detector+0x20/0x20
Jul 31 16:32:41 teststorage [ 369.814171] ? kthread_park+0x60/0x60
Jul 31 16:32:41 teststorage [ 369.814595] ? kthread_park+0x60/0x60
Jul 31 16:32:41 teststorage [ 369.815013] ret_from_fork+0x25/0x30
Jul 31 16:32:41 teststorage [ 369.815428] Sending NMI from CPU 3 to
CPUs 0-2,4-11:
Jul 31 16:32:41 teststorage [ 369.815846] NMI backtrace for cpu 0
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.815946] NMI backtrace for cpu 1
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816045] NMI backtrace for cpu 2
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816146] NMI backtrace for cpu 4
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816247] NMI backtrace for cpu 5
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816352] NMI backtrace for cpu 6
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816446] NMI backtrace for cpu 7
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816547] NMI backtrace for cpu 8
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816648] NMI backtrace for cpu 9
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816747] NMI backtrace for cpu 10
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816851] NMI backtrace for cpu 11
skipped: idling at pc 0xffffffff818b5ac6
Jul 31 16:32:41 teststorage [ 369.816855] Kernel panic - not syncing:
hung_task: blocked tasks
Jul 31 16:32:41 teststorage [ 369.816857] CPU: 3 PID: 107 Comm:
khungtaskd Not tainted 4.12.3p1 #3
Jul 31 16:32:41 teststorage [ 369.816857] Hardware name: ASUS All
Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016
Jul 31 16:32:41 teststorage [ 369.816857] Call Trace:
Jul 31 16:32:41 teststorage [ 369.816859] dump_stack+0x63/0x8e
Jul 31 16:32:41 teststorage [ 369.816861] panic+0xeb/0x239
Jul 31 16:32:41 teststorage [ 369.816863] watchdog+0x2d5/0x340
Jul 31 16:32:41 teststorage [ 369.816864] kthread+0x109/0x140
Jul 31 16:32:41 teststorage [ 369.816864] ?
reset_hung_task_detector+0x20/0x20
Jul 31 16:32:41 teststorage [ 369.816865] ? kthread_park+0x60/0x60
Jul 31 16:32:41 teststorage [ 369.816866] ? kthread_park+0x60/0x60
Jul 31 16:32:41 teststorage [ 369.816866] ret_from_fork+0x25/0x30
8<=====
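For reference, this is roughly how the two changes were pulled onto the
4.12.3 tree (the remote name and the patch file name below are only
placeholders; the larger patch from msg15759 was applied with patch(1),
fixing up its second hunk's context by hand):

   # add and fetch Nicholas' target-pending tree
   git remote add target-pending \
       https://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git
   git fetch target-pending
   # on top of the v4.12.3 base, pick the smaller TMR ABORT fix
   git cherry-pick 6bcbb3174caa5f1ccc894f8ae077631659d5a629
   # apply the larger patch saved from msg15759 (placeholder file name),
   # adjusting the second hunk's context manually for 4.12.x
   patch -p1 < qla2xxx-tmr-abort.patch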
Also, some more details on what your backend device storage behind
target-core looks like would be helpful.
Physical HBAs, device information, and any MD/RAID/LVM/etc on top..
I have an Adaptec 7805 HW RAID controller (6x 600 GB 15k SAS HDDs in
RAID0) exporting a single device to the system (sdc). LVM sits on that
device, and an LV (vgtest/test) is used as the LIO block backstore
exported to the ESX hosts.
(The system itself lives on two separate SATA HDDs in software RAID1 + LVM.)
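Roughly, the LIO side looks like this (the WWPNs below are made up and
the backstore name is only an example):

   # block backstore on top of the LV
   targetcli /backstores/block create name=test dev=/dev/vgtest/test
   # qla2xxx target port (placeholder local HBA WWPN)
   targetcli /qla2xxx create naa.21000024ff000001
   # export the backstore as a LUN and allow the ESX initiator (placeholder WWPN)
   targetcli /qla2xxx/naa.21000024ff000001/luns create /backstores/block/test
   targetcli /qla2xxx/naa.21000024ff000001/acls create naa.21000024ff000002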
[root@TESTSTORAGE ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 465,8G 0 disk
└─md126 9:126 0 465,8G 0 raid1
├─md126p3 259:2 0 314G 0 md
│ ├─vgsystem-swap 253:1 0 64G 0 lvm
│ ├─vgsystem-root 253:0 0 100G 0 lvm /
│ └─vgsystem-home 253:3 0 200G 0 lvm /home
├─md126p1 259:0 0 200M 0 md /boot/efi
├─md126p4 259:3 0 150,6G 0 md
│ └─vgsystem-root 253:0 0 100G 0 lvm /
└─md126p2 259:1 0 1G 0 md /boot
sdc 8:32 1 3,3T 0 disk
└─vgtest-test 253:2 0 3,3T 0 lvm
sda 8:0 0 465,8G 0 disk
└─md126 9:126 0 465,8G 0 raid1
├─md126p3 259:2 0 314G 0 md
│ ├─vgsystem-swap 253:1 0 64G 0 lvm
│ ├─vgsystem-root 253:0 0 100G 0 lvm /
│ └─vgsystem-home 253:3 0 200G 0 lvm /home
├─md126p1 259:0 0 200M 0 md /boot/efi
├─md126p4 259:3 0 150,6G 0 md
│ └─vgsystem-root 253:0 0 100G 0 lvm /
└─md126p2 259:1 0 1G 0 md /boot
8<=========
--
Lukasz Engel