Hi Nicolas,
W dniu 08.08.2017 o 23:00, Łukasz Engel pisze:
Hi Nicolas,
[...]
Tomorrow I plan to retest using different sas controller which I have
on shelf (LSI).
We will see if it's replicable in that configuration, but as Pascal
got this for quite different setup than mine I expect to got similar
results.
Personally for me (If I properly understand your description) it looks
like in some situation i/o completion notification for read part of
TAS operation "got lost" somewhere in block/target layer. (And then
timeout for operation kicks in esx and it sends ABORT_TASK for this
operation)
(BTW - I saw some non-zero values also for dm-* inflight counters
during i/o activity, so it looks lvm currently (4.12) provide
inflight stats )
I will let you know tomorrow abount my findings
With LSI controller (HW RAID0 built on same 6 HDDs ), without LVM - same
results:
Aug 9 15:49:32 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f901e0000 buf: ffff881f83c5c800
Aug 9 15:49:32 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:32 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f83e57000 buf: ffff881f83c5c800
Aug 9 15:49:32 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:32 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f7ee51000 buf: ffff881f83c5c800
Aug 9 15:49:32 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:32 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f81745000 buf: ffff881f83c5c800
Aug 9 15:49:32 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:33 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f7ef99000 buf: ffff881f83c5c800
Aug 9 15:49:33 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:33 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f7e56a000 buf: ffff881f83c5c800
Aug 9 15:49:33 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:33 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f7edf1000 buf: ffff881f83c5c800
Aug 9 15:49:33 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:33 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f8faaa000 buf: ffff881f83c5c800
Aug 9 15:49:33 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:37 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f815d1000 buf: ffff881f83c5fa00
Aug 9 15:49:37 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:37 TESTSTORAGE kernel: Detected MISCOMPARE for addr:
ffff881f8146e000 buf: ffff881f83c5f200
Aug 9 15:49:37 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Aug 9 15:49:37 TESTSTORAGE kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1138156
Aug 9 15:49:37 TESTSTORAGE kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1138156
Aug 9 15:49:37 TESTSTORAGE kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1138200
Aug 9 15:49:37 TESTSTORAGE kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1138200
Aug 9 15:49:50 TESTSTORAGE kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1193992
Aug 9 15:49:52 TESTSTORAGE kernel: ABORT_TASK: Sending
TMR_FUNCTION_COMPLETE for ref_tag: 1193992
Aug 9 15:49:52 TESTSTORAGE kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1194036
Aug 9 15:49:57 TESTSTORAGE kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x00000001
Aug 9 15:49:57 TESTSTORAGE kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x00000002
[...] <probably periodic LUN scan issued by esx>
Aug 9 15:49:57 TESTSTORAGE kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000ff
Aug 9 15:52:09 TESTSTORAGE kernel: INFO: task kworker/u24:0:3001
blocked for more than 120 seconds.
Aug 9 15:52:09 TESTSTORAGE kernel: Not tainted 4.12.3p1 #3
Aug 9 15:52:09 TESTSTORAGE kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 15:52:09 TESTSTORAGE kernel: kworker/u24:0 D 0 3001 2
0x00000080
Aug 9 15:52:09 TESTSTORAGE kernel: Workqueue: tmr-iblock
target_tmr_work [target_core_mod]
Aug 9 15:52:09 TESTSTORAGE kernel: Call Trace:
Aug 9 15:52:09 TESTSTORAGE kernel: __schedule+0x285/0x840
Aug 9 15:52:09 TESTSTORAGE kernel: schedule+0x36/0x80
Aug 9 15:52:09 TESTSTORAGE kernel: schedule_timeout+0x173/0x2f0
Aug 9 15:52:09 TESTSTORAGE kernel: ? call_timer_fn+0x140/0x140
Aug 9 15:52:09 TESTSTORAGE kernel: ? get_work_pool+0x2d/0x40
Aug 9 15:52:09 TESTSTORAGE kernel: wait_for_completion_timeout+0x12c/0x190
Aug 9 15:52:09 TESTSTORAGE kernel: ? wake_up_q+0x80/0x80
Aug 9 15:52:09 TESTSTORAGE kernel:
__transport_wait_for_tasks+0xd9/0x1b0 [target_core_mod]
Aug 9 15:52:09 TESTSTORAGE kernel: transport_wait_for_tasks+0x53/0x90
[target_core_mod]
Aug 9 15:52:09 TESTSTORAGE kernel: core_tmr_abort_task+0x10e/0x190
[target_core_mod]
Aug 9 15:52:09 TESTSTORAGE kernel: target_tmr_work+0x111/0x120
[target_core_mod]
Aug 9 15:52:09 TESTSTORAGE kernel: process_one_work+0x144/0x370
Aug 9 15:52:09 TESTSTORAGE kernel: worker_thread+0x4d/0x3c0
Aug 9 15:52:09 TESTSTORAGE kernel: kthread+0x109/0x140
Aug 9 15:52:09 TESTSTORAGE kernel: ? rescuer_thread+0x360/0x360
Aug 9 15:52:09 TESTSTORAGE kernel: ? kthread_park+0x60/0x60
Aug 9 15:52:09 TESTSTORAGE kernel: ret_from_fork+0x25/0x30
Aug 9 15:52:54 TESTSTORAGE kernel: se_cmd waiting CDB: 0x89 tag:
1194036 t_state: 5se_cmd_flags: 0x01c80109 transport_state: 0x00000033
At this point inflight counters for sdc are "0 0".
Later there are more "hung task" notifications, but probably they are
result of the thirst one...
Aug 9 15:54:12 TESTSTORAGE kernel: INFO: task kworker/1:75:3172 blocked
for more than 120 seconds.
Aug 9 15:54:12 TESTSTORAGE kernel: Not tainted 4.12.3p1 #3
Aug 9 15:54:12 TESTSTORAGE kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 15:54:12 TESTSTORAGE kernel: kworker/1:75 D 0 3172 2
0x00000080
Aug 9 15:54:12 TESTSTORAGE kernel: Workqueue: events
qlt_free_session_done [qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: Call Trace:
Aug 9 15:54:12 TESTSTORAGE kernel: __schedule+0x285/0x840
Aug 9 15:54:12 TESTSTORAGE kernel: schedule+0x36/0x80
Aug 9 15:54:12 TESTSTORAGE kernel: schedule_timeout+0x242/0x2f0
Aug 9 15:54:12 TESTSTORAGE kernel: ? ttwu_do_activate+0x77/0x80
Aug 9 15:54:12 TESTSTORAGE kernel: wait_for_completion+0x111/0x170
Aug 9 15:54:12 TESTSTORAGE kernel: ? wake_up_q+0x80/0x80
Aug 9 15:54:12 TESTSTORAGE kernel: target_wait_for_sess_cmds+0x49/0x170
[target_core_mod]
Aug 9 15:54:12 TESTSTORAGE kernel: tcm_qla2xxx_free_session+0x45/0x90
[tcm_qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: qlt_free_session_done+0x109/0x5a0
[qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: ? qlt_unreg_sess+0xe7/0xf0 [qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: ? tcm_qla2xxx_put_sess+0x3c/0x50
[tcm_qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: process_one_work+0x144/0x370
Aug 9 15:54:12 TESTSTORAGE kernel: worker_thread+0x4d/0x3c0
Aug 9 15:54:12 TESTSTORAGE kernel: kthread+0x109/0x140
Aug 9 15:54:12 TESTSTORAGE kernel: ? rescuer_thread+0x360/0x360
Aug 9 15:54:12 TESTSTORAGE kernel: ? kthread_park+0x60/0x60
Aug 9 15:54:12 TESTSTORAGE kernel: ret_from_fork+0x25/0x30
Aug 9 15:54:12 TESTSTORAGE kernel: INFO: task kworker/1:82:3184 blocked
for more than 120 seconds.
Aug 9 15:54:12 TESTSTORAGE kernel: Not tainted 4.12.3p1 #3
Aug 9 15:54:12 TESTSTORAGE kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 15:54:12 TESTSTORAGE kernel: kworker/1:82 D 0 3184 2
0x00000080
Aug 9 15:54:12 TESTSTORAGE kernel: Workqueue: events
qlt_free_session_done [qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: Call Trace:
Aug 9 15:54:12 TESTSTORAGE kernel: __schedule+0x285/0x840
Aug 9 15:54:12 TESTSTORAGE kernel: schedule+0x36/0x80
Aug 9 15:54:12 TESTSTORAGE kernel: schedule_timeout+0x242/0x2f0
Aug 9 15:54:12 TESTSTORAGE kernel: ? ttwu_do_activate+0x77/0x80
Aug 9 15:54:12 TESTSTORAGE kernel: wait_for_completion+0x111/0x170
Aug 9 15:54:12 TESTSTORAGE kernel: ? wake_up_q+0x80/0x80
Aug 9 15:54:12 TESTSTORAGE kernel: target_wait_for_sess_cmds+0x49/0x170
[target_core_mod]
Aug 9 15:54:12 TESTSTORAGE kernel: tcm_qla2xxx_free_session+0x45/0x90
[tcm_qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: qlt_free_session_done+0x109/0x5a0
[qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: ? qlt_unreg_sess+0xe7/0xf0 [qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: ? tcm_qla2xxx_put_sess+0x3c/0x50
[tcm_qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: process_one_work+0x144/0x370
Aug 9 15:54:12 TESTSTORAGE kernel: worker_thread+0x4d/0x3c0
Aug 9 15:54:12 TESTSTORAGE kernel: kthread+0x109/0x140
Aug 9 15:54:12 TESTSTORAGE kernel: ? rescuer_thread+0x360/0x360
Aug 9 15:54:12 TESTSTORAGE kernel: ? kthread_park+0x60/0x60
Aug 9 15:54:12 TESTSTORAGE kernel: ret_from_fork+0x25/0x30
--
Lukasz Engel
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html