Re: kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3643! (or :3319)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Nicolas,

W dniu 08.08.2017 o 23:00, Łukasz Engel pisze:
Hi Nicolas,


[...]
Tomorrow I plan to retest using different sas controller which I have on shelf (LSI). We will see if it's replicable in that configuration, but as Pascal got this for quite different setup than mine I expect to got similar results.

Personally for me (If I properly understand your description) it looks like in some situation i/o completion notification for read part of TAS operation "got lost" somewhere in block/target layer. (And then timeout for operation kicks in esx and it sends ABORT_TASK for this operation)


(BTW - I saw some non-zero values also for dm-* inflight counters during i/o activity, so it looks lvm currently (4.12) provide inflight stats )

I will let you know tomorrow abount my findings


With LSI controller (HW RAID0 built on same 6 HDDs ), without LVM - same results:

Aug 9 15:49:32 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f901e0000 buf: ffff881f83c5c800 Aug 9 15:49:32 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:32 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f83e57000 buf: ffff881f83c5c800 Aug 9 15:49:32 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:32 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f7ee51000 buf: ffff881f83c5c800 Aug 9 15:49:32 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:32 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f81745000 buf: ffff881f83c5c800 Aug 9 15:49:32 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:33 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f7ef99000 buf: ffff881f83c5c800 Aug 9 15:49:33 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:33 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f7e56a000 buf: ffff881f83c5c800 Aug 9 15:49:33 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:33 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f7edf1000 buf: ffff881f83c5c800 Aug 9 15:49:33 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:33 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f8faaa000 buf: ffff881f83c5c800 Aug 9 15:49:33 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:37 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f815d1000 buf: ffff881f83c5fa00 Aug 9 15:49:37 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:37 TESTSTORAGE kernel: Detected MISCOMPARE for addr: ffff881f8146e000 buf: ffff881f83c5f200 Aug 9 15:49:37 TESTSTORAGE kernel: Target/iblock: Send MISCOMPARE check condition and sense Aug 9 15:49:37 TESTSTORAGE kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1138156 Aug 9 15:49:37 TESTSTORAGE kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1138156 Aug 9 15:49:37 TESTSTORAGE kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1138200 Aug 9 15:49:37 TESTSTORAGE kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1138200 Aug 9 15:49:50 TESTSTORAGE kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1193992 Aug 9 15:49:52 TESTSTORAGE kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1193992 Aug 9 15:49:52 TESTSTORAGE kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1194036 Aug 9 15:49:57 TESTSTORAGE kernel: TARGET_CORE[qla2xxx]: Detected NON_EXISTENT_LUN Access for 0x00000001 Aug 9 15:49:57 TESTSTORAGE kernel: TARGET_CORE[qla2xxx]: Detected NON_EXISTENT_LUN Access for 0x00000002
[...] <probably periodic LUN scan issued by esx>
Aug 9 15:49:57 TESTSTORAGE kernel: TARGET_CORE[qla2xxx]: Detected NON_EXISTENT_LUN Access for 0x000000ff Aug 9 15:52:09 TESTSTORAGE kernel: INFO: task kworker/u24:0:3001 blocked for more than 120 seconds.
Aug  9 15:52:09 TESTSTORAGE kernel:      Not tainted 4.12.3p1 #3
Aug 9 15:52:09 TESTSTORAGE kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 15:52:09 TESTSTORAGE kernel: kworker/u24:0 D 0 3001 2 0x00000080 Aug 9 15:52:09 TESTSTORAGE kernel: Workqueue: tmr-iblock target_tmr_work [target_core_mod]
Aug  9 15:52:09 TESTSTORAGE kernel: Call Trace:
Aug  9 15:52:09 TESTSTORAGE kernel: __schedule+0x285/0x840
Aug  9 15:52:09 TESTSTORAGE kernel: schedule+0x36/0x80
Aug  9 15:52:09 TESTSTORAGE kernel: schedule_timeout+0x173/0x2f0
Aug  9 15:52:09 TESTSTORAGE kernel: ? call_timer_fn+0x140/0x140
Aug  9 15:52:09 TESTSTORAGE kernel: ? get_work_pool+0x2d/0x40
Aug  9 15:52:09 TESTSTORAGE kernel: wait_for_completion_timeout+0x12c/0x190
Aug  9 15:52:09 TESTSTORAGE kernel: ? wake_up_q+0x80/0x80
Aug 9 15:52:09 TESTSTORAGE kernel: __transport_wait_for_tasks+0xd9/0x1b0 [target_core_mod] Aug 9 15:52:09 TESTSTORAGE kernel: transport_wait_for_tasks+0x53/0x90 [target_core_mod] Aug 9 15:52:09 TESTSTORAGE kernel: core_tmr_abort_task+0x10e/0x190 [target_core_mod] Aug 9 15:52:09 TESTSTORAGE kernel: target_tmr_work+0x111/0x120 [target_core_mod]
Aug  9 15:52:09 TESTSTORAGE kernel: process_one_work+0x144/0x370
Aug  9 15:52:09 TESTSTORAGE kernel: worker_thread+0x4d/0x3c0
Aug  9 15:52:09 TESTSTORAGE kernel: kthread+0x109/0x140
Aug  9 15:52:09 TESTSTORAGE kernel: ? rescuer_thread+0x360/0x360
Aug  9 15:52:09 TESTSTORAGE kernel: ? kthread_park+0x60/0x60
Aug  9 15:52:09 TESTSTORAGE kernel: ret_from_fork+0x25/0x30
Aug 9 15:52:54 TESTSTORAGE kernel: se_cmd waiting CDB: 0x89 tag: 1194036 t_state: 5se_cmd_flags: 0x01c80109 transport_state: 0x00000033

At this point inflight counters for sdc are "0 0".
Later there are more "hung task" notifications, but probably they are result of the thirst one...

Aug 9 15:54:12 TESTSTORAGE kernel: INFO: task kworker/1:75:3172 blocked for more than 120 seconds.
Aug  9 15:54:12 TESTSTORAGE kernel:      Not tainted 4.12.3p1 #3
Aug 9 15:54:12 TESTSTORAGE kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 15:54:12 TESTSTORAGE kernel: kworker/1:75 D 0 3172 2 0x00000080 Aug 9 15:54:12 TESTSTORAGE kernel: Workqueue: events qlt_free_session_done [qla2xxx]
Aug  9 15:54:12 TESTSTORAGE kernel: Call Trace:
Aug  9 15:54:12 TESTSTORAGE kernel: __schedule+0x285/0x840
Aug  9 15:54:12 TESTSTORAGE kernel: schedule+0x36/0x80
Aug  9 15:54:12 TESTSTORAGE kernel: schedule_timeout+0x242/0x2f0
Aug  9 15:54:12 TESTSTORAGE kernel: ? ttwu_do_activate+0x77/0x80
Aug  9 15:54:12 TESTSTORAGE kernel: wait_for_completion+0x111/0x170
Aug  9 15:54:12 TESTSTORAGE kernel: ? wake_up_q+0x80/0x80
Aug 9 15:54:12 TESTSTORAGE kernel: target_wait_for_sess_cmds+0x49/0x170 [target_core_mod] Aug 9 15:54:12 TESTSTORAGE kernel: tcm_qla2xxx_free_session+0x45/0x90 [tcm_qla2xxx] Aug 9 15:54:12 TESTSTORAGE kernel: qlt_free_session_done+0x109/0x5a0 [qla2xxx]
Aug  9 15:54:12 TESTSTORAGE kernel: ? qlt_unreg_sess+0xe7/0xf0 [qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: ? tcm_qla2xxx_put_sess+0x3c/0x50 [tcm_qla2xxx]
Aug  9 15:54:12 TESTSTORAGE kernel: process_one_work+0x144/0x370
Aug  9 15:54:12 TESTSTORAGE kernel: worker_thread+0x4d/0x3c0
Aug  9 15:54:12 TESTSTORAGE kernel: kthread+0x109/0x140
Aug  9 15:54:12 TESTSTORAGE kernel: ? rescuer_thread+0x360/0x360
Aug  9 15:54:12 TESTSTORAGE kernel: ? kthread_park+0x60/0x60
Aug  9 15:54:12 TESTSTORAGE kernel: ret_from_fork+0x25/0x30
Aug 9 15:54:12 TESTSTORAGE kernel: INFO: task kworker/1:82:3184 blocked for more than 120 seconds.
Aug  9 15:54:12 TESTSTORAGE kernel:      Not tainted 4.12.3p1 #3
Aug 9 15:54:12 TESTSTORAGE kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 15:54:12 TESTSTORAGE kernel: kworker/1:82 D 0 3184 2 0x00000080 Aug 9 15:54:12 TESTSTORAGE kernel: Workqueue: events qlt_free_session_done [qla2xxx]
Aug  9 15:54:12 TESTSTORAGE kernel: Call Trace:
Aug  9 15:54:12 TESTSTORAGE kernel: __schedule+0x285/0x840
Aug  9 15:54:12 TESTSTORAGE kernel: schedule+0x36/0x80
Aug  9 15:54:12 TESTSTORAGE kernel: schedule_timeout+0x242/0x2f0
Aug  9 15:54:12 TESTSTORAGE kernel: ? ttwu_do_activate+0x77/0x80
Aug  9 15:54:12 TESTSTORAGE kernel: wait_for_completion+0x111/0x170
Aug  9 15:54:12 TESTSTORAGE kernel: ? wake_up_q+0x80/0x80
Aug 9 15:54:12 TESTSTORAGE kernel: target_wait_for_sess_cmds+0x49/0x170 [target_core_mod] Aug 9 15:54:12 TESTSTORAGE kernel: tcm_qla2xxx_free_session+0x45/0x90 [tcm_qla2xxx] Aug 9 15:54:12 TESTSTORAGE kernel: qlt_free_session_done+0x109/0x5a0 [qla2xxx]
Aug  9 15:54:12 TESTSTORAGE kernel: ? qlt_unreg_sess+0xe7/0xf0 [qla2xxx]
Aug 9 15:54:12 TESTSTORAGE kernel: ? tcm_qla2xxx_put_sess+0x3c/0x50 [tcm_qla2xxx]
Aug  9 15:54:12 TESTSTORAGE kernel: process_one_work+0x144/0x370
Aug  9 15:54:12 TESTSTORAGE kernel: worker_thread+0x4d/0x3c0
Aug  9 15:54:12 TESTSTORAGE kernel: kthread+0x109/0x140
Aug  9 15:54:12 TESTSTORAGE kernel: ? rescuer_thread+0x360/0x360
Aug  9 15:54:12 TESTSTORAGE kernel: ? kthread_park+0x60/0x60
Aug  9 15:54:12 TESTSTORAGE kernel: ret_from_fork+0x25/0x30

--
Lukasz Engel


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux