Re: LUNs become unavailable with current git HEAD

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Fri, 11 Oct 2013 15:09:14 -0700

Hi Thomas,

On Fri, 2013-10-11 at 13:38 +0200, Thomas Glanzmann wrote:
> Hello Nab,
> just as I was doing the evaluation for today's class, the target 'crashed'.
> I could resolve the issue by rebooting the target. In the log files I
> got:
> 
> Oct 11 11:53:56 node-62 kernel: [219465.151250] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5488
> Oct 11 11:53:56 node-62 kernel: [219465.151261] ABORT_TASK: Found referenced iSCSI task_tag: 5494
> Oct 11 11:53:56 node-62 kernel: [219465.151264] ABORT_TASK: ref_tag: 5494 already complete, skipping
> Oct 11 11:53:56 node-62 kernel: [219465.151267] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5494
> Oct 11 11:53:56 node-62 kernel: [219465.151271] ABORT_TASK: Found referenced iSCSI task_tag: 5495
> Oct 11 11:53:56 node-62 kernel: [219465.151273] ABORT_TASK: ref_tag: 5495 already complete, skipping
> Oct 11 11:53:56 node-62 kernel: [219465.151275] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5495
> Oct 11 11:54:09 node-62 kernel: [219478.744212] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000008
> Oct 11 11:54:09 node-62 kernel: [219478.751738] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5508
> Oct 11 11:54:23 node-62 kernel: [219492.351282] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000013
> Oct 11 11:54:23 node-62 kernel: [219492.358819] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5514
> Oct 11 11:54:23 node-62 kernel: [219492.630489] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000001d
> Oct 11 11:54:23 node-62 kernel: [219492.638250] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000001e
> Oct 11 11:54:23 node-62 kernel: [219492.646156] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000001f
> Oct 11 11:54:23 node-62 kernel: [219492.653991] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000020
> ...
> 
> It looks like the storage forgot about all LUNs. I hope that I can
> reproduce the issue. We were patching our ESX servers while the problem
> happened. I'll try to reproduce the issue.
> 
> https://thomas.glanzmann.de/crash/
> 
> Have you seen a similar issue before?
> 

Mmmm, there is a warning from the target about lio_qr_cache leaking
memory once iscsi_target_mod was unloaded, but no actual OOPS is being
triggered here that would indicate a specific target problem.

Looking at the vmkernel.log from the ESX side, the ABORT_TASKs above
appear to be generated from command timeouts, followed by iSCSI
connection resets, followed by the devices being taken offline.
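
To check that ordering against the target-side log, something like the
following rough sketch could tally the two message types and record when
each first shows up. The regex only assumes the syslog layout of the lines
quoted above; classify() and summarize() are hypothetical helper names, not
part of any target tooling.

```python
import re
from collections import Counter

# Matches the kernel log lines quoted above, with or without a "> " prefix.
LINE_RE = re.compile(r"kernel: \[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def classify(msg):
    """Map a kernel message to one of the event types seen in this thread."""
    if msg.startswith("ABORT_TASK:"):
        return "abort_task"
    if "NON_EXISTENT_LUN" in msg:
        return "non_existent_lun"
    return "other"


def summarize(lines):
    """Return (count per event type, first kernel timestamp per event type)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a kernel log line in the expected layout
        kind = classify(m.group("msg"))
        counts[kind] += 1
        first_seen.setdefault(kind, float(m.group("ts")))
    return counts, first_seen


# Three lines copied from the log excerpt above.
SAMPLE = [
    "Oct 11 11:53:56 node-62 kernel: [219465.151250] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5488",
    "Oct 11 11:54:09 node-62 kernel: [219478.744212] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000008",
    "Oct 11 11:54:09 node-62 kernel: [219478.751738] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5508",
]

if __name__ == "__main__":
    counts, first_seen = summarize(SAMPLE)
    for kind, ts in sorted(first_seen.items(), key=lambda kv: kv[1]):
        print(f"{kind}: first at {ts}, {counts[kind]} line(s)")
```

In the excerpt the first ABORT_TASK precedes the first NON_EXISTENT_LUN
message by roughly 13 seconds, which would fit aborts being sent before the
LUNs disappear from the initiator's point of view.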

Btw, if you're able to reproduce it, it would be useful to enable dynamic
debugging for iscsi_target_mod to see if the ESX client is trying to
reconnect.
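
A minimal sketch of turning that on, assuming the kernel was built with
CONFIG_DYNAMIC_DEBUG, debugfs is mounted at /sys/kernel/debug, and the
script runs as root. It writes the standard "module <name> +p" query to the
dynamic debug control file; dyndbg_query() and set_module_debug() are
hypothetical helper names used only for illustration.

```python
# Usual location of the control file when debugfs is mounted (assumption).
DEFAULT_CONTROL = "/sys/kernel/debug/dynamic_debug/control"


def dyndbg_query(module, enable=True):
    """Build a dynamic debug query that toggles pr_debug() output for a module."""
    flag = "+p" if enable else "-p"
    return f"module {module} {flag}"


def set_module_debug(module, enable=True, control=DEFAULT_CONTROL):
    """Write the query to the control file and return the query string."""
    query = dyndbg_query(module, enable)
    with open(control, "w") as f:
        f.write(query + "\n")
    return query


if __name__ == "__main__":
    # Enable debug output for the iSCSI target module; it shows up in dmesg.
    print(set_module_debug("iscsi_target_mod"))
```

Calling set_module_debug("iscsi_target_mod", enable=False) afterwards
switches the extra output off again, since the log can get noisy under
load.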

--nab
