Hi Thomas,

On Fri, 2013-10-11 at 13:38 +0200, Thomas Glanzmann wrote:
> Hello Nab,
> just when I did the evaluation of my todays class, the target 'crashed'
> I could resolve the issue by rebooting the target. In the log files I
> got:
>
> Oct 11 11:53:56 node-62 kernel: [219465.151250] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5488
> Oct 11 11:53:56 node-62 kernel: [219465.151261] ABORT_TASK: Found referenced iSCSI task_tag: 5494
> Oct 11 11:53:56 node-62 kernel: [219465.151264] ABORT_TASK: ref_tag: 5494 already complete, skipping
> Oct 11 11:53:56 node-62 kernel: [219465.151267] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5494
> Oct 11 11:53:56 node-62 kernel: [219465.151271] ABORT_TASK: Found referenced iSCSI task_tag: 5495
> Oct 11 11:53:56 node-62 kernel: [219465.151273] ABORT_TASK: ref_tag: 5495 already complete, skipping
> Oct 11 11:53:56 node-62 kernel: [219465.151275] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5495
> Oct 11 11:54:09 node-62 kernel: [219478.744212] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000008
> Oct 11 11:54:09 node-62 kernel: [219478.751738] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5508
> Oct 11 11:54:23 node-62 kernel: [219492.351282] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000013
> Oct 11 11:54:23 node-62 kernel: [219492.358819] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5514
> Oct 11 11:54:23 node-62 kernel: [219492.630489] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000001d
> Oct 11 11:54:23 node-62 kernel: [219492.638250] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000001e
> Oct 11 11:54:23 node-62 kernel: [219492.646156] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000001f
> Oct 11 11:54:23 node-62 kernel: [219492.653991] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000020
> ...
>
> It looks like the storage forgot about all LUNs.
> I hope that I can reproduce the issue. We were patching our ESX
> servers while the problem happened. I'll try to reproduce the issue.
>
> https://thomas.glanzmann.de/crash/
>
> Have you seen a similiar issue before?
>

Mmmm, there is a warning from the target about lio_qr_cache leaking
memory once iscsi_target_mod was unloaded, but there is no actual OOPs
being triggered here that would indicate a specific target problem.

Looking at the vmkernel.org from the ESX side, the ABORT_TASKs above
appear to be generated by command timeouts, followed by iSCSI
connection resets, followed by the devices being taken offline.

Btw, if you're able to reproduce, it would be useful to enable dynamic
debugging for iscsi_target_mod to see whether the ESX client is trying
to reconnect.

--nab
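
For anyone following along, dynamic debugging for a module is normally
switched on by writing a query to the kernel's dynamic debug control
file. The following is a minimal sketch, not part of the original
exchange. It assumes a kernel built with CONFIG_DYNAMIC_DEBUG and
debugfs mounted at /sys/kernel/debug; the helper names dyndbg_query and
apply_query are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Assumption: debugfs is mounted at its usual location.
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def dyndbg_query(module: str, enable: bool = True) -> str:
    """Build a dynamic debug query toggling pr_debug() output for one module."""
    flag = "+p" if enable else "-p"
    return f"module {module} {flag}"


def apply_query(query: str, control: Path = CONTROL) -> None:
    """Write the query to the control file (requires root on a real system)."""
    control.write_text(query + "\n")


if __name__ == "__main__":
    # Enable debug messages for the iSCSI target module before reproducing.
    apply_query(dyndbg_query("iscsi_target_mod"))
```

Once enabled, the additional messages appear in the kernel log next to
the ABORT_TASK lines; passing enable=False builds the query that turns
them off again.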