On Thu, Feb 11, 2016 at 8:19 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> On Thu, 2016-02-11 at 19:54 -0500, Dan Lane wrote:
>
> Top posting.. SORRY! I BLAME GOOGLE!!!
>
>> Well, looks like it wasn't as stable as we thought...
>
> Like I've already said multiple times, you need to find out what
> component of your FC network is dropping packets.
>
>> Here is a clip
>> from the logs, this is the only thing other than the ABORT_TASK I
>> could find in the system logs. Unfortunately I have no idea when it
>> stopped responding to my hosts.
>
> How do you know it's the target that stopped responding..?
>
> ESX will eventually take a device offline if it's not consistently
> getting responses, resulting in constant generation of ABORT_TASKs.
>
> Again, it's a clear sign that you're having some manner of FC
> connectivity issues.
>
>> My friend who was also testing this
>> had virtually the same results (he also gets the frequent ABORT_TASK
>> messages).
>>
>> Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1167636
>> Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1167636
>> Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1183520
>> Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1183520
>> Feb 10 20:44:31 dracofiler kernel: Unknown VPD Code: 0xc9
>> Feb 10 20:44:33 dracofiler kernel: Unknown VPD Code: 0xc9
>> Feb 10 20:44:47 dracofiler kernel: Unknown VPD Code: 0xc9
>> Feb 10 20:46:35 dracofiler kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
>> Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1140928
>> Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1140928
>> Feb 10 20:49:19 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1209480
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1209480
>> Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr: ffff88062b253000 buf: ffff88062b6e7c00
>> Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE check condition and sense
>> Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr: ffff880624bac000 buf: ffff88062b6e7c00
>> Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE check condition and sense
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1216828
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187260
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187348
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187392
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187436
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187480
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187524
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187304
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187568
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187656
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187744
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187788
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187832
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187920
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188008
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188052
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188096
>> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1202880
>> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202880
>> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1202968
>> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202968
>> Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1204244
>> Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1204244
>>
>
> Nothing out of the ordinary here on the target side.
>
> So let's start again with the basics. Please verify the:
>
> - qla2xxx firmware version you're using on the target side.
> - FC HBA vendor, model and firmware version on the ESX side.
> - The FC switch vendor, model and firmware version.
>

Okay, while I work on collecting that information, can you provide
some insight on the following?

[root@dracofiler init.d]# systemctl stop target.service
[root@dracofiler init.d]# systemctl start target.service
Job for target.service failed because a timeout was exceeded. See
"systemctl status target.service" and "journalctl -xe" for details.
[root@dracofiler init.d]# systemctl status target.service
● target.service - Restore LIO kernel target configuration
   Loaded: loaded (/usr/lib/systemd/system/target.service; enabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Thu 2016-02-11 20:23:51 EST; 5min ago
  Process: 5063 ExecStart=/usr/bin/targetctl restore (code=exited, status=1/FAILURE)
 Main PID: 5063 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/target.service
           └─control
             └─5009 /usr/bin/python3 /usr/bin/targetctl clear

Feb 11 20:20:51 dracofiler.home.lan target[5063]:     self.enable = False
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 245, in _set_enable
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     raise RTSLibError("Cannot change enable state: %s" % e)
Feb 11 20:20:51 dracofiler.home.lan target[5063]: rtslib_fb.utils.RTSLibError: Cannot change enable state: [Errno 1] Operation not permitted
Feb 11 20:20:51 dracofiler.home.lan systemd[1]: target.service: Main process exited, code=exited, status=1/FAILURE
Feb 11 20:22:21 dracofiler.home.lan systemd[1]: target.service: State 'stop-final-sigterm' timed out. Killing.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Processes still around after final SIGKILL. Entering failed mode.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: Failed to start Restore LIO kernel target configuration.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Unit entered failed state.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Failed with result 'timeout'.

[root@dracofiler init.d]# journalctl -xe
-- Unit target.service has finished shutting down.
Feb 11 20:03:03 dracofiler.home.lan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=target comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed
Feb 11 20:03:03 dracofiler.home.lan systemd[1]: target.service: Unit entered failed state.
Feb 11 20:03:03 dracofiler.home.lan systemd[1]: target.service: Failed with result 'timeout'.
Feb 11 20:03:03 dracofiler.home.lan polkitd[1597]: Unregistered Authentication Agent for unix-process:4991:16458262 (system bus name :1.25, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected fr
Feb 11 20:20:47 dracofiler.home.lan polkitd[1597]: Registered Authentication Agent for unix-process:5045:16609861 (system bus name :1.26 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/Authenticati
Feb 11 20:20:47 dracofiler.home.lan systemd[1]: Starting Restore LIO kernel target configuration...
-- Subject: Unit target.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit target.service has begun starting up.
Feb 11 20:20:48 dracofiler.home.lan python3[5063]: detected unhandled Python exception in '/usr/bin/targetctl'
Feb 11 20:20:51 dracofiler.home.lan abrt-server[5065]: Deleting problem directory Python3-2016-02-11-20:20:48-5063 (dup of Python3-2016-01-23-01:42:51-1732)
Feb 11 20:20:51 dracofiler.home.lan dbus[1588]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Feb 11 20:20:51 dracofiler.home.lan dbus[1588]: [system] Successfully activated service 'org.freedesktop.problems'
Feb 11 20:20:51 dracofiler.home.lan target[5063]: PermissionError: [Errno 1] Operation not permitted
Feb 11 20:20:51 dracofiler.home.lan target[5063]: During handling of the above exception, another exception occurred:
Feb 11 20:20:51 dracofiler.home.lan target[5063]: Traceback (most recent call last):
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 243, in _set_enable
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     fwrite(path, str(int(boolean)))
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/utils.py", line 69, in fwrite
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     file_fd.write(str(string))
Feb 11 20:20:51 dracofiler.home.lan target[5063]: PermissionError: [Errno 1] Operation not permitted
Feb 11 20:20:51 dracofiler.home.lan target[5063]: During handling of the above exception, another exception occurred:
Feb 11 20:20:51 dracofiler.home.lan target[5063]: Traceback (most recent call last):
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/bin/targetctl", line 82, in <module>
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     main()
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/bin/targetctl", line 79, in main
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     funcs[sys.argv[1]](savefile)
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/bin/targetctl", line 47, in restore
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     errors = RTSRoot().restore_from_file(restore_file=from_file)
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/root.py", line 269, in restore_from_file
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     abort_on_error=abort_on_error)
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/root.py", line 176, in restore
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     self.clear_existing(confirm=True)
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/root.py", line 162, in clear_existing
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     t.delete()
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 109, in delete
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     tpg.delete()
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 339, in delete
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     self.enable = False
Feb 11 20:20:51 dracofiler.home.lan target[5063]:   File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 245, in _set_enable
Feb 11 20:20:51 dracofiler.home.lan target[5063]:     raise RTSLibError("Cannot change enable state: %s" % e)
Feb 11 20:20:51 dracofiler.home.lan target[5063]: rtslib_fb.utils.RTSLibError: Cannot change enable state: [Errno 1] Operation not permitted
Feb 11 20:20:51 dracofiler.home.lan systemd[1]: target.service: Main process exited, code=exited, status=1/FAILURE
Feb 11 20:22:21 dracofiler.home.lan systemd[1]: target.service: State 'stop-final-sigterm' timed out. Killing.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Processes still around after final SIGKILL. Entering failed mode.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: Failed to start Restore LIO kernel target configuration.
-- Subject: Unit target.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit target.service has failed.
--
-- The result is failed.
Feb 11 20:23:51 dracofiler.home.lan audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=target comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=faile
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Unit entered failed state.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Failed with result 'timeout'.
Feb 11 20:23:51 dracofiler.home.lan polkitd[1597]: Unregistered Authentication Agent for unix-process:5045:16609861 (system bus name :1.26, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected fr

Thanks again,
Dan
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
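
Editorial note: since the thread turns on when the ABORT_TASK bursts occur, a rough sketch like the one below may help line them up against FC switch or ESX-side events. It only tallies the "ABORT_TASK: Sending TMR_..." kernel lines per minute and per response code, in the format pasted above. The function name summarize_aborts and the regular expression are illustrative assumptions, not part of LIO, qla2xxx, or any existing tool, and the pattern assumes the classic syslog timestamp layout shown in the excerpt.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in this thread:
#   Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187260
# The "stamp" group keeps only month, day, hour and minute.
ABORT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"ABORT_TASK: Sending (?P<response>TMR_\w+) for ref_tag: (?P<tag>\d+)"
)


def summarize_aborts(lines):
    """Count ABORT_TASK responses per (minute, response code).

    Leading '>' quote markers and spaces are stripped first, so lines
    copied out of a quoted email are accepted as well as raw log lines.
    Lines that do not match the assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = ABORT_RE.match(line.lstrip("> ").strip())
        if match:
            counts[(match.group("stamp"), match.group("response"))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the excerpt above, for demonstration only.
    sample = [
        ">> Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1167636",
        ">> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1209480",
        ">> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1216828",
    ]
    for (minute, response), count in sorted(summarize_aborts(sample).items()):
        print(minute, response, count)
```

To run it over a saved log, pass an open file object instead of the sample list, for example summarize_aborts(open(path)) with path pointing at wherever the kernel messages were saved. A minute with many TMR_TASK_DOES_NOT_EXIST responses, such as 20:49 in the excerpt, is the kind of window worth comparing against switch port counters.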