On Thu, 2016-02-11 at 20:42 -0500, Dan Lane wrote: > On Thu, Feb 11, 2016 at 8:19 PM, Nicholas A. Bellinger > <nab@xxxxxxxxxxxxxxx> wrote: > > On Thu, 2016-02-11 at 19:54 -0500, Dan Lane wrote: > > > > Top posting.. > SORRY! I BLAME GOOGLE!!! > > > >> Well, looks like it wasn't as stable as we thought... > > > > Like I've already said multiple times, you need to find out what > > component of your FC network is dropping packets. > > > >> Here is a clip > >> from the logs, this is the only thing other than the ABORT_TASK I > >> could find in the system logs. Unfortunately I have no idea when it > >> stopped responding to my hosts. > > > > How do you know it's the target that stopped responding..? > > > > ESX will eventually take a device offline if it's not consistently > > getting responses, resulting in constant generation of ABORT_TASKs. > > > > Again, it's a clear sign that you're having some manner of FC > > connectivity issues. > > > >> My friend who was also testing this > >> had virtually the same results (he also gets the frequent ABORT_TASK > >> messages). > >> > >> Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1167636 > >> Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1167636 > >> Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1183520 > >> Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1183520 > >> Feb 10 20:44:31 dracofiler kernel: Unknown VPD Code: 0xc9 > >> Feb 10 20:44:33 dracofiler kernel: Unknown VPD Code: 0xc9 > >> Feb 10 20:44:47 dracofiler kernel: Unknown VPD Code: 0xc9 > >> Feb 10 20:46:35 dracofiler kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02 > >> Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1140928 > >> Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1140928 > >> Feb 10 20:49:19 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1209480 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1209480 > >> Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr: ffff88062b253000 buf: ffff88062b6e7c00 > >> Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE check condition and sense > >> Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr: ffff880624bac000 buf: ffff88062b6e7c00 > >> Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE check condition and sense > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1216828 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187260 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187348 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187392 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187436 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187480 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187524 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187304 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187568 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187656 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187744 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187788 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187832 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187920 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188008 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188052 > >> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188096 > >> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1202880 > >> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202880 > >> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1202968 > >> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202968 > >> Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1204244 > >> Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1204244 > >> > > > > Nothing out of the ordinary here on the target side. > > > > So let's start again with the basics. Please verify the: > > > > - qla2xxx firmware version you're using on the target side. > > - FC HBA vendor, model and firmware version on the ESX side. > > - The FC switch vendor, model and firmware version. > > > > Okay, while I work on collecting that information, can you provide > some insight on the following? > This looks like some rtslib-fb + systemd related failure. (Adding Andy CC') > [root@dracofiler init.d]# systemctl stop target.service > [root@dracofiler init.d]# systemctl start target.service > Job for target.service failed because a timeout was exceeded. See > "systemctl status target.service" and "journalctl -xe" for details. Putting a qla2xxx port into target mode (including the ISP reset) can take on the order of ~10 seconds per port It looks like your hitting some manner of systemd start timeout. > [root@dracofiler init.d]# systemctl status target.service > ● target.service - Restore LIO kernel target configuration > Loaded: loaded (/usr/lib/systemd/system/target.service; enabled; > vendor preset: disabled) > Active: failed (Result: timeout) since Thu 2016-02-11 20:23:51 EST; 5min ago > Process: 5063 ExecStart=/usr/bin/targetctl restore (code=exited, > status=1/FAILURE) > Main PID: 5063 (code=exited, status=1/FAILURE) > CGroup: /system.slice/target.service > └─control > └─5009 /usr/bin/python3 /usr/bin/targetctl clear > > Feb 11 20:20:51 dracofiler.home.lan target[5063]: self.enable = False > Feb 11 20:20:51 dracofiler.home.lan target[5063]: File > "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 245, in > _set_enable > Feb 11 20:20:51 dracofiler.home.lan target[5063]: raise > RTSLibError("Cannot change enable state: %s" % e) > Feb 11 20:20:51 dracofiler.home.lan target[5063]: > rtslib_fb.utils.RTSLibError: Cannot change enable state: [Errno 1] > Operation not permitted This means: 'echo $VAL > /sys/kernel/config/target/qla2xxx/$WWPN/$TPGT/enable' is failing with -EPERM. This typically means userspace is trying to enable target mode on a endpoint that's already been enabled, or disabling target mode on a endpoint that's already been disabled. So yeah, it looks like some type of rtslib-fb + systemd related problem. -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html