Re: target crashes with vSphere 6 hosts

On Thu, Feb 11, 2016 at 8:19 PM, Nicholas A. Bellinger
<nab@xxxxxxxxxxxxxxx> wrote:
> On Thu, 2016-02-11 at 19:54 -0500, Dan Lane wrote:
>
> Top posting..
SORRY!  I BLAME GOOGLE!!!
>
>> Well, looks like it wasn't as stable as we thought...
>
> Like I've already said multiple times, you need to find out what
> component of your FC network is dropping packets.
>
>> Here is a clip
>> from the logs, this is the only thing other than the ABORT_TASK I
>> could find in the system logs.  Unfortunately I have no idea when it
>> stopped responding to my hosts.
>
> How do you know it's the target that stopped responding..?
>
> ESX will eventually take a device offline if it's not consistently
> getting responses, resulting in constant generation of ABORT_TASKs.
>
> Again, it's a clear sign that you're having some manner of FC
> connectivity issues.
>
>>   My friend who was also testing this
>> had virtually the same results (he also gets the frequent ABORT_TASK
>> messages).
>>
>> Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1167636
>> Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1167636
>> Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1183520
>> Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1183520
>> Feb 10 20:44:31 dracofiler kernel: Unknown VPD Code: 0xc9
>> Feb 10 20:44:33 dracofiler kernel: Unknown VPD Code: 0xc9
>> Feb 10 20:44:47 dracofiler kernel: Unknown VPD Code: 0xc9
>> Feb 10 20:46:35 dracofiler kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
>> Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1140928
>> Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1140928
>> Feb 10 20:49:19 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1209480
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1209480
>> Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr: ffff88062b253000 buf: ffff88062b6e7c00
>> Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE check condition and sense
>> Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr: ffff880624bac000 buf: ffff88062b6e7c00
>> Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE check condition and sense
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1216828
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187260
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187348
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187392
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187436
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187480
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187524
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187304
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187568
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187656
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187744
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187788
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187832
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187920
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188008
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188052
>> Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188096
>> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1202880
>> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202880
>> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1202968
>> Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202968
>> Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1204244
>> Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1204244
>>
>
> Nothing out of the ordinary here on the target side.
>
> So let's start again with the basics.  Please verify the:
>
>   - qla2xxx firmware version you're using on the target side.
>   - FC HBA vendor, model and firmware version on the ESX side.
>   - The FC switch vendor, model and firmware version.
>
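
For the target-side item, here is a rough sketch of how I could pull the
port details out of sysfs. It assumes the standard fc_host transport class
under /sys/class/fc_host and reads only the port_name and symbolic_name
attributes. I believe the symbolic_name string on qla2xxx ports includes the
firmware and driver versions, but that is an assumption to verify. The
function names and the sysfs_root parameter are made up for illustration.

```python
import glob
import os


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as fd:
            return fd.read().strip()
    except OSError:
        return None


def fc_host_info(sysfs_root="/sys"):
    """Collect port_name and symbolic_name for every fc_host entry.

    sysfs_root is a hypothetical parameter so the function can be pointed
    at a copy of the tree; on a live system the default should be fine.
    """
    hosts = {}
    pattern = os.path.join(sysfs_root, "class", "fc_host", "host*")
    for host_dir in sorted(glob.glob(pattern)):
        hosts[os.path.basename(host_dir)] = {
            attr: read_attr(os.path.join(host_dir, attr))
            for attr in ("port_name", "symbolic_name")
        }
    return hosts
```

Calling fc_host_info() with no arguments returns a dict keyed by host name,
or an empty dict if no FC hosts are present.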

Okay, while I work on collecting that information, can you provide
some insight on the following?

[root@dracofiler init.d]# systemctl stop  target.service
[root@dracofiler init.d]# systemctl start  target.service
Job for target.service failed because a timeout was exceeded. See "systemctl status target.service" and "journalctl -xe" for details.
[root@dracofiler init.d]# systemctl status target.service
● target.service - Restore LIO kernel target configuration
   Loaded: loaded (/usr/lib/systemd/system/target.service; enabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Thu 2016-02-11 20:23:51 EST; 5min ago
  Process: 5063 ExecStart=/usr/bin/targetctl restore (code=exited, status=1/FAILURE)
 Main PID: 5063 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/target.service
           └─control
             └─5009 /usr/bin/python3 /usr/bin/targetctl clear

Feb 11 20:20:51 dracofiler.home.lan target[5063]: self.enable = False
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 245, in _set_enable
Feb 11 20:20:51 dracofiler.home.lan target[5063]: raise RTSLibError("Cannot change enable state: %s" % e)
Feb 11 20:20:51 dracofiler.home.lan target[5063]: rtslib_fb.utils.RTSLibError: Cannot change enable state: [Errno 1] Operation not permitted
Feb 11 20:20:51 dracofiler.home.lan systemd[1]: target.service: Main process exited, code=exited, status=1/FAILURE
Feb 11 20:22:21 dracofiler.home.lan systemd[1]: target.service: State 'stop-final-sigterm' timed out. Killing.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Processes still around after final SIGKILL. Entering failed mode.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: Failed to start Restore LIO kernel target configuration.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Unit entered failed state.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Failed with result 'timeout'.
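
Since systemd reports processes still around after the final SIGKILL, my
guess (and it is only a guess) is that the "targetctl clear" process shown
in the control group is stuck in uninterruptible sleep. A small sketch for
checking that from /proc/<pid>/status follows. The helper names and the
proc_root parameter are hypothetical; the only thing it relies on is the
"State:" line that the status file contains.

```python
def parse_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def process_state(pid, proc_root="/proc"):
    """Read the state code for pid, or None if the process cannot be read."""
    try:
        with open("%s/%d/status" % (proc_root, pid)) as fd:
            return parse_state(fd.read())
    except OSError:
        return None
```

A result of "D" from process_state(5009) would mean the process is in
uninterruptible sleep, which would be consistent with SIGKILL having no
visible effect.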

[root@dracofiler init.d]# journalctl -xe
-- Unit target.service has finished shutting down.
Feb 11 20:03:03 dracofiler.home.lan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=target comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed
Feb 11 20:03:03 dracofiler.home.lan systemd[1]: target.service: Unit entered failed state.
Feb 11 20:03:03 dracofiler.home.lan systemd[1]: target.service: Failed with result 'timeout'.
Feb 11 20:03:03 dracofiler.home.lan polkitd[1597]: Unregistered Authentication Agent for unix-process:4991:16458262 (system bus name :1.25, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected fr
Feb 11 20:20:47 dracofiler.home.lan polkitd[1597]: Registered Authentication Agent for unix-process:5045:16609861 (system bus name :1.26 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/Authenticati
Feb 11 20:20:47 dracofiler.home.lan systemd[1]: Starting Restore LIO kernel target configuration...
-- Subject: Unit target.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit target.service has begun starting up.
Feb 11 20:20:48 dracofiler.home.lan python3[5063]: detected unhandled Python exception in '/usr/bin/targetctl'
Feb 11 20:20:51 dracofiler.home.lan abrt-server[5065]: Deleting problem directory Python3-2016-02-11-20:20:48-5063 (dup of Python3-2016-01-23-01:42:51-1732)
Feb 11 20:20:51 dracofiler.home.lan dbus[1588]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Feb 11 20:20:51 dracofiler.home.lan dbus[1588]: [system] Successfully activated service 'org.freedesktop.problems'
Feb 11 20:20:51 dracofiler.home.lan target[5063]: PermissionError: [Errno 1] Operation not permitted
Feb 11 20:20:51 dracofiler.home.lan target[5063]: During handling of the above exception, another exception occurred:
Feb 11 20:20:51 dracofiler.home.lan target[5063]: Traceback (most recent call last):
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 243, in _set_enable
Feb 11 20:20:51 dracofiler.home.lan target[5063]: fwrite(path, str(int(boolean)))
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/utils.py", line 69, in fwrite
Feb 11 20:20:51 dracofiler.home.lan target[5063]: file_fd.write(str(string))
Feb 11 20:20:51 dracofiler.home.lan target[5063]: PermissionError: [Errno 1] Operation not permitted
Feb 11 20:20:51 dracofiler.home.lan target[5063]: During handling of the above exception, another exception occurred:
Feb 11 20:20:51 dracofiler.home.lan target[5063]: Traceback (most recent call last):
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/bin/targetctl", line 82, in <module>
Feb 11 20:20:51 dracofiler.home.lan target[5063]: main()
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/bin/targetctl", line 79, in main
Feb 11 20:20:51 dracofiler.home.lan target[5063]: funcs[sys.argv[1]](savefile)
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/bin/targetctl", line 47, in restore
Feb 11 20:20:51 dracofiler.home.lan target[5063]: errors = RTSRoot().restore_from_file(restore_file=from_file)
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/root.py", line 269, in restore_from_file
Feb 11 20:20:51 dracofiler.home.lan target[5063]: abort_on_error=abort_on_error)
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/root.py", line 176, in restore
Feb 11 20:20:51 dracofiler.home.lan target[5063]: self.clear_existing(confirm=True)
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/root.py", line 162, in clear_existing
Feb 11 20:20:51 dracofiler.home.lan target[5063]: t.delete()
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 109, in delete
Feb 11 20:20:51 dracofiler.home.lan target[5063]: tpg.delete()
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 339, in delete
Feb 11 20:20:51 dracofiler.home.lan target[5063]: self.enable = False
Feb 11 20:20:51 dracofiler.home.lan target[5063]: File "/usr/lib/python3.4/site-packages/rtslib_fb/target.py", line 245, in _set_enable
Feb 11 20:20:51 dracofiler.home.lan target[5063]: raise RTSLibError("Cannot change enable state: %s" % e)
Feb 11 20:20:51 dracofiler.home.lan target[5063]: rtslib_fb.utils.RTSLibError: Cannot change enable state: [Errno 1] Operation not permitted
Feb 11 20:20:51 dracofiler.home.lan systemd[1]: target.service: Main process exited, code=exited, status=1/FAILURE
Feb 11 20:22:21 dracofiler.home.lan systemd[1]: target.service: State 'stop-final-sigterm' timed out. Killing.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Processes still around after final SIGKILL. Entering failed mode.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: Failed to start Restore LIO kernel target configuration.
-- Subject: Unit target.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit target.service has failed.
--
-- The result is failed.
Feb 11 20:23:51 dracofiler.home.lan audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=target comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=faile
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Unit entered failed state.
Feb 11 20:23:51 dracofiler.home.lan systemd[1]: target.service: Failed with result 'timeout'.
Feb 11 20:23:51 dracofiler.home.lan polkitd[1597]: Unregistered Authentication Agent for unix-process:5045:16609861 (system bus name :1.26, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected fr
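
To make sure I am reading the traceback above correctly: it looks like a
chained Python exception, where the write to the TPG enable attribute fails
with EPERM and the except handler then raises RTSLibError. The sketch below
reproduces just that shape. It is reconstructed from the two source lines
visible in the traceback, not copied from rtslib_fb, and the RTSLibError
class, the set_enable name and the writer parameter here are local
stand-ins. It says nothing about why the kernel refuses the write.

```python
class RTSLibError(Exception):
    """Local stand-in for rtslib_fb.utils.RTSLibError."""


def fwrite(path, string):
    """Write a string to a file, the way the traceback suggests fwrite does."""
    with open(path, "w") as file_fd:
        file_fd.write(str(string))


def set_enable(path, boolean, writer=fwrite):
    """Write 0 or 1 to an enable attribute, wrapping OS errors.

    writer is a hypothetical hook so the failure can be simulated
    without touching configfs.
    """
    try:
        writer(path, str(int(boolean)))
    except OSError as e:
        # Raising inside the handler is what produces the "During handling
        # of the above exception, another exception occurred" message.
        raise RTSLibError("Cannot change enable state: %s" % e)
```

Passing a writer that raises PermissionError(1, "Operation not permitted")
gives the same final message as in the journal, with the original
PermissionError kept on the new exception's __context__.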

Thanks again,
Dan