Re: ESX FC host connectivity issues

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Fri, 04 Mar 2016 22:01:01 -0800

On Fri, 2016-03-04 at 01:49 -0500, Dan Lane wrote:
> On Sun, Feb 28, 2016 at 4:02 PM, Nicholas A. Bellinger
> <nab@xxxxxxxxxxxxxxx> wrote:
> > On Sun, 2016-02-28 at 12:55 -0800, Nicholas A. Bellinger wrote:

<SNIP>

> > To reiterate again from:
> >
> > https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2113956
> >
> > Symptoms:
> >
> > "An ESXi 5.5 Update 2 or ESXi 6.0 host loses connectivity to a VMFS5
> > datastore."
> >
> > "Note: These symptoms are seen in connection with the use of VAAI ATS
> > heartbeat with storage arrays supplied by several different vendors."
> >
> > Cause:
> >
> > "A change in the VMFS heartbeat update method was introduced in ESXi 5.5
> > Update 2, to help optimize the VMFS heartbeat process. Whereas the
> > legacy method involves plain SCSI reads and writes with the VMware ESXi
> > kernel handling validation, the new method offloads the validation step
> > to the storage system. This is similar to other VAAI-related offloads.
> >
> > This optimization results in a significant increase in the volume of ATS
> > commands the ESXi kernel issues to the storage system and resulting
> > increased load on the storage system. Under certain circumstances, VMFS
> > heartbeat using ATS may fail with false ATS miscompare which causes the
> > ESXi kernel to reverify its access to VMFS datastores. This leads to the
> > Lost access to datastore messages."
> >
> 
> Nicholas: The problem isn't with the ATS "bug", in fact I don't have
> any mention of ATS anywhere in my vmkernel.log
> 

Yes, you are most certainly hitting ATS heartbeat timeouts.

There is a reason why every major storage vendor says you MUST disable
ATS heartbeat with all VMFS5 mounts with ESX v5.5u2+, or completely
disable COMPARE_AND_WRITE all-together at target level.

In a perfect world it would be nice to not have to make ESX host
changes, but we don't live in a perfect world!

> [root@labhost4:/tmp/scratch/log] grep ATS vmkernel.log
> [root@labhost4:/tmp/scratch/log] zcat vmkernel.0.gz | grep ATS
> [root@labhost4:/tmp/scratch/log] zcat vmkernel.1.gz | grep ATS
> [root@labhost4:/tmp/scratch/log] zcat vmkernel.2.gz | grep ATS
> [root@labhost4:/tmp/scratch/log] zcat vmkernel.3.gz | grep ATS
> [root@labhost4:/tmp/scratch/log] zcat vmkernel.4.gz | grep ATS
> [root@labhost4:/tmp/scratch/log]
> 

That is not how you check for ATS heartbeats.  See below.

> Also, my friend David did disable ATS on his target server and the
> crash still occurred.
>  I just got home a couple of hours ago so I
> haven't had a chance, but the above tells me that the problem is not
> related to ATS.  Also, during this testing I only had one ESXi host
> turned on, which is where the logs are from.
> 
> I just restarted the target server, and with pretty much zero load on
> the server I got this in messages on the target server:
> [  275.145225] ABORT_TASK: Found referenced qla2xxx task_tag: 1184312
> [  275.145274] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1184312
> [  312.412465] ABORT_TASK: Found referenced qla2xxx task_tag: 1176128
> [  312.412511] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1176128
> [  313.413499] ABORT_TASK: Found referenced qla2xxx task_tag: 1219556
> [  318.729670] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1219556
> [  318.730244] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1194652
> [  318.730737] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1196720
> [  318.731215] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1217708
> [  318.731658] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1218896
> [  318.732111] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1182024
> [  318.732531] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1168032
> [  327.528277] ABORT_TASK: Found referenced qla2xxx task_tag: 1139300
> 
> See the attachment for the vmkernel.log from the same exact time
> period, it was too big for here.

ATS is using SCSI command COMPARE_AND_WRITE = 0x89:

# cat vmkernel-snip.log | grep 0x89
2016-03-04T06:29:01.597Z cpu15:33312)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x89 (0x43a185bfa5c0, 32790) to dev "naa.6001405a5ce03d8529a4341a454252a8" on path "vmhba1:C0:T1:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:EVAL
2016-03-04T06:29:01.597Z cpu15:33312)ScsiDeviceIO: 2645: Cmd(0x43a185bfa5c0) 0x89, CmdSN 0x21f93 from world 32790 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:29:02.597Z cpu14:33312)ScsiDeviceIO: 2645: Cmd(0x43a186506240) 0x89, CmdSN 0x21f95 from world 32790 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:29:16.711Z cpu8:33312)ScsiDeviceIO: 2645: Cmd(0x439d8034b200) 0x89, CmdSN 0x21f97 from world 32825 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.
2016-03-04T06:29:29.744Z cpu8:33312)ScsiDeviceIO: 2645: Cmd(0x439d8550b3c0) 0x89, CmdSN 0x21f99 from world 32826 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.
2016-03-04T06:29:42.775Z cpu8:33312)ScsiDeviceIO: 2645: Cmd(0x43a180333700) 0x89, CmdSN 0x21f9b from world 32827 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:29:55.789Z cpu8:33312)ScsiDeviceIO: 2645: Cmd(0x439d87290fc0) 0x89, CmdSN 0x21f9d from world 32825 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:30:08.794Z cpu8:33312)ScsiDeviceIO: 2645: Cmd(0x439d8036eb80) 0x89, CmdSN 0x21f9f from world 32826 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.
2016-03-04T06:30:10.229Z cpu8:33312)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x89 (0x43a186f55480, 32827) to dev "naa.6001405a5ce03d8529a4341a454252a8" on path "vmhba1:C0:T1:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:EVAL
2016-03-04T06:30:10.229Z cpu8:33312)ScsiDeviceIO: 2607: Cmd(0x43a186f55480) 0x89, CmdSN 0x21fa1 from world 32827 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:30:10.243Z cpu8:33312)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x89 (0x43a186f55480, 32827) to dev "naa.6001405a5ce03d8529a4341a454252a8" on path "vmhba1:C0:T1:L0" Failed: H:0xc D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:NONE
2016-03-04T06:30:10.243Z cpu8:33312)ScsiDeviceIO: 2607: Cmd(0x43a186f55480) 0x89, CmdSN 0x21fa1 from world 32827 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0xc D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:30:19.854Z cpu10:33312)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x89 (0x43a186f55480, 32827) to dev "naa.6001405a5ce03d8529a4341a454252a8" on path "vmhba1:C0:T1:L0" Failed: H:0x1 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:FAILOVER
2016-03-04T06:30:22.029Z cpu15:33313)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x89 (0x43a186f55480) to dev "naa.6001405a5ce03d8529a4341a454252a8" failed on path "vmhba2:C0:T1:L0" H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:30:22.029Z cpu15:33313)ScsiDeviceIO: 2645: Cmd(0x43a186f55480) 0x89, CmdSN 0x21fa1 from world 32827 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:30:35.032Z cpu15:33313)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x89 (0x439d85181d40, 32825) to dev "naa.6001405a5ce03d8529a4341a454252a8" on path "vmhba2:C0:T1:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0. Act:EVAL
2016-03-04T06:30:35.032Z cpu15:33313)ScsiDeviceIO: 2645: Cmd(0x439d85181d40) 0x89, CmdSN 0x21fa3 from world 32825 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.
2016-03-04T06:30:48.237Z cpu15:33313)ScsiDeviceIO: 2645: Cmd(0x439d855c33c0) 0x89, CmdSN 0x21fa5 from world 32826 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:31:01.442Z cpu15:33313)ScsiDeviceIO: 2645: Cmd(0x43a186a693c0) 0x89, CmdSN 0x21fa7 from world 32827 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2016-03-04T06:31:02.468Z cpu15:33313)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x89 (0x439d85181d40, 32825) to dev "naa.6001405a5ce03d8529a4341a454252a8" on path "vmhba2:C0:T1:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0. Act:EVAL
2016-03-04T06:31:02.468Z cpu15:33313)ScsiDeviceIO: 2607: Cmd(0x439d85181d40) 0x89, CmdSN 0x21fa9 from world 32825 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.
2016-03-04T06:31:02.483Z cpu15:33313)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x89 (0x439d85181d40, 32825) to dev "naa.6001405a5ce03d8529a4341a454252a8" on path "vmhba2:C0:T1:L0" Failed: H:0xc D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0. Act:NONE
2016-03-04T06:31:02.483Z cpu15:33313)ScsiDeviceIO: 2607: Cmd(0x439d85181d40) 0x89, CmdSN 0x21fa9 from world 32825 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0xc D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.
2016-03-04T06:31:14.444Z cpu6:33061)ScsiDeviceIO: 2645: Cmd(0x439d85181d40) 0x89, CmdSN 0x21fa9 from world 32825 to dev "naa.6001405a5ce03d8529a4341a454252a8" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.

And yes, the ATS heartbeat timeout corresponds to internal
COMPARE_AND_WRITE failures:

# cat vmkernel-snip.log | grep " HB"
2016-03-04T06:27:59.950Z cpu11:32827)HBX: 276: 'dracofiler-lun0': HB at offset 3956736 - Reclaimed heartbeat [Timeout]:
2016-03-04T06:30:21.781Z cpu2:33924)HBX: 2801: 'dracofiler-lun0': HB at offset 3956736 - Waiting for timed out HB:
2016-03-04T06:30:22.029Z cpu2:33924)HBX: 2801: 'dracofiler-lun0': HB at offset 3956736 - Waiting for timed out HB:
2016-03-04T06:30:35.234Z cpu2:33924)HBX: 2801: 'dracofiler-lun0': HB at offset 3956736 - Waiting for timed out HB:
2016-03-04T06:30:48.439Z cpu2:33924)HBX: 2801: 'dracofiler-lun0': HB at offset 3956736 - Waiting for timed out HB:
2016-03-04T06:31:01.442Z cpu2:33924)HBX: 2801: 'dracofiler-lun0': HB at offset 3956736 - Waiting for timed out HB:
2016-03-04T06:31:14.444Z cpu2:33924)HBX: 2801: 'dracofiler-lun0': HB at offset 3956736 - Waiting for timed out HB:

The sense ASQ codes above for failed COMPARE_AND_WRITE = 0x89 commands are:

   0x20 = /* INVALID COMMAND OPERATION CODE */
   0x24 = /* INVALID FIELD IN CDB */

which confirms ATS commands are being internally aborted on your ESX
v5.5u2+ host side config, resulting in repeated ATS heartbeat timeouts
until ESX takes the device offline.

> 
> This is the point where I can no longer control the service on the
> target, running service target stop results in the aforementioned hung
> task, here is the output of "cat /proc/$PID/stack" after I try
> stopping the hung task:
> [root@dracofiler ~]# cat /proc/1911/stack
> [<ffffffffa053c0ee>] tcm_qla2xxx_tpg_enable_store+0xde/0x1a0 [tcm_qla2xxx]
> [<ffffffff812b8b7a>] configfs_write_file+0x9a/0x100
> [<ffffffff81234967>] __vfs_write+0x37/0x120
> [<ffffffff81235289>] vfs_write+0xa9/0x1a0
> [<ffffffff812361b5>] SyS_write+0x55/0xc0
> [<ffffffff817aa56e>] entry_SYSCALL_64_fastpath+0x12/0x71
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> It's been almost a week since I worked on this, so please forgive me
> if I missed one of your suggestions for something to try or request
> for information that I missed.  Just let me know what it is and I'll
> do it.

To reiterate, the hung task shutdown bug you are triggering with Fedora
-fb is separate from the ESX ATS heartbeat issues detailed above.

So first things first for ESX.  Let's finally see how ATS w/o ATS
heartbeat works for your VMFS5 mounts to avoid generating constant ATS
heartbeat timeouts + ABORT_TASKs.

Otherwise you need to explicitly set emulate_caw=0 to enforce pre-ATS
operation for all backend devices in order to get a working ESX v5.5u2+
environment, if you can't change ESX host configs.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html