Re: ESX FC host connectivity issues


On Sun, 2016-02-28 at 12:55 -0800, Nicholas A. Bellinger wrote:
> On Sun, 2016-02-28 at 14:13 -0500, Dan Lane wrote:

<SNIP>

> > Unfortunately I'm about to leave town for a few weeks, so I have very
> > little time to look at this.  That said, let's talk about this... I
> > built the latest kernel from linux-next as well as from the torvalds
> > git tree last night.  Here are the commands I used (in case you see any
> > problems).
> > 
> > git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > cd linux
> > git remote add linux-next git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > git fetch linux-next
> > git fetch --tags linux-next
> > cp /boot/config-4.3.4-300.fc23.x86_64 .config
> > make oldconfig
> > make -j8 bzImage; make -j8 modules; make -j8 modules_install; make -j8 install
> > 
> > This resulted in a functioning 4.5-rc5+ kernel.  A matter of hours
> > later the storage once again disappeared from my ESXi hosts.  I
> > understand there may be things I need to tweak on my hosts, but should
> > those things cause LIO on the target server to stop responding?
> > It's back to acting the exact same way as before (with the
> > target-pending/4.4-stable from a month ago); I can't stop the service
> > or kill the process.
> > 
> > # uname -a
> > Linux dracofiler.home.lan 4.5.0-rc5+ #2 SMP Sat Feb 27 15:22:25 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
> > 
> 
> You don't need to keep updating the kernel.
> 
> As per the reply to David, you'll need to either explicitly disable
> ESX-side ATS heartbeat, to avoid this well-known ESX bug that affects
> every target with VAAI (as per VMware's own KB), or set emulate_caw=0
> to disable Atomic Test and Set (ATS) altogether:
> 
> http://permalink.gmane.org/gmane.linux.scsi.target.devel/11574
> 
> To repeat, there are no target-side changes that avoid this well-known
> ESX 5.5u2+ host bug.
> 
> You need to either disable ATS heartbeat on the ESX 5.5u2+ host side, or
> disable COMPARE_AND_WRITE altogether.
> 
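For the emulate_caw=0 route, a minimal sketch, assuming LIO's configfs is mounted at the usual /sys/kernel/config: it writes 0 to the emulate_caw attribute of every backstore under target/core/<hba>/<device>/attrib/, so the target stops advertising COMPARE_AND_WRITE. The function name disable_caw_all and its optional directory argument are hypothetical, added only so the loop can be pointed at a scratch tree for a dry run.

```shell
#!/bin/sh
# Hypothetical helper: disable COMPARE_AND_WRITE (ATS) emulation on every
# LIO backstore by writing 0 to <hba>/<device>/attrib/emulate_caw.
# Assumption: configfs is mounted at /sys/kernel/config. Pass another
# directory as $1 to operate on a copy instead of the live target.
disable_caw_all() {
    caw_core="${1:-/sys/kernel/config/target/core}"
    # <hba> is e.g. iblock_0 or fileio_1; <device> is the backstore name.
    for attr in "$caw_core"/*/*/attrib/emulate_caw; do
        [ -f "$attr" ] || continue    # glob matched nothing: no backstores
        echo 0 > "$attr" && echo "emulate_caw=0: $attr"
    done
}
```

Run it as root on the target host (e.g. `disable_caw_all` with no argument). With targetcli, the per-backstore equivalent should be along the lines of `set attribute emulate_caw=0` on the backstore object, followed by `saveconfig` so the setting is kept across restarts; verify that against your targetcli version.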

To reiterate, from VMware KB 2113956:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2113956

Symptoms:

"An ESXi 5.5 Update 2 or ESXi 6.0 host loses connectivity to a VMFS5
datastore."

"Note: These symptoms are seen in connection with the use of VAAI ATS
heartbeat with storage arrays supplied by several different vendors."

Cause:

"A change in the VMFS heartbeat update method was introduced in ESXi 5.5
Update 2, to help optimize the VMFS heartbeat process. Whereas the
legacy method involves plain SCSI reads and writes with the VMware ESXi
kernel handling validation, the new method offloads the validation step
to the storage system. This is similar to other VAAI-related offloads. 

This optimization results in a significant increase in the volume of ATS
commands the ESXi kernel issues to the storage system and resulting
increased load on the storage system. Under certain circumstances, VMFS
heartbeat using ATS may fail with false ATS miscompare which causes the
ESXi kernel to reverify its access to VMFS datastores. This leads to the
Lost access to datastore messages."
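On the ESX side, the same KB describes turning off ATS heartbeating through the /VMFS3/UseATSForHBOnVMFS5 advanced option, so VMFS falls back to the legacy plain read/write heartbeat described above. A sketch for the ESXi shell, assuming esxcli is on the PATH; the function names and the ESXCLI override (set it to echo for a dry run that only prints the commands) are hypothetical conveniences, not part of VMware's procedure.

```shell
#!/bin/sh
# Hypothetical wrappers around the esxcli option from VMware KB 2113956.
# ESXCLI defaults to the real esxcli; ESXCLI=echo prints instead of running.
ESXCLI="${ESXCLI:-esxcli}"

# Show the current setting; expected to report 1 while ATS heartbeat is in use.
show_ats_hb() {
    $ESXCLI system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5
}

# Set the option to 0 so VMFS5 heartbeats use plain SCSI reads and writes.
disable_ats_hb() {
    $ESXCLI system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5
}
```

Apply it on each ESXi 5.5u2+/6.0 host that mounts the affected VMFS5 datastores, then re-run show_ats_hb to confirm the value is 0.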
