Re: FC target Errors

On Fri, 2014-05-23 at 13:37 -0400, deeepdish wrote:
> Thanks for responding.
> 
> > 
> >> QUESTION:   Is there a way to disable SSD emulation using TargetCLI?
> >> I browsed through some of the SCST documentation and there seemed to
> >> be a parameter to disable it; I'm wondering if it's possible to do it
> >> within LIO.  I know there's a way to change the drive type on ESXi,
> >> however I'm trying to avoid writing my own storage rules.
> > 
> > This bit is controlled in targetcli/rtslib using the device attribute
> > 'is_nonrot'.  By default, this value is taken from what the underlying
> > struct block_device reports to Linux on the target, but it can be
> > explicitly disabled.  However, I'm not aware of a beneficial reason to
> > manually override the default setting from the underlying hardware.
> 
> The reason why I want to disable this at the target level is that
> although the hardware is detecting my backstore as an SSD, it's actually
> a bcache volume consisting of an SSD + a pool of SATA storage.  So any
> pass-through commands (e.g. TRIM) wouldn't necessarily be supported on a
> hybrid device like this (ESX picks up this storage as SSD by default,
> for instance).
> 

Note that UNMAP (i.e. the SCSI equivalent of TRIM) is explicitly
disabled by default on all target backends with the device attribute
'emulate_tpu', due to the performance issues it introduces on both ESX
hosts and most backend devices that utilize it.  VMware actually
recommends explicitly disabling it as well.

So disabling the 'is_nonrot' bit mentioned above should not make any
difference with respect to UNMAP/TRIM occurring.
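For reference, both attributes can be inspected and overridden from the
shell.  A rough sketch (the backstore name 'bcache0' is a placeholder
for your own block backstore):

```shell
# Placeholder backstore name -- substitute your own from 'targetcli ls /backstores'.

# Check what the backend currently reports (1 = non-rotational / SSD):
targetcli /backstores/block/bcache0 get attribute is_nonrot

# Force the device to be reported as rotational, so ESX stops
# treating the bcache volume as a pure SSD:
targetcli /backstores/block/bcache0 set attribute is_nonrot=0

# Confirm UNMAP emulation is off (0 is the default):
targetcli /backstores/block/bcache0 get attribute emulate_tpu
```

Changes take effect for new sessions; initiators that have already
cached the device characteristics may need a rescan to notice.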

> > Seeing these types of ABORT_TASKs grouped closely together during a
> > session login is normal for tcm_qla2xxx.  However, seeing these occur
> > repeatedly over long durations of time can indicate a network
> > connectivity issue, or possibly high latency times from the storage
> > backend servicing I/O requests.
> > 
> > Also note that the target is not blocking I/O to LUNs at this point,
> > typically a ESX host would end up taking the LUNs offline if it detected
> > repeated I/O timeouts and/or LUN resets.
> 
> I will certainly look into the latency aspect a bit further.  I'm
> wondering if the previous AMD-based box couldn't keep up with the
> requests; with that older hardware I did have to reboot the server to
> regain access to the storage (via LIO).

The way I read the original email was that you had two tcm_qla2xxx
target machines of identical hardware, only one of which was
encountering repeated ABORT_TASK events, is that correct..?

As mentioned, ESX will take LUNs offline that repeatedly hit I/O
timeouts.  Forcing a LIP on the ESX host should have the same effect as
rebooting the target machine.
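If you want to trigger that from the ESX side, an HBA rescan via esxcli
is the usual route.  A sketch (the adapter name vmhba2 is an assumption;
list your adapters first):

```shell
# Identify the FC adapters present on the ESXi host:
esxcli storage core adapter list

# Rescan a single FC HBA (placeholder name vmhba2):
esxcli storage core adapter rescan --adapter vmhba2

# Or rescan all adapters at once:
esxcli storage core adapter rescan --all
```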

> After repeated 
> ABORT_TASKs I noticed ACCESS FOR NON EXISTENT LUN messages come up 
> referencing storage that should have been assigned to the host.

Looking at the original logs, the ACCESS FOR NON EXISTENT LUN messages
occur for LUNs 0x04-0x31.  Are these LUNs actually configured on the
tcm_qla2xxx endpoint..?  What does your targetcli output look like..?

If not, then this is ESX attempting to perform a full sequential LUN
rescan, which is not unusual, and is something that has been observed on
non-FC fabrics as well.
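A quick way to check is to dump the LUNs actually exported on the
endpoint.  Roughly (the WWPN shown is a placeholder for your actual
qla2xxx port):

```shell
# Dump the full configuration tree (backstores, fabrics, LUN mappings):
targetcli ls

# Or list just the LUNs under the FC target port
# (replace the WWPN placeholder with your own port's):
targetcli /qla2xxx/21:00:00:24:ff:00:00:00/luns ls
```

If the LUN numbers in the log fall outside what that shows, the accesses
are coming from the initiator's sequential scan rather than a
misconfiguration.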

> I replatformed this to HP blades and more recent hardware, and will 
> retest going forward.   However these abort tasks are coming in on the 
> HP platform and all I'm running on the initiator end is a Windows 2012 
> VM with ATTO disk benchmarking running continuously.
> 

How frequently are the ABORT_TASKs appearing..?

> Have you seen any special considerations for RDM configurations using
> LIO and ESXi (5.5 in our case)?

Nothing that I'm aware of on the fabric side, although if it's really
not a FC connectivity / HW specific issue, but instead some type of
backend device latency issue, you can try bumping up the SCSI Timeout
value on the ESX host to avoid false positives.
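For a Windows 2012 guest specifically, the usual knob is the disk
TimeoutValue registry entry, which VMware Tools normally sets to 60
seconds.  A sketch, run inside the guest (180s is an arbitrary example
value, not a recommendation):

```shell
rem Run in an elevated cmd.exe inside the Windows 2012 VM.
rem Raise the guest disk I/O timeout from 60s to 180s (example value):
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeoutValue /t REG_DWORD /d 180 /f

rem Verify the new setting:
reg query "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeoutValue
```

A reboot of the guest is needed for the new timeout to take effect.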

Aside from that, Thomas (CC'ed) knows the most about the best practices
for various ESX storage configurations.

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



