Re: FC target Errors

Thanks for responding.


QUESTION:   Is there a way to disable SSD emulation using TargetCLI?
I browsed through some of the SCST documentation and there seemed to be
a parameter to disable it; I'm wondering if it's possible to do it within
LIO.   I know there's a way to change the drive type on ESXi, however I'm
trying to avoid writing my own storage rules.

This bit is controlled in targetcli/rtslib using the device attribute
'is_nonrot'.  By default, this value is taken from what the underlying
struct block_device reports to Linux on the target, but it can be
explicitly disabled.  However, I'm not aware of a beneficial reason to
manually override the default setting from the underlying hardware.

The reason I want to disable this at the target level is that, although the hardware detects my backstore as an SSD, it's actually a bcache volume consisting of SSDs plus a pool of SATA storage. So any pass-through commands (e.g. TRIM) wouldn't necessarily be supported on a hybrid device like this (ESX picks up this storage as SSD by default, for instance).
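For reference, a minimal sketch of checking and overriding the flag outside targetcli. It assumes the standard LIO configfs layout (/sys/kernel/config/target/core/<hba>/<device>/attrib/is_nonrot) and the block layer's /sys/block/<disk>/queue/rotational flag, which is where the iblock default ultimately comes from. The function names are hypothetical, and writing the attribute needs root on a live target. From targetcli itself, `set attribute is_nonrot=0` in the backstore's context should do the same; check it against your targetcli version.

```python
from pathlib import Path

# Assumed default locations: LIO device attributes live in configfs,
# the rotational flag for each disk lives in sysfs.
CONFIGFS_CORE = Path("/sys/kernel/config/target/core")
SYSFS_BLOCK = Path("/sys/block")


def backing_disk_is_nonrot(disk: str, sysfs_block: Path = SYSFS_BLOCK) -> bool:
    """True if the kernel reports `disk` as non-rotational (SSD-like)."""
    flag = (sysfs_block / disk / "queue" / "rotational").read_text().strip()
    return flag == "0"


def get_is_nonrot(hba: str, device: str, core: Path = CONFIGFS_CORE) -> bool:
    """Read the is_nonrot attribute the target advertises for a backstore."""
    path = core / hba / device / "attrib" / "is_nonrot"
    return path.read_text().strip() == "1"


def set_is_nonrot(hba: str, device: str, nonrot: bool,
                  core: Path = CONFIGFS_CORE) -> None:
    """Override is_nonrot for a backstore (1 = advertise as non-rotational)."""
    path = core / hba / device / "attrib" / "is_nonrot"
    path.write_text("1\n" if nonrot else "0\n")
```

On a real target this would be called as, e.g., `set_is_nonrot("iblock_0", "bcache_lun0", False)`, where both names are hypothetical placeholders for your HBA and backstore.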

Seeing these types of ABORT_TASKs grouped closely together during a
session login is normal for tcm_qla2xxx.  However, seeing these occur
repeatedly over long durations of time can indicate a network
connectivity issue, or possibly high latency times from the storage
backend servicing I/O requests.

Also note that the target is not blocking I/O to LUNs at this point;
typically an ESX host would end up taking the LUNs offline if it detected
repeated I/O timeouts and/or LUN resets.

I will certainly look into the latency aspect a bit further. I'm wondering if the previous AMD-based box couldn't keep up with the requests; however, on the older hardware I did have to reboot the server to regain access to the storage (via LIO). After repeated ABORT_TASKs I noticed ACCESS FOR NON EXISTENT LUN messages referencing storage that should have been assigned to the host. I've since replatformed this onto HP blades and more recent hardware, and will retest going forward. However, these abort tasks are also showing up on the HP platform, and all I'm running on the initiator end is a Windows 2012 VM running ATTO disk benchmarking continuously.

Have you seen any special considerations for RDM configurations using LIO and ESXi (5.5 in our case)?

On 2014-05-23 01:04:57 +0000, Nicholas A. Bellinger said:

On Thu, 2014-05-22 at 11:08 -0400, deeepdish wrote:
Good day,

I posted about similar issues (LUNs being disconnected from a
LIO/Targetcli-based FC target) a few weeks back.   I believe the
consensus was that we're running unreliable hardware.   I rebuilt our
storage appliance using:

Fedora 20 - latest updates (Kernel 3.14.3-200.fc20.x86_64)
HP DL490G6 + p711m + QMH2462 dual port 4G HBA ==> 2 x X5570 CPUs & 72GB RAM.

We have 12 x 4TB volumes in a RAID-6 combined with bcache (2 x mirrored
SSDs) and managed via LVM.

A few preliminary observations:

ESXi recognizes any presented backstore disks as SSD:

[414039.187680] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
[414039.193550] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
[414139.189729] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
[414139.192884] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.


These are warnings for unsupported ATA_16 passthrough commands.

They can be safely ignored.
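To confirm that the "Unsupported SCSI Opcode" warnings in dmesg are only these ATA pass-through commands, a small sketch that tallies them by opcode. Opcode 0x85 is ATA PASS-THROUGH(16) and 0xA1 is the 12-byte variant in the SAT opcode assignments; the regex assumes the message format in the log above, and the helper names are hypothetical.

```python
import re
from collections import Counter

# SAT opcode assignments for the ATA pass-through commands.
OPCODE_NAMES = {
    0x85: "ATA PASS-THROUGH(16)",
    0xA1: "ATA PASS-THROUGH(12)",
}

UNSUPPORTED_RE = re.compile(r"Unsupported SCSI Opcode 0x([0-9a-fA-F]{2})")


def unsupported_opcodes(dmesg_text: str) -> Counter:
    """Count 'Unsupported SCSI Opcode' warnings, keyed by opcode name."""
    counts = Counter()
    for match in UNSUPPORTED_RE.finditer(dmesg_text):
        opcode = int(match.group(1), 16)
        counts[OPCODE_NAMES.get(opcode, f"0x{opcode:02x}")] += 1
    return counts


def only_ata_passthrough(counts: Counter) -> bool:
    """True if every counted warning is an ATA pass-through opcode
    (also True when there are no warnings at all)."""
    return all(name.startswith("ATA PASS-THROUGH") for name in counts)
```

Any opcode not in the table shows up by its hex value, which is the case worth a closer look.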

QUESTION:   Is there a way to disable SSD emulation using TargetCLI?
I browsed through some of the SCST documentation and there seemed to be
a parameter to disable it; I'm wondering if it's possible to do it within
LIO.   I know there's a way to change the drive type on ESXi, however I'm
trying to avoid writing my own storage rules.

This bit is controlled in targetcli/rtslib using the device attribute
'is_nonrot'.  By default, this value is taken from what the underlying
struct block_device reports to Linux on the target, but it can be
explicitly disabled.  However, I'm not aware of a beneficial reason to
manually override the default setting from the underlying hardware.

Errors seen using ESXi - Raw disk mapping to Windows 2012:

[415661.873649] ABORT_TASK: Found referenced qla2xxx task_tag: 1162836
[415663.207911] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1162836
[415663.207919] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1177532
[415663.207924] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1160284
[415663.207928] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174540
[415663.207931] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1177488
[415663.207935] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1207012
[415663.207938] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1207056
[415663.207942] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1134280
[415663.207945] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1210972
[415663.207949] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185276
[415663.207952] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185320
[415663.207956] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185364
[415663.207959] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185408
[415663.207963] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185452
[415663.207966] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185496
[415663.207970] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185540
[415663.207974] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185584
[415663.207977] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185628
[415663.207980] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185672
[415663.207984] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185716
[415663.207987] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185760
[415663.207990] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185804
[415663.207994] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185848
[415663.207998] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185892
[415663.208001] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185936
[415663.208005] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185980
[415663.208008] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186024
[415663.208012] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186068
[415663.208015] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186112
[415663.208018] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1132828

My experience with these kinds of errors was that LIO would crash (block
access to all LUNs) after repeated abort tasks.


Seeing these types of ABORT_TASKs grouped closely together during a
session login is normal for tcm_qla2xxx.  However, seeing these occur
repeatedly over long durations of time can indicate a network
connectivity issue, or possibly high latency times from the storage
backend servicing I/O requests.

Also note that the target is not blocking I/O to LUNs at this point;
typically an ESX host would end up taking the LUNs offline if it detected
repeated I/O timeouts and/or LUN resets.
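To separate the commands the target actually aborted from the tags it no longer had outstanding, a rough sketch that groups ABORT_TASK ref_tags by the TMR response sent. It assumes the message formats in the ABORT_TASK log above, and the function name is hypothetical. My reading (not confirmed in this thread) is that TMR_TASK_DOES_NOT_EXIST means the tag was no longer outstanding on the target, most likely because the command had already completed.

```python
import re
from collections import defaultdict

# Message formats as printed by target_core for tcm_qla2xxx sessions.
FOUND_RE = re.compile(r"ABORT_TASK: Found referenced \S+ task_tag: (\d+)")
RESP_RE = re.compile(r"ABORT_TASK: Sending (TMR_\w+) for ref_tag: (\d+)")


def summarize_abort_tasks(dmesg_text: str) -> dict:
    """Group ABORT_TASK ref_tags by the TMR response the target sent.

    "found" holds the tags the target located among its outstanding
    commands; "responses" maps each TMR response name to its ref_tags
    in log order.
    """
    found = {int(tag) for tag in FOUND_RE.findall(dmesg_text)}
    by_response = defaultdict(list)
    for response, tag in RESP_RE.findall(dmesg_text):
        by_response[response].append(int(tag))
    return {"found": found, "responses": dict(by_response)}
```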


Some other errors we're seeing (FC target) on another host:

[225802.776243] Detected MISCOMPARE for addr: ffff880207f61000 buf: ffff880208c47600
[225802.776254] Target/iblock: Send MISCOMPARE check condition and sense
[239924.911001] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239924.911796] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239924.912579] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239924.913271] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239924.913925] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239924.914581] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239924.985975] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239924.986775] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239924.987458] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239924.988148] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239924.988813] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239924.989501] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239925.011878] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239925.012601] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239925.017737] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239925.018494] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239925.019243] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239925.019927] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
[239925.047635] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239925.048301] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239925.048970] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239925.049629] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239925.050254] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
[239925.050897] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12

These are warnings about the mismatched SCSI transfer lengths.  They can
be safely ignored.
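For INQUIRY (opcode 0x12), the warning compares the expected transfer length supplied by the fabric with the ALLOCATION LENGTH field in the CDB. A minimal sketch of that comparison, assuming the SPC-4 INQUIRY layout (allocation length in bytes 3-4, big-endian); the function names are hypothetical and this is not the target_core code. In the log above the CDB always asks for less than the initiator expects (255 vs 493, 36 vs 274), an underflow of 238 bytes in both cases.

```python
def inquiry_allocation_length(cdb: bytes) -> int:
    """ALLOCATION LENGTH of a 6-byte INQUIRY CDB (bytes 3-4, big-endian)."""
    if len(cdb) != 6 or cdb[0] != 0x12:
        raise ValueError("not a 6-byte INQUIRY CDB")
    return int.from_bytes(cdb[3:5], "big")


def classify_transfer(expected_len: int, cdb_len: int) -> tuple:
    """Compare the transport's expected transfer length with the CDB length.

    Returns ("match", 0); ("underflow", residual) when the CDB asks for
    less than the expected transfer length; or ("overflow", residual)
    when it asks for more.
    """
    if expected_len == cdb_len:
        return ("match", 0)
    if cdb_len < expected_len:
        return ("underflow", expected_len - cdb_len)
    return ("overflow", cdb_len - expected_len)
```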

[251555.280475] ABORT_TASK: Found referenced qla2xxx task_tag: 1172032
[251555.280591] ABORT_TASK: Found referenced qla2xxx task_tag: 1172076
[251555.998793] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1172076
[251555.998809] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1173352
[251555.998816] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174540
[251555.998824] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1172032
[251555.998843] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174496


Same note here with respect to repeated ABORT_TASKs as above.

[287197.044501] Detected MISCOMPARE for addr: ffff88020aaae000 buf: ffff8800d43c8a00
[287197.044523] Target/iblock: Send MISCOMPARE check condition and sense
[287849.872353] Detected MISCOMPARE for addr: ffff880205c63000 buf: ffff88020b64de00
[287849.872365] Target/iblock: Send MISCOMPARE check condition and sense
[287850.385253] Detected MISCOMPARE for addr: ffff880213fe4000 buf: ffff88020b64f000
[287850.385263] Target/iblock: Send MISCOMPARE check condition and sense


Warnings related to COMPARE_AND_WRITE (e.g. VAAI ATS) failures.  These
are normal, and can be safely ignored.
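As I understand it, VAAI ATS uses COMPARE AND WRITE (opcode 0x89) as an on-disk lock: the host sends the lock record it expects to find plus the record it wants to write, and if another host changed the record first, the compare fails, the target returns MISCOMPARE, and the host retries. That is why occasional MISCOMPAREs are expected. The following in-memory sketch models only those semantics; `compare_and_write` and `CawResult` are hypothetical names, not LIO code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CawResult:
    status: str                   # "GOOD" or "MISCOMPARE"
    offset: Optional[int] = None  # first differing byte on MISCOMPARE


def compare_and_write(disk: bytearray, lba: int, block_size: int,
                      verify: bytes, write: bytes) -> CawResult:
    """Compare `verify` with the blocks at `lba`; write only if they match.

    On a mismatch nothing is written and the offset of the first
    differing byte is returned. (On a real device the whole operation is
    atomic; this single-threaded model is trivially so.)
    """
    if len(verify) != len(write) or len(verify) % block_size:
        raise ValueError("verify/write must be equal whole-block buffers")
    start = lba * block_size
    if start + len(verify) > len(disk):
        raise ValueError("range extends past end of disk")
    current = disk[start:start + len(verify)]
    for i, (have, want) in enumerate(zip(current, verify)):
        if have != want:
            return CawResult("MISCOMPARE", i)
    disk[start:start + len(write)] = write
    return CawResult("GOOD")
```

With two hosts racing for the same free lock record, the first COMPARE AND WRITE succeeds and the second gets MISCOMPARE, which is the normal contention case rather than data corruption.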

We would like to know whether we're encountering a condition that can be
safely ignored, or whether this is something else we need to investigate
or obtain a bug fix for.


The only ones that I'd be concerned about are ABORT_TASK events that
occur consistently over long periods of time, separate from initial
session logins.
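A rough heuristic for that distinction: group ABORT_TASK messages into bursts by dmesg timestamp, then flag several separate bursts within a window. The 30-second gap, one-hour window and three-burst threshold are arbitrary assumptions, not values from LIO, and bursts that line up with session logins still have to be discounted by hand. Function names are hypothetical.

```python
import re

# dmesg prefix: "[seconds.micros] ABORT_TASK: ..."
ABORT_RE = re.compile(r"\[\s*(\d+\.\d+)\] ABORT_TASK:")


def abort_bursts(dmesg_text: str, gap: float = 30.0) -> list:
    """Group ABORT_TASK messages into (start, end, count) bursts, starting
    a new burst whenever consecutive messages are more than `gap` s apart."""
    times = sorted(float(t) for t in ABORT_RE.findall(dmesg_text))
    bursts = []
    for t in times:
        if bursts and t - bursts[-1][1] <= gap:
            start, _, count = bursts[-1]
            bursts[-1] = (start, t, count + 1)
        else:
            bursts.append((t, t, 1))
    return bursts


def looks_sustained(bursts: list, window: float = 3600.0,
                    min_bursts: int = 3) -> bool:
    """True if at least `min_bursts` bursts start within any `window` s."""
    starts = [b[0] for b in bursts]
    for i in range(len(starts)):
        j = i
        while j < len(starts) and starts[j] - starts[i] <= window:
            j += 1
        if j - i >= min_bursts:
            return True
    return False
```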

--nab


