Re: FC target Errors

On Thu, 2014-05-22 at 11:08 -0400, deeepdish wrote:
> Good day,
> 
> I posted RE: similar issues, LUNs being disconnected from a 
> LIO/Targetcli based FC Target a few weeks back.   I believe the 
> consensus was that we're running unreliable hardware.   I rebuilt our 
> storage appliance using:
> 
> Fedora 20 - latest updates (Kernel 3.14.3-200.fc20.x86_64)
> HP DL490G6 + p711m + QMH2462 dual port 4G HBA ==> 2 x X5570 CPUs & 72GB RAM.
> 
> We have 12 x 4TB volumes in a RAID-6 combined with bcache (2 x mirrored 
> SSDs) and managed via LVM.
> 
> A few preliminary observations:
> 
> ESXi recognizes any presented backstore disks as SSDs:
> 
> [414039.187680] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
> [414039.193550] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
> [414139.189729] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
> [414139.192884] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
> 

These are warnings for unsupported ATA_16 passthrough commands (SCSI
opcode 0x85).

They can be safely ignored.

> QUESTION:   Is there a way to disable SSD emulation using TargetCLI?
> I browsed through some of the SCST documentation and there seemed to be
> a parameter to disable it; wondering if it's possible to do it within
> LIO.   I know there's a way to change the drive type on ESXi, however I'm
> trying to avoid writing my own storage rules.

This bit is controlled in targetcli/rtslib using the device attribute
'is_nonrot'.  By default, this value is taken from what the underlying
struct block_device reports to Linux on the target, but it can be
explicitly disabled.  However, I'm not aware of a beneficial reason to
manually override the default setting from the underlying hardware.
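
For reference, a minimal sketch of overriding the attribute directly
through configfs, assuming the usual /sys/kernel/config/target/core
layout.  The HBA and device names ("iblock_0", "lun0") are hypothetical
placeholders, and the helper functions are illustrative only; they are
not part of targetcli or rtslib.

```python
from pathlib import Path

# Assumed default configfs location of the LIO target core.
CONFIGFS_CORE = Path("/sys/kernel/config/target/core")


def is_nonrot_path(hba, dev, root=CONFIGFS_CORE):
    """Build the path of the is_nonrot attribute for one backstore device."""
    return Path(root) / hba / dev / "attrib" / "is_nonrot"


def set_is_nonrot(hba, dev, enabled, root=CONFIGFS_CORE):
    """Write 1 or 0 to is_nonrot and return the value read back."""
    path = is_nonrot_path(hba, dev, root)
    path.write_text("1\n" if enabled else "0\n")
    return path.read_text().strip()


# Example (run as root on the target; the names are hypothetical):
#   set_is_nonrot("iblock_0", "lun0", enabled=False)
```

Initiators that have already scanned the LUN will probably need a rescan
before they notice the change.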

> Errors seen using ESXi - Raw disk mapping to Windows 2012:
> 
> [415661.873649] ABORT_TASK: Found referenced qla2xxx task_tag: 1162836
> [415663.207911] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1162836
> [415663.207919] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1177532
> [415663.207924] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1160284
> [415663.207928] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174540
> [415663.207931] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1177488
> [415663.207935] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1207012
> [415663.207938] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1207056
> [415663.207942] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1134280
> [415663.207945] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1210972
> [415663.207949] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185276
> [415663.207952] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185320
> [415663.207956] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185364
> [415663.207959] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185408
> [415663.207963] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185452
> [415663.207966] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185496
> [415663.207970] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185540
> [415663.207974] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185584
> [415663.207977] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185628
> [415663.207980] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185672
> [415663.207984] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185716
> [415663.207987] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185760
> [415663.207990] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185804
> [415663.207994] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185848
> [415663.207998] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185892
> [415663.208001] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185936
> [415663.208005] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185980
> [415663.208008] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186024
> [415663.208012] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186068
> [415663.208015] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186112
> [415663.208018] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1132828
> 
> My experience with these kinds of errors was that LIO would crash (block
> access to all LUNs) after repeated abort tasks.
> 

Seeing these types of ABORT_TASKs grouped closely together during a
session login is normal for tcm_qla2xxx.  However, seeing them occur
repeatedly over long periods of time can indicate a network
connectivity issue, or possibly high latency from the storage
backend servicing I/O requests.

Also note that the target is not blocking I/O to LUNs at this point.
Typically an ESX host would end up taking the LUNs offline if it detected
repeated I/O timeouts and/or LUN resets.
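
One rough way to tell a login-time cluster from a recurring problem is to
group the ABORT_TASK lines in dmesg by timestamp.  The sketch below is
only an illustration: the 5 second gap is an arbitrary assumption, and
abort_bursts() is a hypothetical helper, not an existing tool.

```python
import re

# Matches the dmesg timestamp on an ABORT_TASK line,
# e.g. "[415661.873649] ABORT_TASK: Found referenced ..."
ABORT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+ABORT_TASK:")


def abort_bursts(lines, gap=5.0):
    """Group ABORT_TASK events into bursts.

    A new burst starts when more than `gap` seconds have passed since the
    previous ABORT_TASK line.  Returns a list of (start, end, count).
    """
    bursts = []
    for line in lines:
        m = ABORT_RE.search(line)
        if not m:
            continue
        ts = float(m.group(1))
        if bursts and ts - bursts[-1][1] <= gap:
            start, _, count = bursts[-1]
            bursts[-1] = (start, ts, count + 1)
        else:
            bursts.append((ts, ts, 1))
    return bursts
```

A single burst near a session login matches the normal case described
above; many bursts spread across hours or days is the pattern worth
investigating.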

> 
> Some other errors we're seeing (FC target) on another host:
> 
> [225802.776243] Detected MISCOMPARE for addr: ffff880207f61000 buf: ffff880208c47600
> [225802.776254] Target/iblock: Send MISCOMPARE check condition and sense
> [239924.911001] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.911796] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.912579] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.913271] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.913925] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.914581] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.985975] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.986775] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.987458] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.988148] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.988813] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.989501] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.011878] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.012601] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.017737] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.018494] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.019243] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.019927] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.047635] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.048301] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.048970] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.049629] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.050254] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.050897] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12

These are warnings about mismatched SCSI transfer lengths.  They can
be safely ignored.
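
To keep these from drowning out other messages, the repeated lines can be
collapsed into counts per opcode and length pair.  A small sketch, with
summarize_mismatches() being a hypothetical helper name (opcode 0x12 in
the log above is INQUIRY):

```python
import re
from collections import Counter

# Matches the TARGET_CORE transfer length warning as it appears in the log.
MISMATCH_RE = re.compile(
    r"Expected Transfer Length: (\d+) does not match "
    r"SCSI CDB Length: (\d+) for SAM Opcode: (0x[0-9a-fA-F]+)"
)


def summarize_mismatches(lines):
    """Count warnings per (opcode, expected_length, cdb_length)."""
    counts = Counter()
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            expected, cdb_len, opcode = m.groups()
            counts[(int(opcode, 16), int(expected), int(cdb_len))] += 1
    return counts
```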

> [251555.280475] ABORT_TASK: Found referenced qla2xxx task_tag: 1172032
> [251555.280591] ABORT_TASK: Found referenced qla2xxx task_tag: 1172076
> [251555.998793] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1172076
> [251555.998809] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1173352
> [251555.998816] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174540
> [251555.998824] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1172032
> [251555.998843] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174496
> 

Same note here wrt repeated ABORT_TASKs as above.

> [287197.044501] Detected MISCOMPARE for addr: ffff88020aaae000 buf: ffff8800d43c8a00
> [287197.044523] Target/iblock: Send MISCOMPARE check condition and sense
> [287849.872353] Detected MISCOMPARE for addr: ffff880205c63000 buf: ffff88020b64de00
> [287849.872365] Target/iblock: Send MISCOMPARE check condition and sense
> [287850.385253] Detected MISCOMPARE for addr: ffff880213fe4000 buf: ffff88020b64f000
> [287850.385263] Target/iblock: Send MISCOMPARE check condition and sense
> 

Warnings related to COMPARE_AND_WRITE (e.g. VAAI ATS) failures.  These
are normal, and can be safely ignored.
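
As an illustration of why a miscompare is an expected outcome rather than
a fault, here is a simplified model of compare-and-write semantics.  It
is a sketch only, assuming an in-memory bytearray in place of the
backstore; it is not the target core implementation, which performs the
compare and the write atomically against the block device.

```python
def compare_and_write(store, offset, expected, new):
    """Write `new` at `offset` only if the current data equals `expected`.

    Returns True on success, or False to model a MISCOMPARE check
    condition, in which case `store` is left untouched.
    """
    current = bytes(store[offset:offset + len(expected)])
    if current != expected:
        return False  # another initiator changed the data first
    store[offset:offset + len(new)] = new
    return True
```

When two initiators race to update the same region from the same starting
value, the first call succeeds and the second sees a miscompare, after
which it would normally re-read and retry.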

> Would like to know if we're encountering a condition that can be safely 
> ignored or is this something else we need to investigate / obtain a bug 
> fix?
> 

The only ones that I'd be concerned about are ABORT_TASK events that
occur consistently over long periods of time, separate from initial
session logins.
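
The triage in this reply can be summarized as a small lookup.  This is a
sketch only: the patterns are taken from the log excerpts quoted above,
the verdict labels are arbitrary, and triage() is a hypothetical helper.

```python
import re

# (pattern, verdict) pairs in the order they are checked.
RULES = [
    # Unsupported ATA_16 passthrough commands.
    (re.compile(r"Unsupported SCSI Opcode 0x85"), "ignore"),
    # Mismatched SCSI transfer lengths.
    (re.compile(r"Expected Transfer Length: \d+ does not match"), "ignore"),
    # COMPARE_AND_WRITE failures.
    (re.compile(r"MISCOMPARE"), "ignore"),
    # Normal around session login, a concern if sustained.
    (re.compile(r"ABORT_TASK:"), "watch"),
]


def triage(line):
    """Return 'ignore', 'watch', or 'unknown' for one dmesg line."""
    for pattern, verdict in RULES:
        if pattern.search(line):
            return verdict
    return "unknown"
```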

--nab
