On Thu, 2014-05-22 at 11:08 -0400, deeepdish wrote:
> Good day,
>
> I posted RE: similar issues, LUNs being disconnected from a
> LIO/Targetcli based FC Target a few weeks back. I believe the
> consensus was that we're running unreliable hardware. I rebuilt our
> storage appliance using:
>
> Fedora 20 - latest updates (Kernel 3.14.3-200.fc20.x86_64)
> HP DL490G6 + p711m + QMH2462 dual port 4G HBA ==> 2 x X5570 CPUs & 72GB RAM.
>
> We have 12 x 4TB volumes in a RAID-6 combined with bcache (2 x mirrored
> SSDs) and managed via LVM.
>
> A few preliminary observations:
>
> ESXi recognizes any presented backstore disks as SSD:
>
> [414039.187680] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
> [414039.193550] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
> [414139.189729] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
> [414139.192884] TARGET_CORE[qla2xxx]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
>

These are warnings for unsupported ATA_16 passthrough commands (opcode
0x85). They can be safely ignored.

> QUESTION: Is there a way to disable SSD emulation using TargetCLI?
> I browsed through some of the SCST documentation and there seemed to be
> a parameter to disable it; wondering if it's possible to do it within
> LIO. I know there's a way to change the drive type on ESXi, however I'm
> trying to avoid writing my own storage rules.

This bit is controlled in targetcli/rtslib by the device attribute
'is_nonrot'. By default, its value is taken from what the underlying
struct block_device reports to Linux on the target, but it can be
explicitly disabled by setting it to 0. However, I'm not aware of a
beneficial reason to override the default inherited from the
underlying hardware.
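For reference, here is a minimal sketch of reading and overriding 'is_nonrot' directly through configfs, which is the interface targetcli/rtslib manipulate underneath. It assumes the usual LIO layout /sys/kernel/config/target/core/<hba>/<device>/attrib/is_nonrot. The HBA and device names used in the test (iblock_0, mylun) are hypothetical placeholders, so check the real names on your target first. From inside targetcli, the equivalent should be `set attribute is_nonrot=0` on the backstore's node.

```python
from pathlib import Path

# Assumed LIO configfs mount point; adjust if configfs is mounted elsewhere.
CONFIGFS_ROOT = Path("/sys/kernel/config")


def attrib_path(hba: str, device: str, name: str,
                root: Path = CONFIGFS_ROOT) -> Path:
    """Path of one backstore attribute: <root>/target/core/<hba>/<device>/attrib/<name>."""
    return root / "target" / "core" / hba / device / "attrib" / name


def read_is_nonrot(hba: str, device: str, root: Path = CONFIGFS_ROOT) -> bool:
    """True if the backstore currently reports itself as non-rotational (SSD-like)."""
    return attrib_path(hba, device, "is_nonrot", root).read_text().strip() == "1"


def set_is_nonrot(hba: str, device: str, nonrot: bool,
                  root: Path = CONFIGFS_ROOT) -> None:
    """Override the value inherited from the underlying block device (needs root on a real target)."""
    attrib_path(hba, device, "is_nonrot", root).write_text("1" if nonrot else "0")
```

The `root` parameter exists only so the helpers can be pointed at a scratch directory for testing. Because the default already mirrors the underlying device, overriding it is only worthwhile if you have a specific reason to hide the non-rotational flag from initiators.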

> Errors seen using ESXi - Raw disk mapping to Windows 2012:
>
> [415661.873649] ABORT_TASK: Found referenced qla2xxx task_tag: 1162836
> [415663.207911] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1162836
> [415663.207919] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1177532
> [415663.207924] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1160284
> [415663.207928] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174540
> [415663.207931] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1177488
> [415663.207935] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1207012
> [415663.207938] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1207056
> [415663.207942] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1134280
> [415663.207945] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1210972
> [415663.207949] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185276
> [415663.207952] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185320
> [415663.207956] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185364
> [415663.207959] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185408
> [415663.207963] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185452
> [415663.207966] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185496
> [415663.207970] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185540
> [415663.207974] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185584
> [415663.207977] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185628
> [415663.207980] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185672
> [415663.207984] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185716
> [415663.207987] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185760
> [415663.207990] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185804
> [415663.207994] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185848
> [415663.207998] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185892
> [415663.208001] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185936
> [415663.208005] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1185980
> [415663.208008] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186024
> [415663.208012] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186068
> [415663.208015] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1186112
> [415663.208018] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1132828
>
> My experience with these kinds of errors was that LIO would crash (block
> access to all LUNs) after repeated abort tasks.
>

Seeing ABORT_TASKs grouped closely together like this during a session
login is normal for tcm_qla2xxx. However, if they recur repeatedly over
long periods of time, that can indicate a network connectivity issue,
or possibly high latency from the storage backend servicing I/O
requests.

Also note that the target is not blocking I/O to the LUNs at this
point. Typically, an ESX host ends up taking the LUNs offline itself if
it detects repeated I/O timeouts and/or LUN resets.
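The rule of thumb above distinguishes two patterns: a single tight cluster around a session login is normal, while clusters that keep recurring over long periods are worth investigating. One rough way to check for this is to group ABORT_TASK timestamps from dmesg into bursts. This is a minimal sketch that assumes the standard `[seconds.microseconds]` dmesg prefix. The 5-second gap, 3-burst, and one-hour thresholds are arbitrary, hypothetical values, not anything tcm_qla2xxx defines. The sketch also does not correlate bursts with session login messages; excluding login-time bursts is left to the reader.

```python
import re
from typing import Iterable, List

# Matches the "[seconds.microseconds] ABORT_TASK:" prefix printed by dmesg.
ABORT_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+ABORT_TASK:")


def abort_timestamps(lines: Iterable[str]) -> List[float]:
    """Seconds-since-boot timestamps of every ABORT_TASK line."""
    stamps = []
    for line in lines:
        match = ABORT_RE.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def group_bursts(stamps: Iterable[float], max_gap: float = 5.0) -> List[List[float]]:
    """Split timestamps into bursts; a gap longer than max_gap seconds starts a new burst."""
    bursts: List[List[float]] = []
    for t in sorted(stamps):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


def looks_persistent(bursts: List[List[float]], min_bursts: int = 3,
                     min_span: float = 3600.0) -> bool:
    """Heuristic: at least min_bursts separate bursts spread over min_span seconds or more."""
    if len(bursts) < min_bursts:
        return False
    return bursts[-1][0] - bursts[0][0] >= min_span
```

Applied to the log above, every ABORT_TASK line falls within about 1.4 seconds, so it forms a single burst and is not flagged.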

> Some other errors we're seeing (FC target) on another host:
>
> [225802.776243] Detected MISCOMPARE for addr: ffff880207f61000 buf: ffff880208c47600
> [225802.776254] Target/iblock: Send MISCOMPARE check condition and sense
> [239924.911001] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.911796] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.912579] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.913271] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.913925] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.914581] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239924.985975] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.986775] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.987458] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.988148] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.988813] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239924.989501] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.011878] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.012601] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.017737] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.018494] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.019243] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.019927] TARGET_CORE[qla2xxx]: Expected Transfer Length: 493 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
> [239925.047635] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.048301] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.048970] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.049629] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.050254] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12
> [239925.050897] TARGET_CORE[qla2xxx]: Expected Transfer Length: 274 does not match SCSI CDB Length: 36 for SAM Opcode: 0x12

These are warnings about mismatched SCSI transfer lengths. They can be
safely ignored.

> [251555.280475] ABORT_TASK: Found referenced qla2xxx task_tag: 1172032
> [251555.280591] ABORT_TASK: Found referenced qla2xxx task_tag: 1172076
> [251555.998793] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1172076
> [251555.998809] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1173352
> [251555.998816] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174540
> [251555.998824] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1172032
> [251555.998843] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1174496
>

Same note here regarding repeated ABORT_TASKs as above.
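In these warnings, "Expected Transfer Length" is, as far as I understand, the data length the initiator announced for the command on the wire (for FC, the FCP_DL field). "SCSI CDB Length" is the length encoded in the CDB itself, which for INQUIRY (SAM opcode 0x12) is the ALLOCATION LENGTH field. Below is a small sketch of decoding that field. It assumes 6-byte INQUIRY CDBs laid out per SPC-3 and later, where bytes 3-4 hold the allocation length. The two CDBs are hypothetical examples chosen to match the lengths in the log, not captured traffic.

```python
from typing import Optional, Tuple

INQUIRY = 0x12  # SAM opcode shown in the warnings above


def inquiry_alloc_len(cdb: bytes) -> int:
    """ALLOCATION LENGTH of a 6-byte INQUIRY CDB: bytes 3-4, big-endian (SPC-3 and later)."""
    if len(cdb) != 6 or cdb[0] != INQUIRY:
        raise ValueError("not a 6-byte INQUIRY CDB")
    return int.from_bytes(cdb[3:5], "big")


def length_mismatch(expected_xfer_len: int, cdb: bytes) -> Optional[Tuple[int, int]]:
    """(CDB length, expected minus CDB length) if the two disagree, else None."""
    cdb_len = inquiry_alloc_len(cdb)
    if cdb_len == expected_xfer_len:
        return None
    return cdb_len, expected_xfer_len - cdb_len


# Hypothetical standard INQUIRY CDBs with allocation lengths 36 and 255.
std_36 = bytes([INQUIRY, 0x00, 0x00, 0x00, 0x24, 0x00])
std_255 = bytes([INQUIRY, 0x00, 0x00, 0x00, 0xFF, 0x00])
print(length_mismatch(274, std_36))   # (36, 238)
print(length_mismatch(493, std_255))  # (255, 238)
```

With the numbers from the log, both variants differ by the same 238 bytes.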

> [287197.044501] Detected MISCOMPARE for addr: ffff88020aaae000 buf: ffff8800d43c8a00
> [287197.044523] Target/iblock: Send MISCOMPARE check condition and sense
> [287849.872353] Detected MISCOMPARE for addr: ffff880205c63000 buf: ffff88020b64de00
> [287849.872365] Target/iblock: Send MISCOMPARE check condition and sense
> [287850.385253] Detected MISCOMPARE for addr: ffff880213fe4000 buf: ffff88020b64f000
> [287850.385263] Target/iblock: Send MISCOMPARE check condition and sense
>

These are warnings related to COMPARE_AND_WRITE (e.g. VAAI ATS)
miscompare failures. They are normal, and can be safely ignored.

> Would like to know if we're encountering a condition that can be safely
> ignored or is this something else we need to investigate / obtain a bug
> fix?
>

The only ones that I'd be concerned about are ABORT_TASK events that
occur consistently over long periods of time, separate from initial
session logins.

--nab
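To see why a MISCOMPARE is routine rather than a fault, here is a toy in-memory model of COMPARE AND WRITE, the SCSI command behind VAAI ATS. It is a simplified, single-threaded sketch, not LIO's iblock code. The command compares the verify data against the current block contents and writes the new data only if they match. Otherwise nothing is written and the target returns a MISCOMPARE check condition; per SBC, the sense data carries the offset of the first differing byte. The 512-byte block size and the single-block "lock record" at LBA 1 are assumptions for illustration.

```python
from typing import Optional

BLOCK_SIZE = 512  # assumed logical block size of the toy LUN


class ToyLun:
    """In-memory stand-in for a LUN; models COMPARE AND WRITE only, not LIO internals."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def compare_and_write(self, lba: int, verify: bytes, write: bytes) -> Optional[int]:
        """Compare one block with `verify`; store `write` only if they match.

        Returns None on success, or the offset of the first differing byte on a
        MISCOMPARE, in which case the block is left untouched. The model is
        single-threaded, so compare and write are trivially atomic here.
        """
        if not 0 <= lba < self.num_blocks:
            raise IndexError("LBA out of range")
        if len(verify) != BLOCK_SIZE or len(write) != BLOCK_SIZE:
            raise ValueError("verify and write data must be exactly one block")
        start = lba * BLOCK_SIZE
        current = self.data[start:start + BLOCK_SIZE]
        for offset, (have, want) in enumerate(zip(current, verify)):
            if have != want:
                return offset  # MISCOMPARE: nothing written
        self.data[start:start + BLOCK_SIZE] = write
        return None


# Two hosts race for the same (hypothetical) on-disk lock record at LBA 1.
lun = ToyLun(num_blocks=4)
free = bytes(BLOCK_SIZE)
host_a = b"A" * BLOCK_SIZE
host_b = b"B" * BLOCK_SIZE
print(lun.compare_and_write(1, free, host_a))  # None: host A takes the lock
print(lun.compare_and_write(1, free, host_b))  # 0: MISCOMPARE, lock already held
```

The second host's MISCOMPARE is simply how ATS reports that the lock record changed underneath it. Hosts contending for shared VMFS metadata will therefore produce the occasional "Detected MISCOMPARE" line on the target without anything being wrong.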