Also starting to see these messages now: May 20 13:45:51 roc-4r-scd212 kernel: [65824.744667] enhanceio: Allocate 30398KB (8B per) mem for 3890944-entry cache (capacity:15258MB, associativity:256, block size:4096 bytes)enhanceio_lru: Initialized 15199 sets in LRU May 20 13:45:51 roc-4r-scd212 kernel: [65826.767511] Detected MISCOMPARE for addr: ffff88084667e000 buf: ffff88105b7e7600 May 20 13:45:51 roc-4r-scd212 kernel: [65826.767516] Target/iblock: Send MISCOMPARE check condition and sense May 20 13:46:48 roc-4r-scd212 kernel: [65884.387119] libceph: mon2 10.80.3.236:6789 socket closed (con state OPEN) May 20 13:46:48 roc-4r-scd212 kernel: [65884.387138] libceph: mon2 10.80.3.236:6789 session lost, hunting for new mon On Tue, May 19, 2015 at 8:31 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote: > Hello, > > On Fri, May 8, 2015 at 1:45 AM, Nicholas A. Bellinger > <nab@xxxxxxxxxxxxxxx> wrote: >> Hi Robert, >> >> On Wed, 2015-05-06 at 12:21 -0400, Robert Wood wrote: >>> One update: it appears that Vmware ESXi 5.5 U2 is sometimes >>> incorrectly sensing that the LIO-ORG device is an SSD. I wonder if >>> that is causing issues with commands being sent? I am testing >>> untagging all LIO-ORG devices as SSD to see if the problem recurs. >>> >>> >> >> Changing around SCSI target feature bits reported by LIO to ESX hosts is >> only going to mask the real issue. >> >> From Mike's testing earlier in the week, he's been able to verify with >> FC and iSCSI that it's not a target specific bug, but AFAICT it's still >> not clear if something specific to iscsi-target may be triggering a bug >> in ceph client. >> >> AFAICT, the bug is centered around ceph client reconnect logic + >> read-copy-update (RCU) usage in v3.19.y code. (Adding Sage CC') > > We continue to get crashes about once weekly, at this point ESXi is > all set to defaults and there are no ceph warnings of any kind, nor > any others - we get this in syslog and vmkwarning.log in ESXi: > > syslog: > > > May 18 12:33:47 roc-4r-scd212 kernel: [658265.920803] > TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending > CHECK_CONDITION. > May 18 12:33:47 roc-4r-scd212 kernel: [658265.921082] > TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending > CHECK_CONDITION. > May 18 12:33:47 roc-4r-scd212 kernel: [658265.979386] > TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending > CHECK_CONDITION. > May 18 12:33:47 roc-4r-scd212 kernel: [658265.979723] > TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending > CHECK_CONDITION. > May 18 12:33:49 roc-4r-scd212 kernel: [658267.940356] ABORT_TASK: > Found referenced iSCSI task_tag: 205520896 > May 18 12:33:49 roc-4r-scd212 kernel: [658267.940359] ABORT_TASK: > ref_tag: 205520896 already complete, skipping > May 18 12:33:49 roc-4r-scd212 kernel: [658267.940361] ABORT_TASK: > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 205520896 > May 18 12:33:49 roc-4r-scd212 kernel: [658267.940371] Unexpected ret: > -32 send data 48 > May 18 12:33:50 roc-4r-scd212 kernel: [658268.637907] ABORT_TASK: > Found referenced iSCSI task_tag: 809500672 > May 18 12:33:50 roc-4r-scd212 kernel: [658268.637913] ABORT_TASK: > Sending TMR_FUNCTION_COMPLETE for ref_tag: 809500672 > May 18 12:33:59 roc-4r-scd212 kernel: [658277.985599] ABORT_TASK: > Found referenced iSCSI task_tag: 2438871 > May 18 12:33:59 roc-4r-scd212 kernel: [658277.985606] ABORT_TASK: > Sending TMR_FUNCTION_COMPLETE for ref_tag: 2438871 > May 18 12:33:59 roc-4r-scd212 kernel: [658277.985610] ABORT_TASK: > Found referenced iSCSI task_tag: 2438873 > May 18 12:33:59 roc-4r-scd212 kernel: [658277.985612] ABORT_TASK: > ref_tag: 2438873 already complete, skipping > May 18 12:33:59 roc-4r-scd212 kernel: [658277.985613] ABORT_TASK: > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438873 > May 18 12:33:59 roc-4r-scd212 kernel: [658277.985620] Unable to locate > ITT: 0x002536d7 on CID: 0 > May 18 12:33:59 roc-4r-scd212 kernel: [658277.985632] Unable to locate > RefTaskTag: 0x002536d7 on CID: 0. > May 18 12:34:03 roc-4r-scd212 kernel: [658281.641021] Unexpected ret: > -32 send data 48 > May 18 12:34:16 roc-4r-scd212 kernel: [658295.276146] Unexpected ret: > -32 send data 48 > May 18 12:34:44 roc-4r-scd212 kernel: [658322.548137] Unexpected ret: > -32 send data 48 > May 18 12:34:57 roc-4r-scd212 kernel: [658336.279539] ABORT_TASK: > Found referenced iSCSI task_tag: 1308622848 > May 18 12:34:57 roc-4r-scd212 kernel: [658336.279544] ABORT_TASK: > ref_tag: 1308622848 already complete, skipping > May 18 12:34:57 roc-4r-scd212 kernel: [658336.279546] ABORT_TASK: > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1308622848 > May 18 12:34:58 roc-4r-scd212 kernel: [658336.944859] ABORT_TASK: > Found referenced iSCSI task_tag: 1862270976 > May 18 12:34:58 roc-4r-scd212 kernel: [658336.944862] ABORT_TASK: > ref_tag: 1862270976 already complete, skipping > May 18 12:34:58 roc-4r-scd212 kernel: [658336.944863] ABORT_TASK: > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1862270976 > May 18 12:35:01 roc-4r-scd212 CRON[963082]: (root) CMD (command -v > debian-sa1 > /dev/null && debian-sa1 1 1) > May 18 12:35:11 roc-4r-scd212 kernel: [658349.906624] Unexpected ret: > -32 send data 48 > May 18 12:35:25 roc-4r-scd212 kernel: [658363.528498] Unexpected ret: > -32 send data 48 > May 18 12:35:38 roc-4r-scd212 kernel: [658376.947982] ABORT_TASK: > Found referenced iSCSI task_tag: 1732247552 > May 18 12:35:38 roc-4r-scd212 kernel: [658376.947990] ABORT_TASK: > Sending TMR_FUNCTION_COMPLETE for ref_tag: 1732247552 > May 18 12:35:38 roc-4r-scd212 kernel: [658377.155014] Unexpected ret: > -32 send data 48 > May 18 12:35:48 roc-4r-scd212 kernel: [658386.840043] > TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000016 > May 18 12:35:52 roc-4r-scd212 kernel: [658390.898777] ABORT_TASK: > Found referenced iSCSI task_tag: 603979776 > May 18 12:35:52 roc-4r-scd212 kernel: [658390.898780] Unexpected ret: > -32 send data 48 > May 18 12:35:52 roc-4r-scd212 kernel: [658390.898807] ABORT_TASK: > ref_tag: 603979776 already complete, skipping > May 18 12:35:52 roc-4r-scd212 kernel: [658390.898808] ABORT_TASK: > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 603979776 > May 18 12:36:01 roc-4r-scd212 kernel: [658400.446803] > TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000020 > May 18 12:36:01 roc-4r-scd212 kernel: [658400.446875] ABORT_TASK: > Found referenced iSCSI task_tag: 2438925 > May 18 12:36:01 roc-4r-scd212 kernel: [658400.446878] ABORT_TASK: > ref_tag: 2438925 already complete, skipping > May 18 12:36:01 roc-4r-scd212 kernel: [658400.446880] ABORT_TASK: > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438925 > May 18 12:36:06 roc-4r-scd212 kernel: [658404.612435] Unexpected ret: > -32 send data 48 > May 18 12:36:15 roc-4r-scd212 kernel: [658414.060103] > TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000002a > May 18 12:36:15 roc-4r-scd212 kernel: [658414.060175] ABORT_TASK: > Found referenced iSCSI task_tag: 2438937 > May 18 12:36:15 roc-4r-scd212 kernel: [658414.060178] ABORT_TASK: > ref_tag: 2438937 already complete, skipping > May 18 12:36:15 roc-4r-scd212 kernel: [658414.060180] ABORT_TASK: > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438937 > May 18 12:36:19 roc-4r-scd212 kernel: [658418.384607] ABORT_TASK: > Found referenced iSCSI task_tag: 1627389952 > May 18 12:36:19 roc-4r-scd212 kernel: [658418.384612] Unexpected ret: > -32 send data 48 > May 18 12:36:19 roc-4r-scd212 kernel: [658418.384640] ABORT_TASK: > ref_tag: 1627389952 already complete, skipping > May 18 12:36:19 roc-4r-scd212 kernel: [658418.384641] ABORT_TASK: > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1627389952 > May 18 12:36:22 roc-4r-scd212 kernel: [658420.844106] ABORT_TASK: > Found referenced iSCSI task_tag: 943718400 > May 18 12:36:22 roc-4r-scd212 kernel: [658420.844112] ABORT_TASK: > Sending TMR_FUNCTION_COMPLETE for ref_tag: 943718400 > May 18 12:36:29 roc-4r-scd212 kernel: [658427.669519] > TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000034 > May 18 12:36:33 roc-4r-scd212 kernel: [658432.016990] Unexpected ret: > -32 send data 48 > May 18 12:36:39 roc-4r-scd212 kernel: [658438.347261] ABORT_TASK: > Found referenced iSCSI task_tag: 2134900736 > May 18 12:36:39 roc-4r-scd212 kernel: [658438.347268] ABORT_TASK: > Sending TMR_FUNCTION_COMPLETE for ref_tag: 2134900736 > > > > ESXi (please note the time is in UTC here): > > 2015-05-18T18:50:15.590Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > p > 2015-05-18T18:50:19.254Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > p > 2015-05-18T18:52:26.959Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > p > 2015-05-18T18:53:22.275Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > p > 2015-05-18T18:53:25.943Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > p > 2015-05-18T18:55:31.015Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > p > 2015-05-18T18:55:34.614Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > p > 2015-05-18T19:09:29.311Z cpu16:33166)WARNING: ScsiDeviceIO: 1223: > Device naa.6001405a37e54d400000000000000000 performance has > deteriorated. I/O latency increased from averag > 2015-05-18T19:09:30.360Z cpu14:32843)WARNING: ScsiDeviceIO: 1223: > Device naa.6001405a37e54d400000000000000000 performance has > deteriorated. I/O latency increased from averag > 2015-05-18T21:26:07.605Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast > p > 2015-05-18T21:26:27.904Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task" > with itt=0x10d4450 (refITT=0x10d444f) timed ou > 2015-05-18T21:26:27.904Z cpu27:1706465)WARNING: ScsiPath: 6292: Set > retry timeout for failed TaskMgmt abort for CmdSN 0x0, status > Failure, path vmhba33:C3:T22:L1 > 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StopConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is > being marked "OFFLINE" (Event:4) > 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StopConnection: Sess [ISID: 00023d000004 TARGET: > iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH: > 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StopConnection: Conn [CID: 0 L: 10.70.2.180:39224 R: > 10.70.2.211:3260] > 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task" > with itt=0x10d445a (refITT=0x10d444e) timed ou > 2015-05-18T21:26:33.616Z cpu38:1705984)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast > p > 2015-05-18T21:26:33.616Z cpu25:33722)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > pat > 2015-05-18T21:26:33.904Z cpu5:32889)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast > path > 2015-05-18T21:26:34.917Z cpu5:1762870)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast > pa > 2015-05-18T21:26:35.478Z cpu24:32853)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405a37e54d400000000000000000" state in doubt; requested fast > pat > 2015-05-18T21:26:35.912Z cpu5:1762870)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP device > "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast > pa > 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StartConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is > being marked "ONLINE" > 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StartConnection: Sess [ISID: 00023d000004 TARGET: > iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH: > 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StartConnection: Conn [CID: 0 L: 10.70.2.180:12484 R: > 10.70.2.211:3260] > 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StopConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is > being marked "OFFLINE" (Event:4) > 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StopConnection: Sess [ISID: 00023d000004 TARGET: > iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH: > 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_StopConnection: Conn [CID: 0 L: 10.70.2.180:12484 R: > 10.70.2.211:3260] > 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk: > iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task" > with itt=0x10d445f (refITT=0x10d445e) timed ou > 2015-05-18T21:26:47.222Z cpu26:33722)WARNING: NMP: > nmp_DeviceRequestFastDeviceProbe:237: NMP > > >> >>> >>> On Wed, May 6, 2015 at 11:17 AM, Robert Wood <rwood@xxxxxxxxxxxxxxxxxxx> wrote: >>> > Good morning, we are continuing to receive: >>> > >>> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566185] ABORT_TASK: >>> > ref_tag: 3847728 already complete, skipping >>> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566187] ABORT_TASK: >>> > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 3847728 >>> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate >>> > ITT: 0x003ab632 on CID: 0 >>> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate >>> > RefTaskTag: 0x003ab632 on CID: 0. >>> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566254] Unexpected ret: >>> > -32 send data 48 >>> > May 6 11:08:30 roc-4r-scd212 kernel: [71902.397053] libceph: osd20 >>> > 10.80.3.25:6812 socket closed (con state OPEN) >>> -- >> >> Thanks for reporting! >> >> --nab >> -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html