Hello, On Fri, May 8, 2015 at 1:45 AM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote: > Hi Robert, > > On Wed, 2015-05-06 at 12:21 -0400, Robert Wood wrote: >> One update: it appears that Vmware ESXi 5.5 U2 is sometimes >> incorrectly sensing that the LIO-ORG device is an SSD. I wonder if >> that is causing issues with commands being sent? I am testing >> untagging all LIO-ORG devices as SSD to see if the problem recurs. >> >> > > Changing around SCSI target feature bits reported by LIO to ESX hosts is > only going to mask the real issue. > > From Mike's testing earlier in the week, he's been able to verify with > FC and iSCSI that it's not a target specific bug, but AFAICT it's still > not clear if something specific to iscsi-target may be triggering a bug > in ceph client. > > AFAICT, the bug is centered around ceph client reconnect logic + > read-copy-update (RCU) usage in v3.19.y code. (Adding Sage CC') We continue to get crashes about once weekly, at this point ESXi is all set to defaults and there are no ceph warnings of any kind, nor any others - we get this in syslog and vmkwarning.log in ESXi: syslog: May 18 12:33:47 roc-4r-scd212 kernel: [658265.920803] TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION. May 18 12:33:47 roc-4r-scd212 kernel: [658265.921082] TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION. May 18 12:33:47 roc-4r-scd212 kernel: [658265.979386] TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION. May 18 12:33:47 roc-4r-scd212 kernel: [658265.979723] TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION. May 18 12:33:49 roc-4r-scd212 kernel: [658267.940356] ABORT_TASK: Found referenced iSCSI task_tag: 205520896 May 18 12:33:49 roc-4r-scd212 kernel: [658267.940359] ABORT_TASK: ref_tag: 205520896 already complete, skipping May 18 12:33:49 roc-4r-scd212 kernel: [658267.940361] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 205520896 May 18 12:33:49 roc-4r-scd212 kernel: [658267.940371] Unexpected ret: -32 send data 48 May 18 12:33:50 roc-4r-scd212 kernel: [658268.637907] ABORT_TASK: Found referenced iSCSI task_tag: 809500672 May 18 12:33:50 roc-4r-scd212 kernel: [658268.637913] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 809500672 May 18 12:33:59 roc-4r-scd212 kernel: [658277.985599] ABORT_TASK: Found referenced iSCSI task_tag: 2438871 May 18 12:33:59 roc-4r-scd212 kernel: [658277.985606] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 2438871 May 18 12:33:59 roc-4r-scd212 kernel: [658277.985610] ABORT_TASK: Found referenced iSCSI task_tag: 2438873 May 18 12:33:59 roc-4r-scd212 kernel: [658277.985612] ABORT_TASK: ref_tag: 2438873 already complete, skipping May 18 12:33:59 roc-4r-scd212 kernel: [658277.985613] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438873 May 18 12:33:59 roc-4r-scd212 kernel: [658277.985620] Unable to locate ITT: 0x002536d7 on CID: 0 May 18 12:33:59 roc-4r-scd212 kernel: [658277.985632] Unable to locate RefTaskTag: 0x002536d7 on CID: 0. May 18 12:34:03 roc-4r-scd212 kernel: [658281.641021] Unexpected ret: -32 send data 48 May 18 12:34:16 roc-4r-scd212 kernel: [658295.276146] Unexpected ret: -32 send data 48 May 18 12:34:44 roc-4r-scd212 kernel: [658322.548137] Unexpected ret: -32 send data 48 May 18 12:34:57 roc-4r-scd212 kernel: [658336.279539] ABORT_TASK: Found referenced iSCSI task_tag: 1308622848 May 18 12:34:57 roc-4r-scd212 kernel: [658336.279544] ABORT_TASK: ref_tag: 1308622848 already complete, skipping May 18 12:34:57 roc-4r-scd212 kernel: [658336.279546] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1308622848 May 18 12:34:58 roc-4r-scd212 kernel: [658336.944859] ABORT_TASK: Found referenced iSCSI task_tag: 1862270976 May 18 12:34:58 roc-4r-scd212 kernel: [658336.944862] ABORT_TASK: ref_tag: 1862270976 already complete, skipping May 18 12:34:58 roc-4r-scd212 kernel: [658336.944863] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1862270976 May 18 12:35:01 roc-4r-scd212 CRON[963082]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1) May 18 12:35:11 roc-4r-scd212 kernel: [658349.906624] Unexpected ret: -32 send data 48 May 18 12:35:25 roc-4r-scd212 kernel: [658363.528498] Unexpected ret: -32 send data 48 May 18 12:35:38 roc-4r-scd212 kernel: [658376.947982] ABORT_TASK: Found referenced iSCSI task_tag: 1732247552 May 18 12:35:38 roc-4r-scd212 kernel: [658376.947990] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1732247552 May 18 12:35:38 roc-4r-scd212 kernel: [658377.155014] Unexpected ret: -32 send data 48 May 18 12:35:48 roc-4r-scd212 kernel: [658386.840043] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000016 May 18 12:35:52 roc-4r-scd212 kernel: [658390.898777] ABORT_TASK: Found referenced iSCSI task_tag: 603979776 May 18 12:35:52 roc-4r-scd212 kernel: [658390.898780] Unexpected ret: -32 send data 48 May 18 12:35:52 roc-4r-scd212 kernel: [658390.898807] ABORT_TASK: ref_tag: 603979776 already complete, skipping May 18 12:35:52 roc-4r-scd212 kernel: [658390.898808] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 603979776 May 18 12:36:01 roc-4r-scd212 kernel: [658400.446803] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000020 May 18 12:36:01 roc-4r-scd212 kernel: [658400.446875] ABORT_TASK: Found referenced iSCSI task_tag: 2438925 May 18 12:36:01 roc-4r-scd212 kernel: [658400.446878] ABORT_TASK: ref_tag: 2438925 already complete, skipping May 18 12:36:01 roc-4r-scd212 kernel: [658400.446880] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438925 May 18 12:36:06 roc-4r-scd212 kernel: [658404.612435] Unexpected ret: -32 send data 48 May 18 12:36:15 roc-4r-scd212 kernel: [658414.060103] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000002a May 18 12:36:15 roc-4r-scd212 kernel: [658414.060175] ABORT_TASK: Found referenced iSCSI task_tag: 2438937 May 18 12:36:15 roc-4r-scd212 kernel: [658414.060178] ABORT_TASK: ref_tag: 2438937 already complete, skipping May 18 12:36:15 roc-4r-scd212 kernel: [658414.060180] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438937 May 18 12:36:19 roc-4r-scd212 kernel: [658418.384607] ABORT_TASK: Found referenced iSCSI task_tag: 1627389952 May 18 12:36:19 roc-4r-scd212 kernel: [658418.384612] Unexpected ret: -32 send data 48 May 18 12:36:19 roc-4r-scd212 kernel: [658418.384640] ABORT_TASK: ref_tag: 1627389952 already complete, skipping May 18 12:36:19 roc-4r-scd212 kernel: [658418.384641] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1627389952 May 18 12:36:22 roc-4r-scd212 kernel: [658420.844106] ABORT_TASK: Found referenced iSCSI task_tag: 943718400 May 18 12:36:22 roc-4r-scd212 kernel: [658420.844112] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 943718400 May 18 12:36:29 roc-4r-scd212 kernel: [658427.669519] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000034 May 18 12:36:33 roc-4r-scd212 kernel: [658432.016990] Unexpected ret: -32 send data 48 May 18 12:36:39 roc-4r-scd212 kernel: [658438.347261] ABORT_TASK: Found referenced iSCSI task_tag: 2134900736 May 18 12:36:39 roc-4r-scd212 kernel: [658438.347268] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 2134900736 ESXi (please note the time is in UTC here): 2015-05-18T18:50:15.590Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast p 2015-05-18T18:50:19.254Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast p 2015-05-18T18:52:26.959Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast p 2015-05-18T18:53:22.275Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast p 2015-05-18T18:53:25.943Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast p 2015-05-18T18:55:31.015Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast p 2015-05-18T18:55:34.614Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast p 2015-05-18T19:09:29.311Z cpu16:33166)WARNING: ScsiDeviceIO: 1223: Device naa.6001405a37e54d400000000000000000 performance has deteriorated. I/O latency increased from averag 2015-05-18T19:09:30.360Z cpu14:32843)WARNING: ScsiDeviceIO: 1223: Device naa.6001405a37e54d400000000000000000 performance has deteriorated. I/O latency increased from averag 2015-05-18T21:26:07.605Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast p 2015-05-18T21:26:27.904Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task" with itt=0x10d4450 (refITT=0x10d444f) timed ou 2015-05-18T21:26:27.904Z cpu27:1706465)WARNING: ScsiPath: 6292: Set retry timeout for failed TaskMgmt abort for CmdSN 0x0, status Failure, path vmhba33:C3:T22:L1 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4) 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000004 TARGET: iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH: 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 10.70.2.180:39224 R: 10.70.2.211:3260] 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task" with itt=0x10d445a (refITT=0x10d444e) timed ou 2015-05-18T21:26:33.616Z cpu38:1705984)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast p 2015-05-18T21:26:33.616Z cpu25:33722)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast pat 2015-05-18T21:26:33.904Z cpu5:32889)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast path 2015-05-18T21:26:34.917Z cpu5:1762870)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast pa 2015-05-18T21:26:35.478Z cpu24:32853)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405a37e54d400000000000000000" state in doubt; requested fast pat 2015-05-18T21:26:35.912Z cpu5:1762870)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast pa 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is being marked "ONLINE" 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000004 TARGET: iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH: 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 10.70.2.180:12484 R: 10.70.2.211:3260] 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4) 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000004 TARGET: iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH: 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 10.70.2.180:12484 R: 10.70.2.211:3260] 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk: iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task" with itt=0x10d445f (refITT=0x10d445e) timed ou 2015-05-18T21:26:47.222Z cpu26:33722)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP > >> >> On Wed, May 6, 2015 at 11:17 AM, Robert Wood <rwood@xxxxxxxxxxxxxxxxxxx> wrote: >> > Good morning, we are continuing to receive: >> > >> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566185] ABORT_TASK: >> > ref_tag: 3847728 already complete, skipping >> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566187] ABORT_TASK: >> > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 3847728 >> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate >> > ITT: 0x003ab632 on CID: 0 >> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate >> > RefTaskTag: 0x003ab632 on CID: 0. >> > May 6 11:08:26 roc-4r-scd212 kernel: [71898.566254] Unexpected ret: >> > -32 send data 48 >> > May 6 11:08:30 roc-4r-scd212 kernel: [71902.397053] libceph: osd20 >> > 10.80.3.25:6812 socket closed (con state OPEN) >> -- > > Thanks for reporting! > > --nab > -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html