Re: Update on crash with kernel 3.19

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Also starting to see these messages now:

May 20 13:45:51 roc-4r-scd212 kernel: [65824.744667] enhanceio:
Allocate 30398KB (8B per) mem for 3890944-entry cache
(capacity:15258MB, associativity:256, block size:4096
bytes)enhanceio_lru: Initialized 15199 sets in LRU
May 20 13:45:51 roc-4r-scd212 kernel: [65826.767511] Detected
MISCOMPARE for addr: ffff88084667e000 buf: ffff88105b7e7600
May 20 13:45:51 roc-4r-scd212 kernel: [65826.767516] Target/iblock:
Send MISCOMPARE check condition and sense
May 20 13:46:48 roc-4r-scd212 kernel: [65884.387119] libceph: mon2
10.80.3.236:6789 socket closed (con state OPEN)
May 20 13:46:48 roc-4r-scd212 kernel: [65884.387138] libceph: mon2
10.80.3.236:6789 session lost, hunting for new mon

On Tue, May 19, 2015 at 8:31 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> On Fri, May 8, 2015 at 1:45 AM, Nicholas A. Bellinger
> <nab@xxxxxxxxxxxxxxx> wrote:
>> Hi Robert,
>>
>> On Wed, 2015-05-06 at 12:21 -0400, Robert Wood wrote:
>>> One update: it appears that Vmware ESXi 5.5 U2 is sometimes
>>> incorrectly sensing that the LIO-ORG device is an SSD.  I wonder if
>>> that is causing issues with commands being sent?  I am testing
>>> untagging all LIO-ORG devices as SSD to see if the problem recurs.
>>>
>>>
>>
>> Changing around SCSI target feature bits reported by LIO to ESX hosts is
>> only going to mask the real issue.
>>
>> From Mike's testing earlier in the week, he's been able to verify with
>> FC and iSCSI that it's not a target specific bug, but AFAICT it's still
>> not clear if something specific to iscsi-target may be triggering a bug
>> in ceph client.
>>
>> AFAICT, the bug is centered around ceph client reconnect logic +
>> read-copy-update (RCU) usage in v3.19.y code.  (Adding Sage CC')
>
> We continue to get crashes about once weekly, at this point ESXi is
> all set to defaults and there are no ceph warnings of any kind, nor
> any others - we get this in syslog and vmkwarning.log in ESXi:
>
>                    syslog:
>
>
> May 18 12:33:47 roc-4r-scd212 kernel: [658265.920803]
> TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending
> CHECK_CONDITION.
> May 18 12:33:47 roc-4r-scd212 kernel: [658265.921082]
> TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending
> CHECK_CONDITION.
> May 18 12:33:47 roc-4r-scd212 kernel: [658265.979386]
> TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending
> CHECK_CONDITION.
> May 18 12:33:47 roc-4r-scd212 kernel: [658265.979723]
> TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending
> CHECK_CONDITION.
> May 18 12:33:49 roc-4r-scd212 kernel: [658267.940356] ABORT_TASK:
> Found referenced iSCSI task_tag: 205520896
> May 18 12:33:49 roc-4r-scd212 kernel: [658267.940359] ABORT_TASK:
> ref_tag: 205520896 already complete, skipping
> May 18 12:33:49 roc-4r-scd212 kernel: [658267.940361] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 205520896
> May 18 12:33:49 roc-4r-scd212 kernel: [658267.940371] Unexpected ret:
> -32 send data 48
> May 18 12:33:50 roc-4r-scd212 kernel: [658268.637907] ABORT_TASK:
> Found referenced iSCSI task_tag: 809500672
> May 18 12:33:50 roc-4r-scd212 kernel: [658268.637913] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 809500672
> May 18 12:33:59 roc-4r-scd212 kernel: [658277.985599] ABORT_TASK:
> Found referenced iSCSI task_tag: 2438871
> May 18 12:33:59 roc-4r-scd212 kernel: [658277.985606] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 2438871
> May 18 12:33:59 roc-4r-scd212 kernel: [658277.985610] ABORT_TASK:
> Found referenced iSCSI task_tag: 2438873
> May 18 12:33:59 roc-4r-scd212 kernel: [658277.985612] ABORT_TASK:
> ref_tag: 2438873 already complete, skipping
> May 18 12:33:59 roc-4r-scd212 kernel: [658277.985613] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438873
> May 18 12:33:59 roc-4r-scd212 kernel: [658277.985620] Unable to locate
> ITT: 0x002536d7 on CID: 0
> May 18 12:33:59 roc-4r-scd212 kernel: [658277.985632] Unable to locate
> RefTaskTag: 0x002536d7 on CID: 0.
> May 18 12:34:03 roc-4r-scd212 kernel: [658281.641021] Unexpected ret:
> -32 send data 48
> May 18 12:34:16 roc-4r-scd212 kernel: [658295.276146] Unexpected ret:
> -32 send data 48
> May 18 12:34:44 roc-4r-scd212 kernel: [658322.548137] Unexpected ret:
> -32 send data 48
> May 18 12:34:57 roc-4r-scd212 kernel: [658336.279539] ABORT_TASK:
> Found referenced iSCSI task_tag: 1308622848
> May 18 12:34:57 roc-4r-scd212 kernel: [658336.279544] ABORT_TASK:
> ref_tag: 1308622848 already complete, skipping
> May 18 12:34:57 roc-4r-scd212 kernel: [658336.279546] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1308622848
> May 18 12:34:58 roc-4r-scd212 kernel: [658336.944859] ABORT_TASK:
> Found referenced iSCSI task_tag: 1862270976
> May 18 12:34:58 roc-4r-scd212 kernel: [658336.944862] ABORT_TASK:
> ref_tag: 1862270976 already complete, skipping
> May 18 12:34:58 roc-4r-scd212 kernel: [658336.944863] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1862270976
> May 18 12:35:01 roc-4r-scd212 CRON[963082]: (root) CMD (command -v
> debian-sa1 > /dev/null && debian-sa1 1 1)
> May 18 12:35:11 roc-4r-scd212 kernel: [658349.906624] Unexpected ret:
> -32 send data 48
> May 18 12:35:25 roc-4r-scd212 kernel: [658363.528498] Unexpected ret:
> -32 send data 48
> May 18 12:35:38 roc-4r-scd212 kernel: [658376.947982] ABORT_TASK:
> Found referenced iSCSI task_tag: 1732247552
> May 18 12:35:38 roc-4r-scd212 kernel: [658376.947990] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 1732247552
> May 18 12:35:38 roc-4r-scd212 kernel: [658377.155014] Unexpected ret:
> -32 send data 48
> May 18 12:35:48 roc-4r-scd212 kernel: [658386.840043]
> TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000016
> May 18 12:35:52 roc-4r-scd212 kernel: [658390.898777] ABORT_TASK:
> Found referenced iSCSI task_tag: 603979776
> May 18 12:35:52 roc-4r-scd212 kernel: [658390.898780] Unexpected ret:
> -32 send data 48
> May 18 12:35:52 roc-4r-scd212 kernel: [658390.898807] ABORT_TASK:
> ref_tag: 603979776 already complete, skipping
> May 18 12:35:52 roc-4r-scd212 kernel: [658390.898808] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 603979776
> May 18 12:36:01 roc-4r-scd212 kernel: [658400.446803]
> TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000020
> May 18 12:36:01 roc-4r-scd212 kernel: [658400.446875] ABORT_TASK:
> Found referenced iSCSI task_tag: 2438925
> May 18 12:36:01 roc-4r-scd212 kernel: [658400.446878] ABORT_TASK:
> ref_tag: 2438925 already complete, skipping
> May 18 12:36:01 roc-4r-scd212 kernel: [658400.446880] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438925
> May 18 12:36:06 roc-4r-scd212 kernel: [658404.612435] Unexpected ret:
> -32 send data 48
> May 18 12:36:15 roc-4r-scd212 kernel: [658414.060103]
> TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000002a
> May 18 12:36:15 roc-4r-scd212 kernel: [658414.060175] ABORT_TASK:
> Found referenced iSCSI task_tag: 2438937
> May 18 12:36:15 roc-4r-scd212 kernel: [658414.060178] ABORT_TASK:
> ref_tag: 2438937 already complete, skipping
> May 18 12:36:15 roc-4r-scd212 kernel: [658414.060180] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438937
> May 18 12:36:19 roc-4r-scd212 kernel: [658418.384607] ABORT_TASK:
> Found referenced iSCSI task_tag: 1627389952
> May 18 12:36:19 roc-4r-scd212 kernel: [658418.384612] Unexpected ret:
> -32 send data 48
> May 18 12:36:19 roc-4r-scd212 kernel: [658418.384640] ABORT_TASK:
> ref_tag: 1627389952 already complete, skipping
> May 18 12:36:19 roc-4r-scd212 kernel: [658418.384641] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1627389952
> May 18 12:36:22 roc-4r-scd212 kernel: [658420.844106] ABORT_TASK:
> Found referenced iSCSI task_tag: 943718400
> May 18 12:36:22 roc-4r-scd212 kernel: [658420.844112] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 943718400
> May 18 12:36:29 roc-4r-scd212 kernel: [658427.669519]
> TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000034
> May 18 12:36:33 roc-4r-scd212 kernel: [658432.016990] Unexpected ret:
> -32 send data 48
> May 18 12:36:39 roc-4r-scd212 kernel: [658438.347261] ABORT_TASK:
> Found referenced iSCSI task_tag: 2134900736
> May 18 12:36:39 roc-4r-scd212 kernel: [658438.347268] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 2134900736
>
>
>
>                    ESXi (please note the time is in UTC here):
>
> 2015-05-18T18:50:15.590Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> p
> 2015-05-18T18:50:19.254Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> p
> 2015-05-18T18:52:26.959Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> p
> 2015-05-18T18:53:22.275Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> p
> 2015-05-18T18:53:25.943Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> p
> 2015-05-18T18:55:31.015Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> p
> 2015-05-18T18:55:34.614Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> p
> 2015-05-18T19:09:29.311Z cpu16:33166)WARNING: ScsiDeviceIO: 1223:
> Device naa.6001405a37e54d400000000000000000 performance has
> deteriorated. I/O latency increased from averag
> 2015-05-18T19:09:30.360Z cpu14:32843)WARNING: ScsiDeviceIO: 1223:
> Device naa.6001405a37e54d400000000000000000 performance has
> deteriorated. I/O latency increased from averag
> 2015-05-18T21:26:07.605Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
> p
> 2015-05-18T21:26:27.904Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task"
> with itt=0x10d4450 (refITT=0x10d444f) timed ou
> 2015-05-18T21:26:27.904Z cpu27:1706465)WARNING: ScsiPath: 6292: Set
> retry timeout for failed TaskMgmt abort for CmdSN  0x0, status
> Failure, path vmhba33:C3:T22:L1
> 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StopConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is
> being marked "OFFLINE" (Event:4)
> 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StopConnection: Sess [ISID: 00023d000004 TARGET:
> iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH:
> 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StopConnection: Conn [CID: 0 L: 10.70.2.180:39224 R:
> 10.70.2.211:3260]
> 2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task"
> with itt=0x10d445a (refITT=0x10d444e) timed ou
> 2015-05-18T21:26:33.616Z cpu38:1705984)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
> p
> 2015-05-18T21:26:33.616Z cpu25:33722)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> pat
> 2015-05-18T21:26:33.904Z cpu5:32889)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
> path
> 2015-05-18T21:26:34.917Z cpu5:1762870)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
> pa
> 2015-05-18T21:26:35.478Z cpu24:32853)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405a37e54d400000000000000000" state in doubt; requested fast
> pat
> 2015-05-18T21:26:35.912Z cpu5:1762870)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
> pa
> 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StartConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is
> being marked "ONLINE"
> 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StartConnection: Sess [ISID: 00023d000004 TARGET:
> iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH:
> 2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StartConnection: Conn [CID: 0 L: 10.70.2.180:12484 R:
> 10.70.2.211:3260]
> 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StopConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is
> being marked "OFFLINE" (Event:4)
> 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StopConnection: Sess [ISID: 00023d000004 TARGET:
> iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH:
> 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_StopConnection: Conn [CID: 0 L: 10.70.2.180:12484 R:
> 10.70.2.211:3260]
> 2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk:
> iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task"
> with itt=0x10d445f (refITT=0x10d445e) timed ou
> 2015-05-18T21:26:47.222Z cpu26:33722)WARNING: NMP:
> nmp_DeviceRequestFastDeviceProbe:237: NMP
>
>
>>
>>>
>>> On Wed, May 6, 2015 at 11:17 AM, Robert Wood <rwood@xxxxxxxxxxxxxxxxxxx> wrote:
>>> > Good morning, we are continuing to receive:
>>> >
>>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566185] ABORT_TASK:
>>> > ref_tag: 3847728 already complete, skipping
>>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566187] ABORT_TASK:
>>> > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 3847728
>>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate
>>> > ITT: 0x003ab632 on CID: 0
>>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate
>>> > RefTaskTag: 0x003ab632 on CID: 0.
>>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566254] Unexpected ret:
>>> > -32 send data 48
>>> > May  6 11:08:30 roc-4r-scd212 kernel: [71902.397053] libceph: osd20
>>> > 10.80.3.25:6812 socket closed (con state OPEN)
>>> --
>>
>> Thanks for reporting!
>>
>> --nab
>>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux