Re: Update on crash with kernel 3.19

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

On Fri, May 8, 2015 at 1:45 AM, Nicholas A. Bellinger
<nab@xxxxxxxxxxxxxxx> wrote:
> Hi Robert,
>
> On Wed, 2015-05-06 at 12:21 -0400, Robert Wood wrote:
>> One update: it appears that Vmware ESXi 5.5 U2 is sometimes
>> incorrectly sensing that the LIO-ORG device is an SSD.  I wonder if
>> that is causing issues with commands being sent?  I am testing
>> untagging all LIO-ORG devices as SSD to see if the problem recurs.
>>
>>
>
> Changing around SCSI target feature bits reported by LIO to ESX hosts is
> only going to mask the real issue.
>
> From Mike's testing earlier in the week, he's been able to verify with
> FC and iSCSI that it's not a target specific bug, but AFAICT it's still
> not clear if something specific to iscsi-target may be triggering a bug
> in ceph client.
>
> AFAICT, the bug is centered around ceph client reconnect logic +
> read-copy-update (RCU) usage in v3.19.y code.  (Adding Sage CC')

We continue to get crashes about once weekly, at this point ESXi is
all set to defaults and there are no ceph warnings of any kind, nor
any others - we get this in syslog and vmkwarning.log in ESXi:

                   syslog:


May 18 12:33:47 roc-4r-scd212 kernel: [658265.920803]
TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending
CHECK_CONDITION.
May 18 12:33:47 roc-4r-scd212 kernel: [658265.921082]
TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending
CHECK_CONDITION.
May 18 12:33:47 roc-4r-scd212 kernel: [658265.979386]
TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending
CHECK_CONDITION.
May 18 12:33:47 roc-4r-scd212 kernel: [658265.979723]
TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x85, sending
CHECK_CONDITION.
May 18 12:33:49 roc-4r-scd212 kernel: [658267.940356] ABORT_TASK:
Found referenced iSCSI task_tag: 205520896
May 18 12:33:49 roc-4r-scd212 kernel: [658267.940359] ABORT_TASK:
ref_tag: 205520896 already complete, skipping
May 18 12:33:49 roc-4r-scd212 kernel: [658267.940361] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 205520896
May 18 12:33:49 roc-4r-scd212 kernel: [658267.940371] Unexpected ret:
-32 send data 48
May 18 12:33:50 roc-4r-scd212 kernel: [658268.637907] ABORT_TASK:
Found referenced iSCSI task_tag: 809500672
May 18 12:33:50 roc-4r-scd212 kernel: [658268.637913] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 809500672
May 18 12:33:59 roc-4r-scd212 kernel: [658277.985599] ABORT_TASK:
Found referenced iSCSI task_tag: 2438871
May 18 12:33:59 roc-4r-scd212 kernel: [658277.985606] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 2438871
May 18 12:33:59 roc-4r-scd212 kernel: [658277.985610] ABORT_TASK:
Found referenced iSCSI task_tag: 2438873
May 18 12:33:59 roc-4r-scd212 kernel: [658277.985612] ABORT_TASK:
ref_tag: 2438873 already complete, skipping
May 18 12:33:59 roc-4r-scd212 kernel: [658277.985613] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438873
May 18 12:33:59 roc-4r-scd212 kernel: [658277.985620] Unable to locate
ITT: 0x002536d7 on CID: 0
May 18 12:33:59 roc-4r-scd212 kernel: [658277.985632] Unable to locate
RefTaskTag: 0x002536d7 on CID: 0.
May 18 12:34:03 roc-4r-scd212 kernel: [658281.641021] Unexpected ret:
-32 send data 48
May 18 12:34:16 roc-4r-scd212 kernel: [658295.276146] Unexpected ret:
-32 send data 48
May 18 12:34:44 roc-4r-scd212 kernel: [658322.548137] Unexpected ret:
-32 send data 48
May 18 12:34:57 roc-4r-scd212 kernel: [658336.279539] ABORT_TASK:
Found referenced iSCSI task_tag: 1308622848
May 18 12:34:57 roc-4r-scd212 kernel: [658336.279544] ABORT_TASK:
ref_tag: 1308622848 already complete, skipping
May 18 12:34:57 roc-4r-scd212 kernel: [658336.279546] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1308622848
May 18 12:34:58 roc-4r-scd212 kernel: [658336.944859] ABORT_TASK:
Found referenced iSCSI task_tag: 1862270976
May 18 12:34:58 roc-4r-scd212 kernel: [658336.944862] ABORT_TASK:
ref_tag: 1862270976 already complete, skipping
May 18 12:34:58 roc-4r-scd212 kernel: [658336.944863] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1862270976
May 18 12:35:01 roc-4r-scd212 CRON[963082]: (root) CMD (command -v
debian-sa1 > /dev/null && debian-sa1 1 1)
May 18 12:35:11 roc-4r-scd212 kernel: [658349.906624] Unexpected ret:
-32 send data 48
May 18 12:35:25 roc-4r-scd212 kernel: [658363.528498] Unexpected ret:
-32 send data 48
May 18 12:35:38 roc-4r-scd212 kernel: [658376.947982] ABORT_TASK:
Found referenced iSCSI task_tag: 1732247552
May 18 12:35:38 roc-4r-scd212 kernel: [658376.947990] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 1732247552
May 18 12:35:38 roc-4r-scd212 kernel: [658377.155014] Unexpected ret:
-32 send data 48
May 18 12:35:48 roc-4r-scd212 kernel: [658386.840043]
TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000016
May 18 12:35:52 roc-4r-scd212 kernel: [658390.898777] ABORT_TASK:
Found referenced iSCSI task_tag: 603979776
May 18 12:35:52 roc-4r-scd212 kernel: [658390.898780] Unexpected ret:
-32 send data 48
May 18 12:35:52 roc-4r-scd212 kernel: [658390.898807] ABORT_TASK:
ref_tag: 603979776 already complete, skipping
May 18 12:35:52 roc-4r-scd212 kernel: [658390.898808] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 603979776
May 18 12:36:01 roc-4r-scd212 kernel: [658400.446803]
TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000020
May 18 12:36:01 roc-4r-scd212 kernel: [658400.446875] ABORT_TASK:
Found referenced iSCSI task_tag: 2438925
May 18 12:36:01 roc-4r-scd212 kernel: [658400.446878] ABORT_TASK:
ref_tag: 2438925 already complete, skipping
May 18 12:36:01 roc-4r-scd212 kernel: [658400.446880] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438925
May 18 12:36:06 roc-4r-scd212 kernel: [658404.612435] Unexpected ret:
-32 send data 48
May 18 12:36:15 roc-4r-scd212 kernel: [658414.060103]
TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000002a
May 18 12:36:15 roc-4r-scd212 kernel: [658414.060175] ABORT_TASK:
Found referenced iSCSI task_tag: 2438937
May 18 12:36:15 roc-4r-scd212 kernel: [658414.060178] ABORT_TASK:
ref_tag: 2438937 already complete, skipping
May 18 12:36:15 roc-4r-scd212 kernel: [658414.060180] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 2438937
May 18 12:36:19 roc-4r-scd212 kernel: [658418.384607] ABORT_TASK:
Found referenced iSCSI task_tag: 1627389952
May 18 12:36:19 roc-4r-scd212 kernel: [658418.384612] Unexpected ret:
-32 send data 48
May 18 12:36:19 roc-4r-scd212 kernel: [658418.384640] ABORT_TASK:
ref_tag: 1627389952 already complete, skipping
May 18 12:36:19 roc-4r-scd212 kernel: [658418.384641] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1627389952
May 18 12:36:22 roc-4r-scd212 kernel: [658420.844106] ABORT_TASK:
Found referenced iSCSI task_tag: 943718400
May 18 12:36:22 roc-4r-scd212 kernel: [658420.844112] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 943718400
May 18 12:36:29 roc-4r-scd212 kernel: [658427.669519]
TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000034
May 18 12:36:33 roc-4r-scd212 kernel: [658432.016990] Unexpected ret:
-32 send data 48
May 18 12:36:39 roc-4r-scd212 kernel: [658438.347261] ABORT_TASK:
Found referenced iSCSI task_tag: 2134900736
May 18 12:36:39 roc-4r-scd212 kernel: [658438.347268] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 2134900736



                   ESXi (please note the time is in UTC here):

2015-05-18T18:50:15.590Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
p
2015-05-18T18:50:19.254Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
p
2015-05-18T18:52:26.959Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
p
2015-05-18T18:53:22.275Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
p
2015-05-18T18:53:25.943Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
p
2015-05-18T18:55:31.015Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
p
2015-05-18T18:55:34.614Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
p
2015-05-18T19:09:29.311Z cpu16:33166)WARNING: ScsiDeviceIO: 1223:
Device naa.6001405a37e54d400000000000000000 performance has
deteriorated. I/O latency increased from averag
2015-05-18T19:09:30.360Z cpu14:32843)WARNING: ScsiDeviceIO: 1223:
Device naa.6001405a37e54d400000000000000000 performance has
deteriorated. I/O latency increased from averag
2015-05-18T21:26:07.605Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
p
2015-05-18T21:26:27.904Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task"
with itt=0x10d4450 (refITT=0x10d444f) timed ou
2015-05-18T21:26:27.904Z cpu27:1706465)WARNING: ScsiPath: 6292: Set
retry timeout for failed TaskMgmt abort for CmdSN  0x0, status
Failure, path vmhba33:C3:T22:L1
2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StopConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is
being marked "OFFLINE" (Event:4)
2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StopConnection: Sess [ISID: 00023d000004 TARGET:
iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH:
2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StopConnection: Conn [CID: 0 L: 10.70.2.180:39224 R:
10.70.2.211:3260]
2015-05-18T21:26:33.616Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task"
with itt=0x10d445a (refITT=0x10d444e) timed ou
2015-05-18T21:26:33.616Z cpu38:1705984)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
p
2015-05-18T21:26:33.616Z cpu25:33722)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
pat
2015-05-18T21:26:33.904Z cpu5:32889)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
path
2015-05-18T21:26:34.917Z cpu5:1762870)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
pa
2015-05-18T21:26:35.478Z cpu24:32853)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405a37e54d400000000000000000" state in doubt; requested fast
pat
2015-05-18T21:26:35.912Z cpu5:1762870)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.6001405ca6b51d000000000000000000" state in doubt; requested fast
pa
2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StartConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is
being marked "ONLINE"
2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StartConnection: Sess [ISID: 00023d000004 TARGET:
iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH:
2015-05-18T21:26:36.643Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StartConnection: Conn [CID: 0 L: 10.70.2.180:12484 R:
10.70.2.211:3260]
2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StopConnection: vmhba33:CH:3 T:22 CN:0: iSCSI connection is
being marked "OFFLINE" (Event:4)
2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StopConnection: Sess [ISID: 00023d000004 TARGET:
iqn.2015-03.com.iss-integration:rbd-roc:tgt1 TPGT: 1 TSIH:
2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_StopConnection: Conn [CID: 0 L: 10.70.2.180:12484 R:
10.70.2.211:3260]
2015-05-18T21:26:47.222Z cpu9:33716)WARNING: iscsi_vmk:
iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task"
with itt=0x10d445f (refITT=0x10d445e) timed ou
2015-05-18T21:26:47.222Z cpu26:33722)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP


>
>>
>> On Wed, May 6, 2015 at 11:17 AM, Robert Wood <rwood@xxxxxxxxxxxxxxxxxxx> wrote:
>> > Good morning, we are continuing to receive:
>> >
>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566185] ABORT_TASK:
>> > ref_tag: 3847728 already complete, skipping
>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566187] ABORT_TASK:
>> > Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 3847728
>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate
>> > ITT: 0x003ab632 on CID: 0
>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate
>> > RefTaskTag: 0x003ab632 on CID: 0.
>> > May  6 11:08:26 roc-4r-scd212 kernel: [71898.566254] Unexpected ret:
>> > -32 send data 48
>> > May  6 11:08:30 roc-4r-scd212 kernel: [71902.397053] libceph: osd20
>> > 10.80.3.25:6812 socket closed (con state OPEN)
>> --
>
> Thanks for reporting!
>
> --nab
>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux