ESXi + LIO + Ceph RBD problem

Hello,

I'm testing the LIO iSCSI target on top of Ceph RBD as an iSCSI datastore for VMware vSphere. When one
of the Ceph OSD nodes is terminated during heavy I/O (Storage vMotion to RBD), both the initiator and
the target report ABORT_TASK-related errors and all I/O stops. I/O resumes only after I drop the iSCSI
connections and let ESXi reconnect.

ESXi warnings:

WARNING: iscsi_vmk: iscsivmk_TaskMgmtIssue: vmhba35:CH:0 T:11 L:1 : Task mgmt "Abort Task" with
itt=0x944d1 (refITT=0x944cd) timed out.
VMW_SATP_ALUA: satp_alua_issueCommandOnPath:651: Path "vmhba35:C0:T11:L1" (UP) command 0xa3 failed
with status Timeout. H:0x3 D:0x0 P:0x0  Possible sense data: 0x0 0x0 0x0.
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device
"naa.60014057056b5748fdbb7c16c3a0bd46" state in doubt; requested fast path state update...
WARNING: iscsi_vmk: iscsivmk_TaskMgmtAbortCommands: vmhba35:CH:0 T:11 L:1 : Abort task response
indicates task with itt=0x944c7 has been completed on the target but the task response has not arrived
... and similar ones

LIO warnings:

[ 3052.065353] ABORT_TASK: Found referenced iSCSI task_tag: 801219
[ 3052.066370] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 801219
[ 3082.714529] ABORT_TASK: Found referenced iSCSI task_tag: 801223
[ 3082.714532] ABORT_TASK: ref_tag: 801223 already complete, skipping
[ 3082.714533] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 801223
[ 3082.714536] ABORT_TASK: Found referenced iSCSI task_tag: 801222
[ 3082.714540] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 801222
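
To see at a glance how the target answered each abort, the LIO messages above can be tallied with a small script. This is only a rough sketch: the function name summarize_aborts and the sample constant are made up for illustration, and the regular expression assumes the exact message format quoted above.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 3052.066370] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 801219
SENT = re.compile(
    r"\[\s*(\d+\.\d+)\] ABORT_TASK: Sending (TMR_\w+) for ref_tag: (\d+)"
)

def summarize_aborts(lines):
    """Return ({ref_tag: (timestamp, response)}, Counter of responses)."""
    responses = {}
    for line in lines:
        match = SENT.search(line)
        if match:
            timestamp, code, tag = match.groups()
            responses[int(tag)] = (float(timestamp), code)
    counts = Counter(code for _, code in responses.values())
    return responses, counts

# The LIO warnings quoted above, used as sample input.
SAMPLE = """\
[ 3052.065353] ABORT_TASK: Found referenced iSCSI task_tag: 801219
[ 3052.066370] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 801219
[ 3082.714529] ABORT_TASK: Found referenced iSCSI task_tag: 801223
[ 3082.714532] ABORT_TASK: ref_tag: 801223 already complete, skipping
[ 3082.714533] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 801223
[ 3082.714536] ABORT_TASK: Found referenced iSCSI task_tag: 801222
[ 3082.714540] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 801222
"""

if __name__ == "__main__":
    per_tag, totals = summarize_aborts(SAMPLE.splitlines())
    for tag, (timestamp, code) in sorted(per_tag.items()):
        print(f"ref_tag {tag}: {code} at {timestamp:.6f}s")
    print(dict(totals))
```

In practice the input would come from the kernel log on the target node rather than from an embedded sample.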

I suspect the errors are related to the hardcoded 5000 ms iSCSI timeout in ESXi: the RBD driver may
need longer than that to recover when one of the OSDs is lost. Is that plausible? Does anybody have
similar experience with ESXi + LIO iSCSI + Ceph? I tried tweaking a few Ceph heartbeat options, but I'm
still at the beginning of the learning curve...
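
To make the suspicion concrete, here is a back-of-the-envelope comparison of the initiator timeout with the time the cluster might need before I/O on the RBD image can continue. It is only a sketch under assumptions: the function name stall_vs_timeout is hypothetical, the 5 s figure is the timeout guessed at above, and the 20 s figure is an assumed value for Ceph's "osd heartbeat grace" option, not something measured on this cluster.

```python
def stall_vs_timeout(detect_s, recover_s, initiator_timeout_s):
    """Estimate the I/O stall and report whether it exceeds the timeout.

    detect_s:            time until the failed OSD is marked down (assumed)
    recover_s:           extra time until blocked I/O completes (assumed)
    initiator_timeout_s: time the initiator waits before aborting a command
    """
    stall_s = detect_s + recover_s
    return stall_s, stall_s > initiator_timeout_s

# Assumed values for illustration only; check the real settings on your cluster.
INITIATOR_TIMEOUT_S = 5.0   # the 5000 ms ESXi timeout suspected above
HEARTBEAT_GRACE_S = 20.0    # assumed "osd heartbeat grace"
RECOVER_S = 0.0             # unknown here, so the estimate is a lower bound

if __name__ == "__main__":
    stall, exceeded = stall_vs_timeout(
        HEARTBEAT_GRACE_S, RECOVER_S, INITIATOR_TIMEOUT_S
    )
    print(f"estimated stall >= {stall:.0f}s, "
          f"initiator timeout {INITIATOR_TIMEOUT_S:.0f}s, "
          f"aborts expected: {exceeded}")
```

If these assumptions hold, any failure-detection time above the initiator timeout would be enough for ESXi to start sending Abort Task requests before the blocked I/O can complete.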

My Ceph setup is very basic for now: 3 virtual machines running Debian Jessie and Ceph 0.80.7, with
one OSD and one MON on each VM. The iSCSI LUN is published from one of the nodes via a dedicated
network adapter to the underlying vSphere infrastructure.

Thank you.

Martin

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


