Re: ESXi + LIO + Ceph RBD problem

Hi Martin,

We tested this and ran into similar scenarios.  Based on your
description, you are running the RBD client on an OSD node, which is
not recommended.  It is likewise not recommended to run an OSD and a
MON on the same node.

In our experience the ABORT_TASK errors have always been related to
Ceph timeouts.  Basically, my understanding is that an RBD request
times out, and after a brief grace period LIO and ESXi enter a
hailstorm of retry/reset commands, at which point I/O pretty much
never resumes.  LIO adheres strictly to the SCSI spec and maintains a
single data session, whereas my understanding is that other targets
such as TGT and SCST allow another session to be established, which
sidesteps the issue.
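
For what it's worth, the settings that made the biggest difference
for us were the OSD failure-detection timeouts, so that Ceph marks a
dead OSD down and re-peers before ESXi gives up and starts aborting
tasks.  A rough ceph.conf sketch (the values are only illustrative
starting points, not something we have validated for your cluster):

[global]
    # OSDs ping their peers this often (default 6 s)
    osd heartbeat interval = 3
    # a peer that has not answered heartbeats for this long is
    # reported as failed (default 20 s, well past the ESXi window)
    osd heartbeat grace = 10

A shorter grace period means faster failover, but also a higher risk
of flapping under load, so test it against your workload first.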

We have had good results with three OSD nodes, three MONs, and two
LIO nodes with failover via Pacemaker, running kernel 4.1.
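
In case it helps, the failover piece is nothing exotic: roughly
something along these lines using the stock resource-agents (the IQN,
IP, device path and resource names are placeholders, and it assumes
the RBD image is already mapped on both LIO nodes):

crm configure primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip=192.168.10.50 cidr_netmask=24
crm configure primitive p_target ocf:heartbeat:iSCSITarget \
    params implementation=lio-t iqn=iqn.2015-08.example.com:rbd-store
crm configure primitive p_lun1 ocf:heartbeat:iSCSILogicalUnit \
    params implementation=lio-t \
        target_iqn=iqn.2015-08.example.com:rbd-store \
        lun=1 path=/dev/rbd/rbd/vmware-lun1
crm configure group g_iscsi p_vip p_target p_lun1

ESXi logs in to the floating IP and simply reconnects to whichever
node currently holds the group after a failover.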

Best regards,
Alex

On Fri, Aug 14, 2015 at 11:28 AM, Martin Svec <martin.svec@xxxxxxxx> wrote:
> Hello,
>
> I'm testing a LIO iSCSI target on top of Ceph RBD as an iSCSI datastore for VMware vSphere. When one
> of the Ceph OSD nodes is terminated during heavy I/O (Storage vMotion to RBD), both the initiator and
> the target side report ABORT_TASK-related errors and all I/O stops. It's necessary to drop the iSCSI
> connections and let ESXi reconnect before I/O continues.
>
> ESXi warnings:
>
> WARNING: iscsi_vmk: iscsivmk_TaskMgmtIssue: vmhba35:CH:0 T:11 L:1 : Task mgmt "Abort Task" with
> itt=0x944d1 (refITT=0x944cd) timed out.
> VMW_SATP_ALUA: satp_alua_issueCommandOnPath:651: Path "vmhba35:C0:T11:L1" (UP) command 0xa3 failed
> with status Timeout. H:0x3 D:0x0 P:0x0  Possible sense data: 0x0 0x0 0x0.
> WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device
> "naa.60014057056b5748fdbb7c16c3a0bd46" state in doubt; requested fast path state update...
> WARNING: iscsi_vmk: iscsivmk_TaskMgmtAbortCommands: vmhba35:CH:0 T:11 L:1 : Abort task response
> indicates task with itt=0x944c7 has been completed on the target but the task response has not arrived
> ... and similar ones
>
> LIO warnings:
>
> [ 3052.065353] ABORT_TASK: Found referenced iSCSI task_tag: 801219
> [ 3052.066370] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 801219
> [ 3082.714529] ABORT_TASK: Found referenced iSCSI task_tag: 801223
> [ 3082.714532] ABORT_TASK: ref_tag: 801223 already complete, skipping
> [ 3082.714533] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 801223
> [ 3082.714536] ABORT_TASK: Found referenced iSCSI task_tag: 801222
> [ 3082.714540] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 801222
>
> I guess the errors are related to the hardcoded 5000 ms iSCSI timeout in ESXi, while the RBD driver
> needs a longer time to recover when one of the OSDs is lost. Is that possible? Does anybody have
> similar experience with ESXi + LIO iSCSI + Ceph? I tried to tweak a few Ceph heartbeat options, but
> I'm still at the beginning of the learning curve...
>
> My Ceph setup is very basic for now: 3 virtual machines with Debian Jessie and Ceph 0.80.7, with one
> OSD and one MON on each VM. The iSCSI LUN is published from one of the nodes via a dedicated network
> adapter to the underlying vSphere infrastructure.
>
> Thank you.
>
> Martin
>