Thanks!
Does that mean that occasional iSCSI path drop-outs are somewhat expected? We are using SSDs for WAL/DB on each OSD server, so at least that part is covered.
Do you think that buying an additional 6/12 HDDs would help with the IOPS for the VMs?
Regards, Martin
Hello,
no, iSCSI + VMware works without such problems.
> We are on latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s Ethernet, erasure coded rbd pool with 128 PGs, around 200 PGs per OSD total.
Nautilus is a good choice.
12*10TB HDD is not good for VMs.
25Gbit/s on HDD is way too much for that system.
200 PGs per OSD is too much; I would suggest 75-100 PGs per OSD.
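For reference, the PGs-per-OSD figure can be estimated as the sum over all pools of pg_num times pool size (replica count, or k+m for erasure coding), divided by the number of OSDs. A minimal Python sketch, assuming a hypothetical k=2, m=1 profile for the erasure coded pool (the actual profile is not given in this thread); the function names are illustrative only:

```python
def pg_replicas_per_osd(pools, num_osds):
    """Average number of PG replicas/shards held by each OSD.

    pools: list of (pg_num, size) tuples, where size is the replica
    count or k+m for an erasure coded pool.
    """
    total = sum(pg_num * size for pg_num, size in pools)
    return total / num_osds


def pg_num_for_target(target_per_osd, num_osds, size):
    """Largest power-of-two pg_num that keeps a single pool at or
    below the target average per OSD."""
    raw = target_per_osd * num_osds / size
    pg_num = 1
    while pg_num * 2 <= raw:
        pg_num *= 2
    return pg_num


# Assumed: 12 OSDs, one EC pool with 128 PGs and k+m = 3 (hypothetical).
print(pg_replicas_per_osd([(128, 3)], 12))  # 32.0
print(pg_num_for_target(100, 12, 3))        # 256
```

Under those assumed numbers the 128-PG pool alone would account for about 32 PGs per OSD, so most of the ~200 would have to come from other pools on the same OSDs.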
You can improve latency on HDD clusters using external DB/WAL on NVMe. That might help you.

Hi,
does anyone here use CEPH iSCSI with VMware ESXi? It seems that we are hitting the 5 second timeout limit on software HBA in ESXi. It appears whenever there is increased load on the cluster, like deep scrub or rebalance. Is it normal behaviour in production? Or is there something special we need to tune?
We are on latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s Ethernet, erasure coded rbd pool with 128 PGs, around 200 PGs per OSD total.
ESXi Log:
2020-10-04T01:57:04.314Z cpu34:2098959)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:517: vmhba64:CH:1 T:0 CN:0: Failed to receive data: Connection closed by peer
2020-10-04T01:57:04.314Z cpu34:2098959)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1235: vmhba64:CH:1 T:0 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2020-10-04T01:57:04.566Z cpu19:2098979)WARNING: iscsi_vmk: iscsivmk_StopConnection:741: vmhba64:CH:1 T:0 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2020-10-04T01:57:04.654Z cpu7:2097866)WARNING: VMW_SATP_ALUA: satp_alua_issueCommandOnPath:788: Probe cmd 0xa3 failed for path "vmhba64:C2:T0:L0" (0x5/0x20/0x0). Check if failover mode is still ALUA.
OSD Log:
[303088.450088] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi1,i,0x00023d000002,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
[324926.694077] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi2,i,0x00023d000001,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
[407067.404538] ABORT_TASK: Found referenced iSCSI task_tag: 5891
[407076.077175] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 5891
[411677.887690] ABORT_TASK: Found referenced iSCSI task_tag: 6722
[411683.297425] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 6722
[481459.755876] ABORT_TASK: Found referenced iSCSI task_tag: 7930
[481460.787968] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7930
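
To see how long each abort took on the gateway, the ABORT_TASK pairs above can be matched by task tag. A rough Python sketch, assuming the bracketed numbers are kernel timestamps in seconds; the helper name abort_durations is made up for illustration:

```python
import re

LOG = """\
[407067.404538] ABORT_TASK: Found referenced iSCSI task_tag: 5891
[407076.077175] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 5891
[411677.887690] ABORT_TASK: Found referenced iSCSI task_tag: 6722
[411683.297425] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 6722
[481459.755876] ABORT_TASK: Found referenced iSCSI task_tag: 7930
[481460.787968] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7930
"""

PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\] ABORT_TASK: "
    r"(Found referenced iSCSI task_tag|Sending TMR_FUNCTION_COMPLETE for ref_tag): "
    r"(\d+)"
)


def abort_durations(text):
    """Map task tag -> seconds between 'Found referenced' and
    'Sending TMR_FUNCTION_COMPLETE' for that tag."""
    started = {}
    durations = {}
    for line in text.splitlines():
        match = PATTERN.search(line)
        if not match:
            continue
        ts, kind, tag = float(match.group(1)), match.group(2), int(match.group(3))
        if kind.startswith("Found"):
            started[tag] = ts
        elif tag in started:
            durations[tag] = round(ts - started.pop(tag), 3)
    return durations


for tag, seconds in abort_durations(LOG).items():
    print(f"task_tag {tag}: {seconds} s")
```

For the lines above this gives roughly 8.7 s, 5.4 s and 1.0 s, so two of the three aborts took longer than 5 seconds to complete. Whether that interval is what trips the ESXi timeout is an assumption, not something the logs confirm.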
Cheers,
Martin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx