Hello, no, iSCSI + VMware does not work without such problems.

> We are on the latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s
> Ethernet, erasure-coded RBD pool with 128 PGs, around 200 PGs per OSD
> total.

Nautilus is a good choice.
12 x 10 TB HDD is not good for VMs.
25 Gbit/s on HDD is way too much for that system.
200 PGs per OSD is too much; I would suggest 75-100 PGs per OSD.

You can improve latency on HDD clusters by using an external DB/WAL on
NVMe. That might help you.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.verges@xxxxxxxx
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

On Sun, 4 Oct 2020 at 14:37, Golasowski Martin <martin.golasowski@xxxxxx> wrote:

> Hi,
> does anyone here use Ceph iSCSI with VMware ESXi? It seems that we are
> hitting the 5 second timeout limit on the software HBA in ESXi. It
> appears whenever there is increased load on the cluster, like a deep
> scrub or rebalance. Is it normal behaviour in production? Or is there
> something special we need to tune?
>
> We are on the latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s
> Ethernet, erasure-coded RBD pool with 128 PGs, around 200 PGs per OSD
> total.
>
> ESXi log:
>
> 2020-10-04T01:57:04.314Z cpu34:2098959)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:517: vmhba64:CH:1 T:0 CN:0: Failed to receive data: Connection closed by peer
> 2020-10-04T01:57:04.314Z cpu34:2098959)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1235: vmhba64:CH:1 T:0 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
> 2020-10-04T01:57:04.566Z cpu19:2098979)WARNING: iscsi_vmk: iscsivmk_StopConnection:741: vmhba64:CH:1 T:0 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
> 2020-10-04T01:57:04.654Z cpu7:2097866)WARNING: VMW_SATP_ALUA: satp_alua_issueCommandOnPath:788: Probe cmd 0xa3 failed for path "vmhba64:C2:T0:L0" (0x5/0x20/0x0). Check if failover mode is still ALUA.
>
> OSD log:
>
> [303088.450088] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi1,i,0x00023d000002,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
> [324926.694077] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi2,i,0x00023d000001,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
> [407067.404538] ABORT_TASK: Found referenced iSCSI task_tag: 5891
> [407076.077175] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 5891
> [411677.887690] ABORT_TASK: Found referenced iSCSI task_tag: 6722
> [411683.297425] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 6722
> [481459.755876] ABORT_TASK: Found referenced iSCSI task_tag: 7930
> [481460.787968] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7930
>
> Cheers,
> Martin
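
For reference, the per-OSD PG count Martin mentions shows up in the PGS
column of "ceph osd df", and since Nautilus pg_num can also be lowered in
place (PG merging). A minimal sketch; "rbd-ec" is a placeholder pool name:

    ceph osd df                          # PGS column = PGs per OSD
    ceph osd pool get rbd-ec pg_num      # current PG count of the pool
    ceph osd pool set rbd-ec pg_num 64   # shrink; Nautilus merges PGs
    ceph osd pool set rbd-ec pg_autoscale_mode on   # or let the autoscaler decide

Note that changing pg_num moves data around, so it is best done outside
peak hours.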
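
Moving the BlueStore DB/WAL to NVMe, as suggested above, is set up with
ceph-volume at OSD creation time; an existing OSD can be given a separate
DB device with ceph-bluestore-tool while that OSD is stopped. Device names
and the OSD path below are placeholders:

    # new OSD with its DB/WAL on NVMe
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

    # retrofit a separate DB onto an existing, stopped OSD
    ceph-bluestore-tool bluefs-bdev-new-db \
        --path /var/lib/ceph/osd/ceph-0 --dev-target /dev/nvme0n1p2

A DB partition of roughly 4% of the data device (about 400 GB for a 10 TB
HDD) is the usual starting point.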
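
On the ESXi side, one knob the Ceph iSCSI gateway documentation calls out
for exactly this symptom is the software iSCSI adapter's RecoveryTimeout,
which it recommends raising to 25 seconds so that short gateway stalls
(deep scrub, recovery) fail over via ALUA instead of tearing down the
session as in the log above. A sketch, assuming the adapter is vmhba64 as
shown there:

    esxcli iscsi adapter param set -A vmhba64 -k RecoveryTimeout -v 25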