Hello,

in my personal opinion, HDDs are a technology from the last century and I
would never ever think about using such old technology for modern
VM/container/... workloads. My time, as well as any employee's, is too
precious to wait for a hard drive to find the requested data! Use EC on NVMe
if you need to save some money; it is still much faster, with lower latency,
than HDDs. As each HDD only adds around 100 IO/s and 20-30 MB/s to your
cluster, you can throw in 100 disks and still not come near the performance
of a single SSD. Yes, each disk will improve your performance, but by such a
small amount that it makes no sense in my eyes.

> Does that mean that occasional iSCSI path drop-outs are somewhat expected?

Not that I'm aware of, but I have no HDD-based iSCSI cluster at hand to
check. Sorry.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.verges@xxxxxxxx
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Sun, 4 Oct 2020 at 16:06, Golasowski Martin <martin.golasowski@xxxxxx> wrote:

> Thanks!
>
> Does that mean that occasional iSCSI path drop-outs are somewhat expected?
> We are using SSDs for WAL/DB on each OSD server, so at least that.
>
> Do you think that buying an additional 6/12 HDDs would help with the IOPS
> for the VMs?
>
> Regards,
> Martin
>
>
> On 4 Oct 2020, at 15:17, Martin Verges <martin.verges@xxxxxxxx> wrote:
>
> Hello,
>
> no, iSCSI + VMware works without such problems.
>
> > We are on latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s
> > Ethernet, erasure coded rbd pool with 128 PGs, around 200 PGs per OSD total.
>
> Nautilus is a good choice.
> 12 x 10 TB HDD is not good for VMs.
> 25 Gbit/s on HDD is way too much for that system.
> 200 PGs per OSD is too much; I would suggest 75-100 PGs per OSD.
>
> You can improve latency on HDD clusters using external DB/WAL on NVMe.
> That might help you.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.verges@xxxxxxxx
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> On Sun, 4 Oct 2020 at 14:37, Golasowski Martin <martin.golasowski@xxxxxx> wrote:
>
>> Hi,
>> does anyone here use Ceph iSCSI with VMware ESXi? It seems that we are
>> hitting the 5-second timeout limit on the software HBA in ESXi. It appears
>> whenever there is increased load on the cluster, like deep scrub or
>> rebalance. Is this normal behaviour in production? Or is there something
>> special we need to tune?
>>
>> We are on latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s
>> Ethernet, erasure coded rbd pool with 128 PGs, around 200 PGs per OSD total.
>>
>>
>> ESXi log:
>>
>> 2020-10-04T01:57:04.314Z cpu34:2098959)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:517: vmhba64:CH:1 T:0 CN:0: Failed to receive data: Connection closed by peer
>> 2020-10-04T01:57:04.314Z cpu34:2098959)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1235: vmhba64:CH:1 T:0 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
>> 2020-10-04T01:57:04.566Z cpu19:2098979)WARNING: iscsi_vmk: iscsivmk_StopConnection:741: vmhba64:CH:1 T:0 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
>> 2020-10-04T01:57:04.654Z cpu7:2097866)WARNING: VMW_SATP_ALUA: satp_alua_issueCommandOnPath:788: Probe cmd 0xa3 failed for path "vmhba64:C2:T0:L0" (0x5/0x20/0x0). Check if failover mode is still ALUA.
>>
>>
>> OSD log:
>>
>> [303088.450088] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi1,i,0x00023d000002,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
>> [324926.694077] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi2,i,0x00023d000001,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
>> [407067.404538] ABORT_TASK: Found referenced iSCSI task_tag: 5891
>> [407076.077175] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 5891
>> [411677.887690] ABORT_TASK: Found referenced iSCSI task_tag: 6722
>> [411683.297425] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 6722
>> [481459.755876] ABORT_TASK: Found referenced iSCSI task_tag: 7930
>> [481460.787968] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7930
>>
>> Cheers,
>> Martin
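
A short, illustrative follow-up on the two tuning points raised above (PGs
per OSD, and cluster load from deep scrub or rebalance). On a Nautilus
cluster the current PG count per OSD can be read from the PGS column of
"ceph osd df tree", and the impact of recovery and scrubbing on client I/O
can be throttled with standard OSD config options. The values below are only
examples for illustration, not recommendations tuned for this particular
cluster; validate them against your own hardware and load before applying:

    # Show how many PGs each OSD currently holds (PGS column)
    ceph osd df tree

    # Throttle backfill/recovery so client I/O keeps priority during rebalance
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1
    ceph config set osd osd_recovery_sleep_hdd 0.1

    # Reduce the impact of (deep) scrubbing on spinning disks
    ceph config set osd osd_scrub_sleep 0.1
    ceph config set osd osd_scrub_during_recovery false

Lowering backfill and recovery parallelism makes rebalancing take longer, so
whether that trade-off is acceptable depends on how quickly the cluster needs
to regain full redundancy after a failure.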