ah yes, 3x replicated with min_size 2. My ceph.conf is pretty bare, but here it is in case it's relevant:

[global]
	 auth client required = cephx
	 auth cluster required = cephx
	 auth service required = cephx
	 cluster network = 172.25.42.0/24
	 fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
	 keyring = /etc/pve/priv/$cluster.$name.keyring
	 mon allow pool delete = true
	 mon osd allow primary affinity = true
	 osd journal size = 5120
	 osd pool default min size = 2
	 osd pool default size = 3

-----Original message-----
> From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Sent: Thursday 6th September 2018 15:43
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>; Menno Zonneveld <menno@xxxxxxxx>
> Subject: RE: Rados performance inconsistencies, lower than expected performance
>
> Test pool is 3x replicated?
>
> -----Original Message-----
> From: Menno Zonneveld [mailto:menno@xxxxxxxx]
> Sent: donderdag 6 september 2018 15:29
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Rados performance inconsistencies, lower than expected performance
>
> I've set up a Ceph cluster to test things before going into production,
> but I've run into some performance issues that I cannot resolve or
> explain.
>
> Hardware in use in each storage machine (x3):
> - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> - dual 10Gbit EdgeSwitch 16-Port XG
> - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> - 3x Intel S4500 480GB SSD as OSDs
> - 2x SSD raid-1 boot/OS disks
> - 2x Intel(R) Xeon(R) CPU E5-2630
> - 128GB memory
>
> Software-wise I'm running Ceph 12.2.7-pve1, set up from Proxmox VE 5.2, on
> all nodes.
>
> Running the rados benchmark results in somewhat lower than expected
> performance unless the cluster enters the 'near-full' state. When the
> cluster is mostly empty, rados bench (180 write -b 4M -t 16) results in
> about 330MB/s with 0.18ms latency, but when hitting near-full state this
> goes up to a more expected 550MB/s and 0.11ms latency.
>
> iostat on the storage machines shows the disks are hardly utilized
> unless the cluster hits near-full, and CPU and network aren't maxed out
> either. I've also tried with NIC bonding and just one switch, and without
> jumbo frames, but nothing seems to matter in this case.
>
> Is this expected behaviour, or what can I try in order to pinpoint the
> bottleneck?
>
> The expected performance is based on the benchmark results Proxmox
> released this year: with 3 nodes, 4 OSDs per server and 10Gbit networking
> they hit almost 800MB/s with 0.08ms latency. Since they have more OSDs
> and somewhat different hardware I understand I won't hit the 800MB/s
> mark, but the difference between an empty and an almost-full cluster
> makes no sense to me; I'd expect it to be the other way around.
>
> Thanks,
> Menno
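
For anyone who wants to double-check or reproduce, something along these lines should do it. The pool name 'bench' is just a placeholder here (the actual test pool name wasn't mentioned), and --no-cleanup is only needed if you want to follow up with read tests on the same objects:

# confirm replication settings on the test pool
ceph osd pool get bench size
ceph osd pool get bench min_size

# check how full the OSDs are, since behaviour changes around near-full
ceph osd df

# the 180s, 4M-object, 16-thread write benchmark referenced above
rados bench -p bench 180 write -b 4M -t 16 --no-cleanup

# on each storage node while the benchmark runs, watch per-disk utilization
iostat -x 1

Comparing iostat output between the mostly-empty and near-full runs should at least show whether the extra throughput in the near-full case comes from the disks finally being driven harder, or from somewhere else.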