I filled up the cluster by accident by supplying --no-cleanup to the write benchmark, which leaves the benchmark objects in the pool; I'm sure there must be a better way to do that though.

I've run the tests again, first with the cluster 'empty' (I have a few test VMs stored on Ceph) and then after letting it fill up again.

Bandwidth goes up from 276.812 to 433.859 MB/sec and average latency goes down from 0.231178 to 0.147433 s.

I should mention that I did find a problem with the cluster thanks to Alwin's suggestion to (re)do the fio benchmarks: one server with identical SSDs is performing poorly compared to the others. I'll resolve this first before continuing with other benchmarks.

When empty:

# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    3784G     2488G     1295G        34.24
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
    ssd          1      431G     37.33     723G          110984
    rbdbench     76     0        0         723G          0

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup
Total time run:         180.223580
Total writes made:      12472
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     276.812
Stddev Bandwidth:       66.2295
Max bandwidth (MB/sec): 524
Min bandwidth (MB/sec): 112
Average IOPS:           69
Stddev IOPS:            16
Max IOPS:               131
Min IOPS:               28
Average Latency(s):     0.231178
Stddev Latency(s):      0.19153
Max latency(s):         1.16432
Min latency(s):         0.022585

And after a few benchmarks, when I hit Ceph's near-full warning:

# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    3784G     751G      3032G        80.13
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
    ssd          1      431G     82.93     90858M        110984
    rbdbench     76     579G     86.73     90858M        148467

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup
Total time run:         180.233495
Total writes made:      19549
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     433.859
Stddev Bandwidth:       73.0601
Max bandwidth (MB/sec): 584
Min bandwidth (MB/sec): 220
Average IOPS:           108
Stddev IOPS:            18
Max IOPS:               146
Min IOPS:               55
Average Latency(s):     0.147433
Stddev Latency(s):      0.103518
Max latency(s):         1.08162
Min latency(s):         0.0218688

-----Original message-----
> From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Sent: Thursday 6th September 2018 17:15
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>; Menno Zonneveld <menno@xxxxxxxx>
> Subject: RE: Rados performance inconsistencies, lower than expected performance
>
> It is idle, still being tested, with backups running on it at night.
> How do you fill up the cluster so you can test between empty and full?
> Do you have a "ceph df" from empty and full?
>
> I have done another test disabling new scrubs on the rbd.ssd pool (but
> still 3 on hdd) with:
> ceph tell osd.* injectargs --osd_max_backfills=0
> Again getting slower towards the end.
> Bandwidth (MB/sec):     395.749
> Average Latency(s):     0.161713
>
>
> -----Original Message-----
> From: Menno Zonneveld [mailto:menno@xxxxxxxx]
> Sent: Thursday 6 September 2018 16:56
> To: Marc Roos; ceph-users
> Subject: RE: Rados performance inconsistencies, lower than
> expected performance
>
> The benchmark does fluctuate quite a bit; that's why I run it for 180
> seconds now, as then I do get consistent results.
>
> Your performance seems on par with what I'm getting with 3 nodes and 9
> OSDs, not sure what to make of that.
>
> Are your machines actively used perhaps? Mine are mostly idle as it's
> still a test setup.
>
> -----Original message-----
> > From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> > Sent: Thursday 6th September 2018 16:23
> > To: ceph-users <ceph-users@xxxxxxxxxxxxxx>; Menno Zonneveld
> > <menno@xxxxxxxx>
> > Subject: RE: Rados performance inconsistencies, lower
> > than expected performance
> >
> > I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB
> > 2x E5-2660
> > 2x LSI SAS2308
> > 1x dual port 10Gbit (one used, and shared between cluster/client vlans)
> >
> > I have 5 pgs scrubbing, but I am not sure if there are any on the ssd
> > pool. I am noticing a drop in the performance at the end of the test.
> > Maybe some caching on the ssd?
> >
> > rados bench -p rbd.ssd 60 write -b 4M -t 16
> > Bandwidth (MB/sec):     448.465
> > Average Latency(s):     0.142671
> >
> > rados bench -p rbd.ssd 180 write -b 4M -t 16
> > Bandwidth (MB/sec):     381.998
> > Average Latency(s):     0.167524
> >
> >
> > -----Original Message-----
> > From: Menno Zonneveld [mailto:menno@xxxxxxxx]
> > Sent: Thursday 6 September 2018 15:52
> > To: Marc Roos; ceph-users
> > Subject: RE: Rados performance inconsistencies, lower
> > than expected performance
> >
> > Ah yes, 3x replicated with a minimum of 2.
> >
> > My ceph.conf is pretty bare; here it is in case it is relevant:
> >
> > [global]
> > auth client required = cephx
> > auth cluster required = cephx
> > auth service required = cephx
> >
> > cluster network = 172.25.42.0/24
> >
> > fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> >
> > keyring = /etc/pve/priv/$cluster.$name.keyring
> >
> > mon allow pool delete = true
> > mon osd allow primary affinity = true
> >
> > osd journal size = 5120
> > osd pool default min size = 2
> > osd pool default size = 3
> >
> >
> > -----Original message-----
> > > From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> > > Sent: Thursday 6th September 2018 15:43
> > > To: ceph-users <ceph-users@xxxxxxxxxxxxxx>; Menno Zonneveld
> > > <menno@xxxxxxxx>
> > > Subject: RE: Rados performance inconsistencies, lower
> > > than expected performance
> > >
> > > Test pool is 3x replicated?
> > >
> > >
> > > -----Original Message-----
> > > From: Menno Zonneveld [mailto:menno@xxxxxxxx]
> > > Sent: Thursday 6 September 2018 15:29
> > > To: ceph-users@xxxxxxxxxxxxxx
> > > Subject: Rados performance inconsistencies, lower than
> > > expected performance
> > >
> > > I've set up a Ceph cluster to test things before going into
> > > production but I've run into some performance issues that I cannot
> > > resolve or explain.
> > >
> > > Hardware in use in each storage machine (x3)
> > > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu
> > >   9000)
> > > - dual 10Gbit EdgeSwitch 16-Port XG
> > > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > > - 3x Intel S4500 480GB SSD as OSDs
> > > - 2x SSD raid-1 boot/OS disks
> > > - 2x Intel(R) Xeon(R) CPU E5-2630
> > > - 128GB memory
> > >
> > > Software wise I'm running Ceph 12.2.7-pve1, set up from Proxmox VE 5.2,
> > > on all nodes.
> > >
> > > Running rados benchmark resulted in somewhat lower than expected
> > > performance unless Ceph enters the 'near-full' state. When the
> > > cluster is mostly empty, rados bench (180 write -b 4M -t 16) results
> > > in about 330MB/s with 0.18s latency, but when hitting the near-full
> > > state this goes up to a more expected 550MB/s and 0.11s latency.
> > >
> > > iostat on the storage machines shows the disks are hardly utilized
> > > unless the cluster hits near-full; CPU and network also aren't maxed
> > > out. I've also tried with NIC bonding and just one switch, and without
> > > jumbo frames, but nothing seems to matter in this case.
> > >
> > > Is this expected behavior, or what can I try to do to pinpoint the
> > > bottleneck?
> > >
> > > The expected performance is based on the benchmark results Proxmox
> > > released this year: they have 4 OSDs per server and hit almost
> > > 800MB/s with 0.08s latency using 10Gbit and 3 nodes. Since they have
> > > more OSDs and somewhat different hardware, I understand I won't hit
> > > the 800MB/s mark, but the difference between an empty and an almost
> > > full cluster makes no sense to me; I'd expect it to be the other way
> > > around.
> > >
> > > Thanks,
> > > Menno
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
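
As a rough sanity check on the figures quoted in this thread: with a fixed number of writes in flight (-t 16) and a fixed object size (-b 4M), Little's law suggests bandwidth should be roughly concurrency * object size / average latency, with latency in seconds as in the "Average Latency(s)" line of the rados bench output. The sketch below is only an illustration; the function name is hypothetical, and the numbers are copied from the benchmark output posted above.

```python
# Rough consistency check (Little's law): with a fixed number of writes in
# flight, bandwidth ~= concurrency * object_size / average_latency.
# expected_bandwidth_mb_s is a hypothetical helper, not part of any Ceph tool.

def expected_bandwidth_mb_s(concurrency: int, object_size_mb: float,
                            avg_latency_s: float) -> float:
    """Estimate throughput in MB/s from queue depth and mean latency (seconds)."""
    if avg_latency_s <= 0:
        raise ValueError("avg_latency_s must be positive")
    return concurrency * object_size_mb / avg_latency_s


# (label, reported MB/s, reported average latency in seconds), from the thread
runs = [
    ("rbdbench, mostly empty", 276.812, 0.231178),
    ("rbdbench, near-full", 433.859, 0.147433),
    ("rbd.ssd, 60 s run", 448.465, 0.142671),
]

for label, reported, latency in runs:
    # -t 16 concurrent operations, -b 4M objects
    estimate = expected_bandwidth_mb_s(16, 4.0, latency)
    print(f"{label}: reported {reported:.1f} MB/s, estimated {estimate:.1f} MB/s")
```

The estimates land within about 1 MB/s of the reported bandwidth in each case, so bandwidth and latency here are two views of the same measurement: the open question in the thread is why per-write latency drops as the cluster fills, not why bandwidth rises.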