-----Original message-----
> From: Alwin Antreich <a.antreich@xxxxxxxxxxx>
> Sent: Thursday 6th September 2018 18:36
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Cc: Menno Zonneveld <menno@xxxxxxxx>; Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Subject: Re: Rados performance inconsistencies, lower than expected performance
>
> On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> >
> > It is idle, testing still, running backups at night on it.
> > How do you fill up the cluster so you can test between empty and full?
> > Do you have a "ceph df" from empty and full?
> >
> > I have done another test disabling new scrubs on the rbd.ssd pool (but
> > still 3 on hdd) with:
> >   ceph tell osd.* injectargs --osd_max_backfills=0
> > Again getting slower towards the end.
> >   Bandwidth (MB/sec):   395.749
> >   Average Latency(s):   0.161713
>
> In the results you both had, the latency is twice as high as in our
> tests [1]. That can already make quite some difference. Depending on the
> actual hardware used, there may or may not be room for good
> optimisation.
>
> As a start, you could test the disks with fio, as shown in our benchmark
> paper, to get some results for comparison. The forum thread [1] also has
> benchmarks from other users to compare against.
>
> [1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

Thanks for the suggestion. I redid the fio test and one server seems to be causing trouble.

When I initially tested our SSDs according to the benchmark paper, our Intel SSDs performed more or less on par with the Samsung SSDs used there.

From fio.log:

fio: (groupid=0, jobs=1): err= 0: pid=3606315: Mon Sep 10 11:12:36 2018
  write: io=4005.9MB, bw=68366KB/s, iops=17091, runt= 60001msec
    slat (usec): min=5, max=252, avg= 5.76, stdev= 0.66
    clat (usec): min=6, max=949, avg=51.72, stdev= 9.54
     lat (usec): min=54, max=955, avg=57.48, stdev= 9.56

However, one of the other machines (with identical SSDs) now performs poorly compared to the others, with these results:

fio: (groupid=0, jobs=1): err= 0: pid=3893600: Mon Sep 10 11:15:17 2018
  write: io=1258.8MB, bw=51801KB/s, iops=12950, runt= 24883msec
    slat (usec): min=5, max=259, avg= 6.17, stdev= 0.78
    clat (usec): min=53, max=857, avg=69.77, stdev=13.11
     lat (usec): min=70, max=863, avg=75.93, stdev=13.17

I'll first resolve the slower machine before doing more testing, as it surely won't help overall performance.

> --
> Cheers,
> Alwin

Thanks!,
Menno

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
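
For anyone wanting to reproduce the fio numbers above: the Proxmox benchmark paper referenced in [1] tests the raw SSDs with a single-job 4k synchronous write workload. A sketch of such an invocation follows; the exact parameters are in the paper, /dev/sdX is a placeholder for the SSD under test, and the log file name simply mirrors the "fio.log" mentioned above.

  # WARNING: writing to the raw device destroys its contents
  fio --ioengine=libaio --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 \
      --runtime=60 --time_based --name=ssd-test --output=fio.log

With iodepth=1 the IOPS are roughly the inverse of the per-I/O latency, which is why the ~76 us average lat on the slow machine versus ~57 us on the healthy one shows up directly as 12950 vs 17091 IOPS.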
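
The Bandwidth/Average Latency figures in Marc's quoted results look like the summary of a rados bench write run. A minimal sketch of that kind of test, assuming the rbd.ssd pool named above and the default 16 concurrent 4 MiB writes (the actual options used in the thread are not stated):

  # 60-second write benchmark against the rbd.ssd pool; --no-cleanup keeps
  # the objects so a follow-up 'rados bench ... seq' read test is possible
  rados bench -p rbd.ssd 60 write -t 16 --no-cleanup
  # remove the benchmark objects afterwards
  rados -p rbd.ssd cleanup

As a sanity check, 16 in-flight 4 MiB objects at 0.16 s average latency works out to roughly 16 * 4 MiB / 0.16 s ≈ 400 MB/s, consistent with the ~395 MB/s reported.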