Re: Rados performance inconsistencies, lower than expected performance

-----Original message-----
> From:Alwin Antreich <a.antreich@xxxxxxxxxxx>
> Sent: Thursday 6th September 2018 18:36
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Cc: Menno Zonneveld <menno@xxxxxxxx>; Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Subject: Re:  Rados performance inconsistencies, lower than expected performance
> 
> On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> > 
> > It is idle, still testing, running backups at night on it.
> > How do you fill up the cluster so you can test between empty and full? 
> > Do you have a "ceph df" from empty and full? 
> > 
> > I have done another test disabling new scrubs on the rbd.ssd pool (but 
> > still 3 on hdd) with:
> > ceph tell osd.* injectargs --osd_max_backfills=0
> > Again getting slower towards the end.
> > Bandwidth (MB/sec):     395.749
> > Average Latency(s):     0.161713
> In the results you both had, the latency is twice as high as in our
> tests [1]. That can already make quite some difference. Depending on the
> actual hardware used, there may or may not be the possibility for good
> optimisation.
> 
> As a start, you could test the disks with fio, as shown in our benchmark
> paper, to get some results for comparison. The forum thread [1] has
> some benchmarks from other users for comparison.
> 
> [1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

Thanks for the suggestion, I redid the fio test and one server seems to be causing trouble.

When I initially tested our SSDs according to the benchmark paper, our Intel SSDs performed more or less on par with the Samsung SSDs used there.
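
For reference, the fio job was along the lines of the 4k sync write test from the benchmark paper, something like the following (the device path is a placeholder and the flags are from memory, so they may differ slightly from the paper; it writes directly to the raw device, so only run it against a disk whose data may be destroyed):

fio --ioengine=libaio --filename=/dev/sdX --direct=1 --sync=1 --rw=write \
    --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio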

From fio.log on one of the well-performing machines:

fio: (groupid=0, jobs=1): err= 0: pid=3606315: Mon Sep 10 11:12:36 2018
  write: io=4005.9MB, bw=68366KB/s, iops=17091, runt= 60001msec
    slat (usec): min=5, max=252, avg= 5.76, stdev= 0.66
    clat (usec): min=6, max=949, avg=51.72, stdev= 9.54
     lat (usec): min=54, max=955, avg=57.48, stdev= 9.56

However, one of the other machines (with identical SSDs) now performs noticeably worse than the others, with these results:

fio: (groupid=0, jobs=1): err= 0: pid=3893600: Mon Sep 10 11:15:17 2018
  write: io=1258.8MB, bw=51801KB/s, iops=12950, runt= 24883msec
    slat (usec): min=5, max=259, avg= 6.17, stdev= 0.78
    clat (usec): min=53, max=857, avg=69.77, stdev=13.11
     lat (usec): min=70, max=863, avg=75.93, stdev=13.17

I'll first resolve the slower machine before doing more testing, as it surely isn't helping overall performance.
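
Once that is sorted I'll rerun rados bench. For completeness, the write test referred to above is along these lines (pool name taken from the thread; duration and thread count are examples rather than the exact values used):

rados bench -p rbd.ssd 60 write -b 4M -t 16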


> --
> Cheers,
> Alwin

Thanks!
Menno
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


