Hi everyone,

We currently have an all-SSD deployment with 90 OSDs that, in my opinion, isn't hitting the performance it should. A `rados bench` run gives numbers along these lines:

Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_bench.vexxhost._30340
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16       158       142   568.513       568    0.0965336   0.0939971
    2      16       287       271   542.191       516    0.0291494    0.107503
    3      16       375       359    478.75       352    0.0892724    0.118463
    4      16       477       461   461.042       408    0.0243493    0.126649
    5      16       540       524   419.216       252     0.239123    0.132195
    6      16       644       628    418.67       416     0.347606    0.146832
    7      16       734       718   410.281       360    0.0534447    0.147413
    8      16       811       795   397.487       308    0.0311927     0.15004
    9      16       879       863   383.537       272    0.0894534    0.158513
   10      16       980       964   385.578       404    0.0969865    0.162121
   11       3       981       978   355.613        56     0.798949    0.171779
Total time run:         11.063482
Total writes made:      981
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     354.68
Stddev Bandwidth:       137.608
Max bandwidth (MB/sec): 568
Min bandwidth (MB/sec): 56
Average IOPS:           88
Stddev IOPS:            34
Max IOPS:               142
Min IOPS:               14
Average Latency(s):     0.175273
Stddev Latency(s):      0.294736
Max latency(s):         1.97781
Min latency(s):         0.0205769
Cleaning up (deleting benchmark objects)
Clean up completed and total clean up time :3.895293

We've verified the network by running `iperf` across both the replication and public networks; it showed 9.8 Gb/s on each (both are 10G links), and the machine running the benchmark doesn't even saturate its port. The SSDs are S3520 960GB drives, which we've benchmarked with fio and similar tools and which can handle the load.

At this point I'm not really sure where to look next. Is anyone running an all-SSD cluster who might be able to share their experience?

Thanks,
Mohammed
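
P.S. In case the exact invocations are useful, below is roughly what was run. The pool name, peer address, and device path are placeholders, and the fio line is just the usual direct/synchronous write check rather than our exact job file.

    # rados bench: 10-second write test, 16 concurrent 4 MB writes
    # (matches the defaults shown in the output above; <pool> is a placeholder)
    rados bench -p <pool> 10 write -t 16

    # network check, run against peers on both the public and replication networks (~9.8 Gb/s each)
    iperf -s                    # on the remote node
    iperf -c <peer-ip>          # on the benchmark node; <peer-ip> is a placeholder

    # per-drive check with fio: direct, synchronous 4k writes against the raw SSD
    # (illustrative job, not the literal one we used; /dev/<ssd> is a placeholder)
    fio --name=ssd-check --filename=/dev/<ssd> --direct=1 --sync=1 \
        --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60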