Ceph performance, empty vs part full

"MATHIAS, Bryn (Bryn)" <bryn.mathias@xxxxxxxxxxxxxxxxxx> · Wed, 8 Jul 2015 12:25:29 +0000

Hi All,

I’m performance testing a cluster again. This time I have rebuilt the cluster and am filling it up as I test.

On a 10-minute run I get the following results from 5 load generators, each writing through 7 I/O contexts with a queue depth of 50 async writes.

Gen1
Percentile 100 = 0.729775905609
Max latencies = 0.729775905609, Min = 0.0320818424225, mean = 0.0750389684542
Total objects written = 113088 in time 604.259738207s gives 187.151307376/s (748.605229503 MB/s)

Gen2
Percentile 100 = 0.735981941223
Max latencies = 0.735981941223, Min = 0.0340068340302, mean = 0.0745198070711
Total objects written = 113822 in time 604.437897921s gives 188.310495407/s (753.241981627 MB/s)

Gen3
Percentile 100 = 0.828994989395
Max latencies = 0.828994989395, Min = 0.0349340438843, mean = 0.0745455575197
Total objects written = 113670 in time 604.352181911s gives 188.085694736/s (752.342778944 MB/s)

Gen4
Percentile 100 = 1.06834602356
Max latencies = 1.06834602356, Min = 0.0333499908447, mean = 0.0752239764659
Total objects written = 112744 in time 604.408732891s gives 186.536020849/s (746.144083397 MB/s)

Gen5
Percentile 100 = 0.609658002853
Max latencies = 0.609658002853, Min = 0.032968044281, mean = 0.0744482759499
Total objects written = 113918 in time 604.671534061s gives 188.396498897/s (753.585995589 MB/s)
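The summary lines above are internally consistent: objects divided by elapsed time reproduces the objects/s figure, and MB/s divided by objects/s comes out at 4 for every generator. This suggests 4 MB objects, though that is an inference from the numbers and is not stated anywhere in the post. Here is a small Python sketch that parses the lines and sums the per-generator throughput. The regex is written against the format shown here only (spelling normalized; it accepts both "writen" and "written"), and `parse_summary` and `EMPTY_RUN` are illustrative names.

```python
import math
import re

# Matches one generator summary line; "writt?en" accepts "writen" or "written".
SUMMARY_RE = re.compile(
    r"Total objects writt?en = (\d+) in time ([\d.]+)s "
    r"gives ([\d.]+)/s \(([\d.]+) MB/s\)"
)


def parse_summary(line):
    """Return (objects, seconds, objects_per_s, mb_per_s) from one summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not a summary line: {line!r}")
    objects, seconds, rate, mbps = m.groups()
    return int(objects), float(seconds), float(rate), float(mbps)


# Summary lines from the empty-cluster run (Gen1..Gen5).
EMPTY_RUN = [
    "Total objects written = 113088 in time 604.259738207s gives 187.151307376/s (748.605229503 MB/s)",
    "Total objects written = 113822 in time 604.437897921s gives 188.310495407/s (753.241981627 MB/s)",
    "Total objects written = 113670 in time 604.352181911s gives 188.085694736/s (752.342778944 MB/s)",
    "Total objects written = 112744 in time 604.408732891s gives 186.536020849/s (746.144083397 MB/s)",
    "Total objects written = 113918 in time 604.671534061s gives 188.396498897/s (753.585995589 MB/s)",
]

total_mbps = 0.0
for line in EMPTY_RUN:
    objects, seconds, rate, mbps = parse_summary(line)
    # objects / elapsed time should reproduce the reported objects/s.
    assert math.isclose(objects / seconds, rate, rel_tol=1e-6)
    # MB/s divided by objects/s gives the implied object size in MB.
    print(f"{rate:.2f} obj/s, {mbps:.2f} MB/s, ~{mbps / rate:.2f} MB per object")
    total_mbps += mbps

print(f"Aggregate across all generators: {total_mbps:.1f} MB/s")
```

The same function can be pointed at the summary lines from the fuller-cluster run to get comparable totals.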

Example ceph -w output:
2015-07-07 15:50:16.507084 mon.0 [INF] pgmap v1077: 2880 pgs: 2880 active+clean; 1996 GB data, 2515 GB used, 346 TB / 348 TB avail; 2185 MB/s wr, 572 op/s
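To see how full the cluster is at any point, you can read the "used" and "avail" figures in a pgmap line like the one above. The Python sketch below does that under one assumption, flagged in the code: Ceph's size prefixes are treated as binary (1 TB = 1024 GB). The regexes match only the format of this one line, and `raw_fill_fraction` is an illustrative name rather than a Ceph tool.

```python
import re

PGMAP_LINE = (
    "2015-07-07 15:50:16.507084 mon.0 [INF] pgmap v1077: 2880 pgs: 2880 active+clean; "
    "1996 GB data, 2515 GB used, 346 TB / 348 TB avail; 2185 MB/s wr, 572 op/s"
)

# Assumption: size prefixes are binary (1 TB = 1024 GB); everything is normalized to GB.
TO_GB = {"MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}

USED_RE = re.compile(r"([\d.]+) (MB|GB|TB) used")
TOTAL_RE = re.compile(r"/ ([\d.]+) (MB|GB|TB) avail")


def raw_fill_fraction(line):
    """Fraction of raw capacity used, from the 'X used, Y / Z avail' part of a pgmap line."""
    used = USED_RE.search(line)
    total = TOTAL_RE.search(line)
    if used is None or total is None:
        raise ValueError("no capacity figures found in pgmap line")
    used_gb = float(used.group(1)) * TO_GB[used.group(2)]
    total_gb = float(total.group(1)) * TO_GB[total.group(2)]
    return used_gb / total_gb


fill = raw_fill_fraction(PGMAP_LINE)
print(f"Raw capacity used: {fill:.2%}")
```

When this snapshot was taken, the cluster was under 1% full by raw capacity, whether the prefixes are read as binary or decimal. Running the same check on later lines shows when the cluster crosses the 20% level where the slowdown appears.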

However, once the cluster is more than 20% full I see the following results, and performance gets worse as the cluster fills up:

Gen1
Percentile 100 = 6.71176099777
Max latencies = 6.71176099777, Min = 0.0358741283417, mean = 0.161760483485
Total objects written = 52196 in time 604.488474131s gives 86.347386648/s (345.389546592 MB/s)

Gen2
Max latencies = 4.09169006348, Min = 0.0357890129089, mean = 0.163243938477
Total objects written = 51702 in time 604.036739111s gives 85.5941313704/s (342.376525482 MB/s)

Gen3
Percentile 100 = 7.32526683807
Max latencies = 7.32526683807, Min = 0.0366668701172, mean = 0.163992217926
Total objects written = 51476 in time 604.684302092s gives 85.1287189397/s (340.514875759 MB/s)

Gen4
Percentile 100 = 7.56094503403
Max latencies = 7.56094503403, Min = 0.0355761051178, mean = 0.162109421231
Total objects written = 52092 in time 604.769910812s gives 86.1352376642/s (344.540950657 MB/s)

Gen5
Percentile 100 = 6.99595499039
Max latencies = 6.99595499039, Min = 0.0364680290222, mean = 0.163651215426
Total objects written = 51566 in time 604.061977148s gives 85.3654127404/s (341.461650961 MB/s)
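To compare the two runs directly, this short Python sketch averages the per-generator mean latency and MB/s from each run and prints the ratio between them. It uses only the figures posted above. The dictionary names are illustrative.

```python
from statistics import mean

# Per-generator mean latency (s) and throughput (MB/s), copied from the two runs above.
EMPTY = {
    "latency_s": [0.0750389684542, 0.0745198070711, 0.0745455575197,
                  0.0752239764659, 0.0744482759499],
    "mb_per_s": [748.605229503, 753.241981627, 752.342778944,
                 746.144083397, 753.585995589],
}
OVER_20_PCT = {
    "latency_s": [0.161760483485, 0.163243938477, 0.163992217926,
                  0.162109421231, 0.163651215426],
    "mb_per_s": [345.389546592, 342.376525482, 340.514875759,
                 344.540950657, 341.461650961],
}

ratios = {}
for metric in ("latency_s", "mb_per_s"):
    before = mean(EMPTY[metric])
    after = mean(OVER_20_PCT[metric])
    # A ratio above 1 means the value grew once the cluster was over 20% full.
    ratios[metric] = after / before
    print(f"{metric}: {before:.4f} -> {after:.4f} ({ratios[metric]:.2f}x)")
```

Averaged across the five generators, mean write latency roughly doubles (a little more than 2x). Throughput drops to under half of the empty-cluster figure.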

Cluster details:
5 x HP DL380, each with 13 x 6 TB OSDs
128 GB RAM
2 x Intel Xeon E5-2620 v3
10 Gbit Ceph public network
10 Gbit Ceph private network

The load generators are connected to the Ceph public network via a 20 Gbit bond.

Is this likely to be something happening with the journals, or is something else going on?

I have run fio and iperf tests, and both disk and network performance are very high.

Kind Regards,
Bryn Mathias

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com