Re: Ceph performance, empty vs part full

Hrm, I think it will follow the merge/split rules if it's out of whack given the new settings, but I don't know that I've ever tested it on an existing cluster to see that it actually happens. I guess let it sit for a while and then check the OSD PG directories to see if the object counts make sense given the new settings? :D
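
Something like this could give a rough per-PG count (the path and OSD ID are
assumptions based on the default FileStore layout; adjust for your
deployment):

# Count objects under each PG's _head directory on one OSD.
for d in /var/lib/ceph/osd/ceph-0/current/*_head; do
    echo "$d: $(find "$d" -type f | wc -l) objects"
done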

Mark

On 09/03/2015 04:31 PM, Ben Hines wrote:
Hey Mark,

I've just tweaked these filestore settings for my cluster -- after
changing this, is there a way to make ceph move existing objects
around to new filestore locations, or will this only apply to newly
created objects? (I would assume the latter...)

thanks,

-Ben

On Wed, Jul 8, 2015 at 6:39 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Basically, for each PG there's a directory tree where only a certain number
of objects are allowed in a given directory before it splits into new
branches/leaves.  The problem is that this has a fair amount of overhead, and
there are also extra dentry lookups to get at any given object.

You may want to try something like:

"filestore merge threshold = 40"
"filestore split multiple = 8"

This will dramatically increase the number of objects allowed per directory.
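
If I'm remembering the FileStore logic right, a directory splits once it
holds more than roughly 16 * filestore_split_multiple *
|filestore_merge_threshold| objects, so as rough arithmetic:

# Assuming the 16 * split_multiple * |merge_threshold| rule:
# defaults:       16 * 2 * 10 = 320  objects per directory before a split
# with the above: 16 * 8 * 40 = 5120 objects per directory before a split

I believe you can also inject these into running OSDs, though I'd restart
them afterwards to be safe:

ceph tell osd.\* injectargs '--filestore_merge_threshold 40 --filestore_split_multiple 8'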

Another thing you may want to try is telling the kernel to greatly favor
retaining dentries and inodes in cache:

echo 1 | sudo tee /proc/sys/vm/vfs_cache_pressure
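
To keep that setting across reboots, something like this should work (the
file name is arbitrary):

echo 'vm.vfs_cache_pressure = 1' | sudo tee /etc/sysctl.d/90-vfs-cache.conf
sudo sysctl --system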

Mark


On 07/08/2015 08:13 AM, MATHIAS, Bryn (Bryn) wrote:

If I create a new pool it is generally fast for a short amount of time.
Not as fast as if I had a blank cluster, but close to it.

Bryn

On 8 Jul 2015, at 13:55, Gregory Farnum <greg@xxxxxxxxxxx> wrote:

I think you're probably running into the internal PG/collection
splitting here; try searching for those terms and seeing what your OSD
folder structures look like. You could test by creating a new pool and
seeing if it's faster or slower than the one you've already filled up.
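For example, something along these lines (pool name and PG count are
placeholders, not recommendations):

ceph osd pool create testpool 128
rados bench -p testpool 60 write -t 50

and compare the numbers against the pool you've already filled.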
-Greg

On Wed, Jul 8, 2015 at 1:25 PM, MATHIAS, Bryn (Bryn)
<bryn.mathias@xxxxxxxxxxxxxxxxxx> wrote:

Hi All,


I’m perf-testing a cluster again.
This time I have rebuilt the cluster and am filling it for testing.

On a 10-minute run I get the following results from 5 load generators, each
writing through 7 iocontexts with a queue depth of 50 async writes.


Gen1
Percentile 100 = 0.729775905609
Max latency = 0.729775905609, Min = 0.0320818424225, mean = 0.0750389684542
Total objects written = 113088 in 604.259738207 s gives 187.151307376/s (748.605229503 MB/s)

Gen2
Percentile 100 = 0.735981941223
Max latency = 0.735981941223, Min = 0.0340068340302, mean = 0.0745198070711
Total objects written = 113822 in 604.437897921 s gives 188.310495407/s (753.241981627 MB/s)

Gen3
Percentile 100 = 0.828994989395
Max latency = 0.828994989395, Min = 0.0349340438843, mean = 0.0745455575197
Total objects written = 113670 in 604.352181911 s gives 188.085694736/s (752.342778944 MB/s)

Gen4
Percentile 100 = 1.06834602356
Max latency = 1.06834602356, Min = 0.0333499908447, mean = 0.0752239764659
Total objects written = 112744 in 604.408732891 s gives 186.536020849/s (746.144083397 MB/s)

Gen5
Percentile 100 = 0.609658002853
Max latency = 0.609658002853, Min = 0.032968044281, mean = 0.0744482759499
Total objects written = 113918 in 604.671534061 s gives 188.396498897/s (753.585995589 MB/s)

Example ceph -w output:
2015-07-07 15:50:16.507084 mon.0 [INF] pgmap v1077: 2880 pgs: 2880 active+clean; 1996 GB data, 2515 GB used, 346 TB / 348 TB avail; 2185 MB/s wr, 572 op/s


However, when the cluster gets over 20% full I see the following results;
the slowdown gets worse as the cluster fills up:

Gen1
Percentile 100 = 6.71176099777
Max latency = 6.71176099777, Min = 0.0358741283417, mean = 0.161760483485
Total objects written = 52196 in 604.488474131 s gives 86.347386648/s (345.389546592 MB/s)

Gen2
Max latency = 4.09169006348, Min = 0.0357890129089, mean = 0.163243938477
Total objects written = 51702 in 604.036739111 s gives 85.5941313704/s (342.376525482 MB/s)

Gen3
Percentile 100 = 7.32526683807
Max latency = 7.32526683807, Min = 0.0366668701172, mean = 0.163992217926
Total objects written = 51476 in 604.684302092 s gives 85.1287189397/s (340.514875759 MB/s)

Gen4
Percentile 100 = 7.56094503403
Max latency = 7.56094503403, Min = 0.0355761051178, mean = 0.162109421231
Total objects written = 52092 in 604.769910812 s gives 86.1352376642/s (344.540950657 MB/s)

Gen5
Percentile 100 = 6.99595499039
Max latency = 6.99595499039, Min = 0.0364680290222, mean = 0.163651215426
Total objects written = 51566 in 604.061977148 s gives 85.3654127404/s (341.461650961 MB/s)

Cluster details:
5 x HP DL380s, each with 13 x 6 TB OSDs
128 GB RAM
2 x Intel 2620 v3 CPUs
10 Gbit Ceph public network
10 Gbit Ceph private network

Load generators are connected via a 20 Gbit bond to the Ceph public network.


Is this likely to be something happening to the journals?

Or is there something else going on?

I have run fio and iperf tests, and both disk and network performance are
very high.
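
For the curious, the checks were roughly along these lines (device and host
are placeholders, and these are not necessarily my exact invocations):

fio --name=seqwrite --filename=/dev/sdX --rw=write --bs=4M --direct=1 --runtime=60 --time_based
iperf -c <load generator IP>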


Kind Regards,
Bryn Mathias