Hey Mark,

I've just tweaked these filestore settings for my cluster (what I changed is
pasted at the bottom of this mail). After changing them, is there a way to make
Ceph move existing objects around to the new filestore locations, or will this
only apply to newly created objects? (I would assume the latter.)

Thanks,
-Ben

On Wed, Jul 8, 2015 at 6:39 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> Basically, for each PG there's a directory tree where only a certain number
> of objects are allowed in a given directory before it splits into new
> branches/leaves. The problem is that this has a fair amount of overhead, and
> there are also extra dentry lookups associated with getting at any given
> object.
>
> You may want to try something like:
>
> "filestore merge threshold = 40"
> "filestore split multiple = 8"
>
> This will dramatically increase the number of objects allowed per directory.
>
> Another thing you may want to try is telling the kernel to greatly favor
> retaining dentries and inodes in cache:
>
> echo 1 | sudo tee /proc/sys/vm/vfs_cache_pressure
>
> Mark
>
>
> On 07/08/2015 08:13 AM, MATHIAS, Bryn (Bryn) wrote:
>>
>> If I create a new pool it is generally fast for a short amount of time.
>> It's not as fast as if I had a blank cluster, but close to it.
>>
>> Bryn
>>>
>>> On 8 Jul 2015, at 13:55, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>
>>> I think you're probably running into the internal PG/collection
>>> splitting here; try searching for those terms and seeing what your OSD
>>> folder structures look like. You could test by creating a new pool and
>>> seeing if it's faster or slower than the one you've already filled up.
>>> -Greg
>>>
>>> On Wed, Jul 8, 2015 at 1:25 PM, MATHIAS, Bryn (Bryn)
>>> <bryn.mathias@xxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I'm perf testing a cluster again. This time I have rebuilt the cluster
>>>> and am filling it for testing.
>>>>
>>>> On a 10 min run I get the following results from 5 load generators, each
>>>> writing through 7 IoContexts with a queue depth of 50 async writes.
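>>>>
>>>> (The generators are our own librados tooling, but as a rough stand-in
>>>> with stock tools, something like the command below per client produces a
>>>> similar sustained async write pattern. The pool name is a placeholder,
>>>> and the 4 MB object size is inferred from the MB/s versus objects/s
>>>> figures that follow.)
>>>>
>>>> # 10 minute write run, 50 writes in flight, 4 MB objects,
>>>> # leaving objects in place so the pool keeps filling
>>>> rados bench -p testpool 600 write -t 50 -b 4194304 --no-cleanup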
>>>>
>>>> Gen1
>>>> Percentile 100 = 0.729775905609
>>>> Max latencies = 0.729775905609, Min = 0.0320818424225, mean = 0.0750389684542
>>>> Total objects written = 113088 in time 604.259738207s gives
>>>> 187.151307376/s (748.605229503 MB/s)
>>>>
>>>> Gen2
>>>> Percentile 100 = 0.735981941223
>>>> Max latencies = 0.735981941223, Min = 0.0340068340302, mean = 0.0745198070711
>>>> Total objects written = 113822 in time 604.437897921s gives
>>>> 188.310495407/s (753.241981627 MB/s)
>>>>
>>>> Gen3
>>>> Percentile 100 = 0.828994989395
>>>> Max latencies = 0.828994989395, Min = 0.0349340438843, mean = 0.0745455575197
>>>> Total objects written = 113670 in time 604.352181911s gives
>>>> 188.085694736/s (752.342778944 MB/s)
>>>>
>>>> Gen4
>>>> Percentile 100 = 1.06834602356
>>>> Max latencies = 1.06834602356, Min = 0.0333499908447, mean = 0.0752239764659
>>>> Total objects written = 112744 in time 604.408732891s gives
>>>> 186.536020849/s (746.144083397 MB/s)
>>>>
>>>> Gen5
>>>> Percentile 100 = 0.609658002853
>>>> Max latencies = 0.609658002853, Min = 0.032968044281, mean = 0.0744482759499
>>>> Total objects written = 113918 in time 604.671534061s gives
>>>> 188.396498897/s (753.585995589 MB/s)
>>>>
>>>> Example ceph -w output:
>>>> 2015-07-07 15:50:16.507084 mon.0 [INF] pgmap v1077: 2880 pgs: 2880
>>>> active+clean; 1996 GB data, 2515 GB used, 346 TB / 348 TB avail;
>>>> 2185 MB/s wr, 572 op/s
>>>>
>>>> However, when the cluster gets over 20% full I see the following results,
>>>> and this gets worse as the cluster fills up:
>>>>
>>>> Gen1
>>>> Percentile 100 = 6.71176099777
>>>> Max latencies = 6.71176099777, Min = 0.0358741283417, mean = 0.161760483485
>>>> Total objects written = 52196 in time 604.488474131s gives
>>>> 86.347386648/s (345.389546592 MB/s)
>>>>
>>>> Gen2
>>>> Max latencies = 4.09169006348, Min = 0.0357890129089, mean = 0.163243938477
>>>> Total objects written = 51702 in time 604.036739111s gives
>>>> 85.5941313704/s (342.376525482 MB/s)
>>>>
>>>> Gen3
>>>> Percentile 100 = 7.32526683807
>>>> Max latencies = 7.32526683807, Min = 0.0366668701172, mean = 0.163992217926
>>>> Total objects written = 51476 in time 604.684302092s gives
>>>> 85.1287189397/s (340.514875759 MB/s)
>>>>
>>>> Gen4
>>>> Percentile 100 = 7.56094503403
>>>> Max latencies = 7.56094503403, Min = 0.0355761051178, mean = 0.162109421231
>>>> Total objects written = 52092 in time 604.769910812s gives
>>>> 86.1352376642/s (344.540950657 MB/s)
>>>>
>>>> Gen5
>>>> Percentile 100 = 6.99595499039
>>>> Max latencies = 6.99595499039, Min = 0.0364680290222, mean = 0.163651215426
>>>> Total objects written = 51566 in time 604.061977148s gives
>>>> 85.3654127404/s (341.461650961 MB/s)
>>>>
>>>> Cluster details:
>>>> 5 x HP DL380, each with 13 x 6 TB OSDs
>>>> 128 GB RAM
>>>> 2 x Intel 2620 v3
>>>> 10 Gbit Ceph public network
>>>> 10 Gbit Ceph private network
>>>>
>>>> Load generators are connected via a 20 Gbit bond to the Ceph public
>>>> network.
>>>>
>>>> Is this likely to be something happening with the journals, or is there
>>>> something else going on?
>>>>
>>>> I have run fio and iperf tests, and both disk and network performance
>>>> are very high.
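>>>>
>>>> (The baseline checks were along these lines; the device and host names
>>>> are examples rather than the exact parameters used.)
>>>>
>>>> # raw sequential writes to a single data disk (destructive to that disk)
>>>> fio --name=seqwrite --filename=/dev/sdb --rw=write --bs=4M \
>>>>     --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based
>>>>
>>>> # network path from a load generator to an OSD host, 4 parallel streams
>>>> iperf -c osd-host-1 -P 4 -t 30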
>>>>
>>>> Kind Regards,
>>>> Bryn Mathias
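P.S. In case it's useful to anyone else reading along, here is roughly what I
changed on my side. This is only a sketch of my own setup (everything else is
left at defaults, and the sysctl file name is simply where I chose to put it):

# added to /etc/ceph/ceph.conf on the OSD hosts
[osd]
filestore merge threshold = 40
filestore split multiple = 8

# make the dentry/inode cache tweak persistent across reboots
echo "vm.vfs_cache_pressure = 1" | sudo tee /etc/sysctl.d/90-vfs-cache.conf
sudo sysctl -p /etc/sysctl.d/90-vfs-cache.conf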